Difference between revisions of "Kernel Function Instrumentation"
(Legacy Content Conversion) |
(fix link to KFI Docs and KFT page) |
||
(2 intermediate revisions by one other user not shown) | |||
Line 1: | Line 1: | ||
− | [[Image:Alert.gif]] - '''NOTE:''' ''KFI has been renamed to KFT (Kernel Function Trace) starting with the patches for kernel version 2.6.12. If you are using a recent kernel, version 2.6.12 or later, please see the | + | [[Image:Alert.gif]] - '''NOTE:''' ''KFI has been renamed to KFT (Kernel Function Trace) starting with the patches for kernel version 2.6.12. If you are using a recent kernel, version 2.6.12 or later, please see the [[Kernel Function Trace]] page instead of this one.'' |
== Introduction == | == Introduction == | ||
Line 14: | Line 14: | ||
Here's a presentation about KFI usage: | Here's a presentation about KFI usage: | ||
− | * | + | *{{pdf|KFI-presentation.pdf|"Learning the Kernel and Finding Performance Problems with KFI"}} |
*attachment:omap-serial_init.trace.txt - Sample trace used with presentation | *attachment:omap-serial_init.trace.txt - Sample trace used with presentation | ||
− | For prior releases of KFI, see [ | + | For prior releases of KFI, see [[KFI Docs]] |
== Download == | == Download == | ||
Line 127: | Line 127: | ||
Entry Delta PID Function Called At | Entry Delta PID Function Called At | ||
− | + | ||
1 0 0 start_kernel L6+0x0 | 1 0 0 start_kernel L6+0x0 | ||
14 8687 0 setup_arch start_kernel+0x35 | 14 8687 0 setup_arch start_kernel+0x35 | ||
Line 172: | Line 172: | ||
The log is attached here: attachment:kfiboot-9.lst | The log is attached here: attachment:kfiboot-9.lst | ||
+ | |||
A Delta value of 0 usually means the exit from the routine was not seen. | A Delta value of 0 usually means the exit from the routine was not seen. | ||
Line 181: | Line 182: | ||
The top line showing schedule() called 192 times and lasting over 5 seconds, is accounted wrong due to the switch in execution control inside the schedule routine. (The count of 192 calls is correct, but the duration is wrong.) | The top line showing schedule() called 192 times and lasting over 5 seconds, is accounted wrong due to the switch in execution control inside the schedule routine. (The count of 192 calls is correct, but the duration is wrong.) | ||
− | $ ~/work/kfi/kfi/kd -n 30 kfiboot-9.lst | + | <code>$~/work/kfi/kfi/kd -n 30 kfiboot-9.lst |
− | + | {| align=left | |
− | Function | + | !Function!!Count!!Time!!Average!!Local |
− | + | |- | |
− | schedule | + | |schedule||192||5173790||26946||5173790 |
− | do_basic_setup | + | |- |
− | do_initcalls | + | |do_basic_setup||1||1159270||1159270||14 |
− | __delay | + | |- |
− | delay_tsc | + | |do_initcalls||1||1159256||1159256||627 |
− | __const_udelay | + | |- |
− | probe_hwif | + | |__delay||156||619322||3970||0 |
− | do_probe | + | |- |
− | ide_delay_50ms | + | |delay_tsc||156||619322||3970||619322 |
− | isapnp_init | + | |- |
− | isapnp_isolate | + | |__const_udelay||146||608427||4167||0 |
− | ide_init | + | |- |
− | probe_for_hwifs | + | |probe_hwif||8||553972||69246||126 |
− | ide_scan_pcibus | + | |- |
− | init_setup_piix | + | |do_probe||31||553025||17839||68 |
− | ide_scan_pcidev | + | |- |
− | piix_init_one | + | |ide_delay_50ms||103||552588||5364||0 |
− | ide_setup_pci_device | + | |- |
− | probe_hwif_init | + | |isapnp_init||1||383138||383138||18 |
− | time_init | + | |- |
− | get_cmos_time | + | |isapnp_isolate||1||383120||383120||311629 |
− | ide_generic_init | + | |- |
− | ideprobe_init | + | |ide_init||1||339778||339778||22 |
− | wait_for_completion | + | |- |
− | default_idle | + | |probe_for_hwifs||1||339756||339756||103 |
− | io_schedule | + | |- |
− | __wait_on_buffer | + | |ide_scan_pcibus||1||339653||339653||13 |
− | i8042_init | + | |- |
− | i8042_port_register | + | |init_setup_piix||2||339640||169820||0 |
− | __serio_register_port | + | |- |
+ | |ide_scan_pcidev||2||339640||169820||0 | ||
+ | |- | ||
+ | |piix_init_one||2||339640||169820||0 | ||
+ | |- | ||
+ | |ide_setup_pci_device||2||339640||169820||242 | ||
+ | |- | ||
+ | |probe_hwif_init||4||339398||84849||40 | ||
+ | |- | ||
+ | |time_init||1||266911||266911||0 | ||
+ | |- | ||
+ | |get_cmos_time||1||261404||261404||261404 | ||
+ | |- | ||
+ | |ide_generic_init||1||214614||214614||0 | ||
+ | |- | ||
+ | |ideprobe_init||1||214614||214614||0 | ||
+ | |- | ||
+ | |wait_for_completion||6||194573||32428||0 | ||
+ | |- | ||
+ | |default_idle||183||192589||1052||192589 | ||
+ | |- | ||
+ | |io_schedule||18||171313||9517||0 | ||
+ | |- | ||
+ | |__wait_on_buffer||14||150369||10740||141 | ||
+ | |- | ||
+ | |i8042_init||1||137210||137210||295 | ||
+ | |- | ||
+ | |i8042_port_register||2||135318||67659||301 | ||
+ | |- | ||
+ | |__serio_register_port||2||135017||67508||0 | ||
+ | |} | ||
</code> | </code> | ||
− | [[Category:Kernel Function | + | [[Category:Kernel Function Instrumentation]] |
Latest revision as of 17:00, 10 December 2008
- NOTE: KFI has been renamed to KFT (Kernel Function Trace) starting with the patches for kernel version 2.6.12. If you are using a recent kernel, version 2.6.12 or later, please see the Kernel Function Trace page instead of this one.
Contents
Introduction
Kernel Function Instrumentation (KFI) is a kernel function tracing system, which uses the "-finstrument-functions" capability of the gcc compiler to add instrumentation callouts to every function entry and exit. The KFI system provides for capturing these callouts and generating a trace of events, with timing details. KFI is excellent at providing a good timing overview of kernel procedures, allowing you to see where time is spent in functions and sub-routines in the kernel.
The main mode of operation with KFI is to use the system with a dynamic trace configuration. That is, you can set a trace configuration after kernel startup, using the /proc/kfi
interface, and retrieve trace data immediately. However, another (special) mode of operation is available, called STATIC_RUN mode, where the configuration for a KFI run is configured and compiled statically into the kernel. This mode is useful for getting a trace of kernel operation during system bootup (before user space is running).
The KFI configuration lets you specify how to automatically start and stop a trace, whether to include interrupts as part of the trace, and whether to filter the trace data by various criteria (for minimum function duration, only certain listed functions, etc.) KFI trace data is retrieved by reading from /proc/kfi_trace
after the trace is complete.
Tools are supplied to convert numeric trace data to kernel symbols, and to process and analyze the data in a KFI trace.
Basic Use
Documentation for KFI is available (as of 2.6.11) in Documentation/kfi.txt, after applying the kfi-2.patch.
Here's a presentation about KFI usage:
- https://elinux.org/images//8/83/Pdf.gif "Learning the Kernel and Finding Performance Problems with KFI" https://elinux.org/images/d/da/Info_circle.png
- attachment:omap-serial_init.trace.txt - Sample trace used with presentation
For prior releases of KFI, see KFI Docs
Download
Patches
- Patch for CELF kernel (based on linux-2.4.20): attachment:kfi-24-test4.patch
- Patch for Linux 2.6.7 (for x86 only): attachment:kfi-26-test1.patch
- Patch for Linux 2.6.8.1: see the PatchArchive page
- Patch for Linux 2.6.11: see the PatchArchive page (or just download attachment:kfi-2.patch)
KFI utilities
- User-space programs: attachment:kfi-0.8.tar.gz (for KFI version 1)
For KFI version 2 and above, the only user-space programs are scripts, which are now located in the kernel scripts
directory:
- addr2sym - script to convert function addresses to symbols in the trace data
- kd - kfi dump - does filtering, sorting, and analysis of KFI trace logs
See Documentation/kfi.txt for instructions on using these programs.
How To Use
- download both the patch
- apply the patch in the kernel top-level directory:
- patch -p1 <kfi-2.patch
- read the rest of the instructions in the Documentation/kfi.txt file. (my apologies for being lazy!)
Adding platform support for the kfi clock source
The current patch (from Sep 2004), uses sched_clock() as the clock source for kfi_readclock(). sched_clock() is new in the 2.6 kernel, and returns a 64-bit value containing nanoseconds (not necessarily relative to any particular time base, but assumed to be monotonically increasing, and relatively frequency-stable.)
If your platform has good support for sched_clock(), then KFI should work for you unmodified. If not, you may wish to do one of two things:
- improve support for sched_clock() in your board port, or
- write a custom kfi_readclock() routine.
A "good" sched_clock() routine will provide at least microsecond resolution on return values. Some architectures have sched_clock() returning values based on the jiffy
variable, which on many embedded platforms only has resolution to 10 milliseconds.
There are some sample custom kfi_readclock() routines in the current patch (one for x86 using the TSC, and one for PPC using the TBU.
Issues
Here is a list of things that need more work:
- may need to add noinstrument attributes for some time-critical code (need to check this)
- maybe can check "Function Trace in KDB" patch for help with this
- would like a tutorial on the configuration language for defining a tracing run
- documentation needs lots of work
- should especially document how to do a dynamic trace
Overhead
Mitsubishi measured the overhead of KFI. The period is from start_kernel() to smp_init(). Platform was: SH7751R 240MHz (Memory Clock 80MHz)
With KFI : 922.419 msec
Without KFI : 666.982 msec
Overhead : 27.69%
Similar technologies
There are other technologies for doing call traces or kernel profiling that are similar to KFI. Some of these are mentioned on the KernelInstrumentation page.
One that is very similar is a kernel trace mechanism for use with KDB. A patch was posted to LKML in January of 2002. See the message: http://www.uwsg.iu.edu/hypermail/linux/kernel/0201.3/0888.html
Filter Q&A
Tim asked the question:
Q. Is there a way to adjust the trigger or filters to reduce the memory usage?
A. Todd Poynor from MontaVista answered:
kfistatic.conf
begin
trigger start entry start_kernel
trigger stop entry to_userspace
filter mintime 1
filter maxtime 0
# filter noints
end
The above filters out only those routines that take less than 1 microsecond. We usually are not interested in routines that execute so quickly, and instead use something like "filter mintime 500" to filter out routines taking less than 500 microseconds.
I didn't track down the original log file being discussed, but if the "quiet" command line parameter wasn't used then even a 500us filter may include a lot of calls for serial console printks.
The filters don't affect memory usage so far as I understand. You can set the amount of memory used for a static run by specifying "logsize <n>", where <n> is the number of entries, in kfistatic.conf.
Q. Is it possible to specifically omit certain routines with a filter.
A. I believe there's a filter for including only certain routines (rather than excluding certain routines). It shouldn't normally be necessary, but if there's a routine that matches the time filtering suggested above and is called so often as to be a problem, then adding attribute "__noinstrument" to the function definition and recompiling will exclude it; see drivers/char/kfi.c for an example.
Sample results
Here is an excerpt from a KFI log trace (processed with addr2sym). It shows all functions which lasted longer than 500 microseconds, from when the kernel entered start_kernel() to when it entered to_userspace().
kfi log output (excerpt)
Kernel Instrumentation Run ID 0
Logging started at 6785045 usec by entry to function start_kernel
Logging stopped at 8423650 usec by entry to function to_userspace
Filters:
500 usecs minimum execution time
Filter Counters:
Execution time filter count = 896348
Total entries filtered = 896348
Entries not found = 24
Number of entries after filters = 1757
Entry Delta PID Function Called At
1 0 0 start_kernel L6+0x0
14 8687 0 setup_arch start_kernel+0x35
39 891 0 setup_memory setup_arch+0x2a8
53 872 0 register_bootmem_low_pages setup_memory+0x8f
54 871 0 free_bootmem register_bootmem_low_pages+0x95
54 871 0 free_bootmem_core free_bootmem+0x34
930 7432 0 paging_init setup_arch+0x2af
935 7427 0 zone_sizes_init paging_init+0x4e
935 7427 0 free_area_init zone_sizes_init+0x83
935 7427 0 free_area_init_node free_area_init+0x4b
935 3759 0 __alloc_bootmem_node free_area_init_node+0xc5
935 3759 0 __alloc_bootmem_core __alloc_bootmem_node+0x43
4694 3668 0 free_area_init_core free_area_init_node+0x75
4817 3535 0 memmap_init_zone free_area_init_core+0x2bd
8807 266911 0 time_init start_kernel+0xb6
8807 261404 0 get_cmos_time time_init+0x1c
270211 5507 0 select_timer time_init+0x41
270211 5507 0 init_tsc select_timer+0x45
270211 5507 0 calibrate_tsc init_tsc+0x6c
275718 1638 0 console_init start_kernel+0xbb
275718 1638 0 con_init console_init+0x59
275954 733 0 vgacon_save_screen con_init+0x288
277376 6730 0 mem_init start_kernel+0xf8
277376 1691 0 free_all_bootmem mem_init+0x52
277376 1691 0 free_all_bootmem_core free_all_bootmem+0x24
284118 25027 0 calibrate_delay start_kernel+0x10f
293860 770 0 __delay calibrate_delay+0x62
293860 770 0 delay_tsc __delay+0x26
294951 1534 0 __delay calibrate_delay+0x62
294951 1534 0 delay_tsc __delay+0x26
297134 1149 0 __delay calibrate_delay+0xbe
297134 1149 0 delay_tsc __delay+0x26
.
.
.
1638605 0 145 filemap_nopage do_no_page+0xef
1638605 0 145 __lock_page filemap_nopage+0x286
1638605 0 145 io_schedule __lock_page+0x95
1638605 0 145 schedule io_schedule+0x24
1638605 0 5 schedule worker_thread+0x217
1638605 0 1 to_userspace init+0xa6
The log is attached here: attachment:kfiboot-9.lst
A Delta value of 0 usually means the exit from the routine was not seen.
kfi log analysis with 'kd'
Below is a kd
dump of the data from the above log.
For the purpose of finding areas of big time in the kernel, the functions with high "Local" time are important. For example, delay_tsc()
is called 156 times, resulting in 619 milliseconds of duration. Other time-consuming routines were: isapnp_isolate()
, get_cmos_time()
, default_idle()
.
The top line showing schedule() called 192 times and lasting over 5 seconds, is accounted wrong due to the switch in execution control inside the schedule routine. (The count of 192 calls is correct, but the duration is wrong.)
$~/work/kfi/kfi/kd -n 30 kfiboot-9.lst
Function
Count
Time
Average
Local
schedule
192
5173790
26946
5173790
do_basic_setup
1
1159270
1159270
14
do_initcalls
1
1159256
1159256
627
__delay
156
619322
3970
0
delay_tsc
156
619322
3970
619322
__const_udelay
146
608427
4167
0
probe_hwif
8
553972
69246
126
do_probe
31
553025
17839
68
ide_delay_50ms
103
552588
5364
0
isapnp_init
1
383138
383138
18
isapnp_isolate
1
383120
383120
311629
ide_init
1
339778
339778
22
probe_for_hwifs
1
339756
339756
103
ide_scan_pcibus
1
339653
339653
13
init_setup_piix
2
339640
169820
0
ide_scan_pcidev
2
339640
169820
0
piix_init_one
2
339640
169820
0
ide_setup_pci_device
2
339640
169820
242
probe_hwif_init
4
339398
84849
40
time_init
1
266911
266911
0
get_cmos_time
1
261404
261404
261404
ide_generic_init
1
214614
214614
0
ideprobe_init
1
214614
214614
0
wait_for_completion
6
194573
32428
0
default_idle
183
192589
1052
192589
io_schedule
18
171313
9517
0
__wait_on_buffer
14
150369
10740
141
i8042_init
1
137210
137210
295
i8042_port_register
2
135318
67659
301
__serio_register_port
2
135017
67508
0