Kernel dynamic memory analysis

From eLinux.org

Revision as of 14:17, 8 September 2012

This page has notes and results from the project "Kernel dynamic memory allocation tracking and reduction".

[This page is fairly random at the moment...]

Instrumentation overview

  • Slab_accounting patches
    • uses __builtin_return_address(0) to record the address of the caller, the same mechanism used by kmem events
    • starts from the very first allocation
  • Ftrace kmem events
    • does not start until the ftrace system is initialized, after some allocations have already been performed
    • supported in mainline - no need to add our own instrumentation
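
The recording mechanism shared by both approaches can be sketched in user space. This is only an illustration, assuming gcc: traced_alloc() and struct alloc_record are hypothetical names made up for this example, and malloc() stands in for the kernel allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one allocation; not a kernel structure. */
struct alloc_record {
    void *call_site;    /* return address inside whoever called traced_alloc() */
    size_t bytes_req;   /* size the caller asked for */
};

/* Most recent allocation seen by traced_alloc(). */
static struct alloc_record last_record;

/*
 * noinline keeps traced_alloc() a real function, so that
 * __builtin_return_address(0) is an address inside its caller.
 */
__attribute__((noinline)) void *traced_alloc(size_t size)
{
    last_record.call_site = __builtin_return_address(0);
    last_record.bytes_req = size;
    return malloc(size);
}
```

After a call such as traced_alloc(32), last_record.call_site holds an address inside the calling function, which a symbol table can later map back to a function name.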

Obtaining accurate call sites (or The painstaking task of wrestling against gcc)

The compiler inlines a lot automatically and without warning. When that happens, it's impossible to get the real call site name from the calling address alone.

When a function is inlined, it gets collapsed into its caller and typically no longer shows up as a symbol in tools like readelf, objdump, etc.

Does this matter? Well, it matters if you want to obtain an accurate call site report when tracing kernel memory events (which we will see later).

However, there is one solution! You can turn off gcc's automatic inlining of small functions with an option in the kernel Makefile. The option is called '-fno-inline-small-functions'. See this patch:

diff --git a/Makefile b/Makefile
index 8e4c0a7..23f1a88 100644
--- a/Makefile
+++ b/Makefile
@@ -363,6 +363,7 @@ KBUILD_CFLAGS   := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
                   -fno-strict-aliasing -fno-common \
                   -Werror-implicit-function-declaration \
                   -Wno-format-security \
+                  -fno-inline-small-functions \
                   -fno-delete-null-pointer-checks
 KBUILD_AFLAGS_KERNEL :=
 KBUILD_CFLAGS_KERNEL :=

Of course, this option produces a slightly smaller and slower kernel, but that is an expected side effect on a debug-only kernel.

We must keep in mind that no matter what internal mechanism we use to record call_site, if it's based on __builtin_return_address, then its accuracy will depend entirely on gcc not inlining automatically.

The emphasis is on the automatic part. There will be lots of functions that we need to have inlined in order to determine the caller correctly. These will be marked as __always_inline.

(See upstreamed patch Makefile: Add option CONFIG_DISABLE_GCC_AUTOMATIC_INLINING)
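
Why some wrappers need to be forced inline can be shown with a small user-space sketch. It assumes gcc, and every demo_* name is hypothetical: the two macros are local stand-ins for the kernel's __always_inline and noinline, and malloc() stands in for the real allocator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for the kernel's __always_inline and noinline macros. */
#define demo_always_inline inline __attribute__((always_inline))
#define demo_noinline      __attribute__((noinline))

/* Last call site seen by the allocator core (hypothetical, for the demo). */
static void *recorded_call_site;

/*
 * The core stays out of line; if it were inlined, the recorded address
 * would belong to a frame further up the call chain.
 */
static demo_noinline void *demo_alloc_core(size_t size)
{
    recorded_call_site = __builtin_return_address(0);
    return malloc(size);
}

/* Thin wrapper: forced inline so that it adds no frame of its own. */
static demo_always_inline void *demo_alloc(size_t size)
{
    return demo_alloc_core(size);
}

/* Hypothetical subsystem code: this is the call site we want reported. */
demo_noinline void *demo_subsystem_init(void)
{
    void *buf = demo_alloc(64);

    if (buf)
        memset(buf, 0, 64);   /* work after the call, so it is not a tail call */
    return buf;
}
```

Because demo_alloc() is always inlined, the address recorded by demo_alloc_core() should land inside demo_subsystem_init() rather than inside the wrapper. If the wrapper were left to the compiler's discretion, the report could name either function depending on optimization decisions.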


Focus of work (on instrumentation) right now is to see if kmem events can be used to find early allocations. Also, to see if early allocations account for significant memory usage. If not, it may not be that important to capture them. [Is another possibility some way to use a printk approach for very early allocations, and somehow coalesce the data into the final report?]

Reporting

  • extracting data to host
    • tool for extraction (perf?, cat /debugfs/tracing/<something>?)
  • post-processing the data
    • grouping allocations (assigning to different subsystems, processes, or functional areas)
      • idea to post-process kmem events and correlate with */built-in.o
    • reporting on wasted bytes
    • reporting on memory fragmentation
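
The grouping and wasted-bytes ideas above can be sketched as a small host-side post-processor. This is only an assumption-laden illustration: it expects kmalloc trace lines carrying "call_site=<hex> ptr=<hex> bytes_req=<n> bytes_alloc=<n>" fields (kernels that print call_site in another form would need a different parser), and struct report, report_add_line() and site_wasted() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SITES 64

/* Totals for one call site (hypothetical layout for this sketch). */
struct site_stats {
    unsigned long call_site;    /* address taken from the trace line */
    unsigned long allocs;       /* number of kmalloc events seen */
    unsigned long bytes_req;    /* sum of requested bytes */
    unsigned long bytes_alloc;  /* sum of bytes actually allocated */
};

struct report {
    struct site_stats sites[MAX_SITES];
    int nr_sites;
};

/* Returns 1 if the line was a kmalloc event and was accounted, 0 otherwise. */
int report_add_line(struct report *r, const char *line)
{
    const char *p = strstr(line, "call_site=");
    unsigned long site, req, alloc;
    int i;

    if (!p || !strstr(line, "kmalloc:"))
        return 0;
    if (sscanf(p, "call_site=%lx ptr=%*s bytes_req=%lu bytes_alloc=%lu",
               &site, &req, &alloc) != 3)
        return 0;

    /* Linear search is enough for a sketch; find the site or add it. */
    for (i = 0; i < r->nr_sites; i++)
        if (r->sites[i].call_site == site)
            break;
    if (i == r->nr_sites) {
        if (r->nr_sites == MAX_SITES)
            return 0;           /* table full, event dropped */
        memset(&r->sites[i], 0, sizeof(r->sites[i]));
        r->sites[i].call_site = site;
        r->nr_sites++;
    }

    r->sites[i].allocs++;
    r->sites[i].bytes_req += req;
    r->sites[i].bytes_alloc += alloc;
    return 1;
}

/* Wasted bytes: what the allocator handed out beyond what was requested. */
unsigned long site_wasted(const struct site_stats *s)
{
    return s->bytes_alloc - s->bytes_req;
}
```

A caller would zero-initialize a struct report, feed it each line read from the extracted trace (for example with fgets()), and then walk sites[] to print the totals per call site. Mapping call sites to subsystems or to */built-in.o would be a further step on top of this.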

Visualization

  • possible use of treemap to visualize the data

Mainline status

  • is anything added to mainline via this project?

[place links to patches, or git commit ids, here]

Recommendations for reductions

Results so far (in random order)

  • There's a lot of fragmentation using the SLAB allocator. [how much?]
  • SLxB accounting is a dead-end (it won't be accepted into mainline)

more???