Difference between revisions of "Measuring Function Duration with FTrace"

From eLinux.org
Jump to: navigation, search
(Add category)
 
(3 intermediate revisions by one other user not shown)
Line 51: Line 51:
 
*** implement a trampoline
 
*** implement a trampoline
 
*** details of the ARM trampoline
 
*** details of the ARM trampoline
 
=== measuring startup ===
 
  
 
== Using FTrace ==
 
== Using FTrace ==
 
* example of use
 
* example of use
  
== interpreting the results ==
+
=== interpreting the results ===
 
* cat ftrace
 
* cat ftrace
 
** trace sample
 
** trace sample
  
 +
* using kd to analyze data
 +
** measuring function counts
  
== performance issues
+
 
 +
== Measuring kernel boot ==
 +
* requirements for early boot
 +
** early clock
 +
** start location for tracing (optimally start_kernel)
 +
** static buffer
 +
** static parameters
 +
 
 +
== performance issues ==
 
* overhead of tracing
 
* overhead of tracing
 
** mention of dynamic tracing (not detailed here)
 
** mention of dynamic tracing (not detailed here)
 
* benchmark with and without function tracing?
 
* benchmark with and without function tracing?
 +
 +
== Roadmap and future work ==
 +
* what has not been mainlined (as of 5/09)?
  
 
== Things to do/add ==
 
== Things to do/add ==
* ARM exit tracing - done (need to test, see below)
 
* better clock on ARM
 
* duration filtering
 
* kd for analysis (port kd)
 
 
* use in early boot
 
* use in early boot
* send patch for ARM exit tracing
+
* mainline patches (again)
 
+
* fix ftd bugs
* find out if all trace data is collected correctly
+
** fix negative local-time bug (?? - what is status?)
** ftrace data on ARM seems to be missing a lot of functions
+
*** how to determine if functions are missing?
+
**** port KFT to 2.6.29, and use to check ftrace data collection?
+
 
+
* get a better clock routine.
+
** Current on has max resolution of 31 microseconds.
+
** what is resolution of KFT clock on ARM?
+
  
 
* measure early startup
 
* measure early startup
** determine how early the clock is available
+
** use an early-available clock source
 +
*** determine how early the clock is available
 +
** use a statically-declared ring buffer
 +
*** OR, use something besides a ring buffer temporarily during boot, then
 +
*** transfer the data??
 +
** use statically-declared trace parameters
 +
 
 +
== Things done ==
 +
* ARM function graph tracing - done
 +
* better clock on ARM - done - used MPU_TIMER instead of 32k timer
 +
* fix trace recursion on arm (notrace on sched_clock paths) - done
 +
* duration filtering - done
 +
* optimize ring_buffer_discard operation for function entry - done
 +
* convert kd to ftd - started, some issues remaining
  
* use duration filtering
+
[[Category:Development Tools]]
** get latest ftrace code, and apply patches for filtering
+
** see if duration filtering is available
+

Latest revision as of 08:26, 28 October 2011

Author

Tim Bird
Sony Corporation of America
tim dot bird (at) am dot sony dot com

Abstract

FTrace is a relatively new kernel tool for tracing function execution in the Linux kernel. Recently, FTrace supports the ability to trace function exit in addition to function entry. This allows for measurement of function duration, which adds an incredibly powerful tool for finding time-consuming areas of kernel execution. In this paper, the current state of the art for measuring function duration with FTrace is described. This includes recent work to add a new capability to filter the trace data by function duration, and tools for analyzing kernel function call graphs and visualizing kernel boot time execution.

Introduction

Overview of FTrace operation

instrumentation at compile time

  • instrumentation of the code using -pg
    • insertion of calls in each function prologue to 'mcount'
    • architecture-specific mcount routine
      • new mcount routine (_gnu_mcount) for ARM??
      • must use frame-pointers on ARM (can't use -fomit-frame-pointers)
      • Not sure it works with EABI (but assembly analysis indicates that all mcount callers have well-defined frames)
  • ability to manually instrument with markers??
    • out of scope

trace setup at run-time

  • pseudo file system
    • selecting a tracer
      • trace functions
        • stub, others
      • tracing parameters
  • collecting trace data
    • cat

tracing operation

  • ring buffer
    • generic mechanism for buffering time-stamped events, per-cpu
    • no other details here.
  • trace functions
    • ring operations
      • reservation, data fillout, commit?

function duration tracer

  • Measuring function duration
    • requires instrumenting function exit
      • option to use -finstrument-functions, which instruments both function entry and function exit
        • conflicts with mechanisms for dynamic ftrace instrumentation
    • hooking function exit using only mcount
      • implement a trampoline
      • details of the ARM trampoline

Using FTrace

  • example of use

interpreting the results

  • cat ftrace
    • trace sample
  • using kd to analyze data
    • measuring function counts


Measuring kernel boot

  • requirements for early boot
    • early clock
    • start location for tracing (optimally start_kernel)
    • static buffer
    • static parameters

performance issues

  • overhead of tracing
    • mention of dynamic tracing (not detailed here)
  • benchmark with and without function tracing?

Roadmap and future work

  • what has not been mainlined (as of 5/09)?

Things to do/add

  • use in early boot
  • mainline patches (again)
  • fix ftd bugs
    • fix negative local-time bug (?? - what is status?)
  • measure early startup
    • use an early-available clock source
      • determine how early the clock is available
    • use a statically-declared ring buffer
      • OR, use something besides a ring buffer temporarily during boot, then
      • transfer the data??
    • use statically-declared trace parameters

Things done

  • ARM function graph tracing - done
  • better clock on ARM - done - used MPU_TIMER instead of 32k timer
  • fix trace recursion on arm (notrace on sched_clock paths) - done
  • duration filtering - done
  • optimize ring_buffer_discard operation for function entry - done
  • convert kd to ftd - started, some issues remaining