Difference between revisions of "Improving Android Boot Time Outline"

From eLinux.org
(Created page with 'Here is the outline for my talk: * Title * Outline ** Android boot overview ** Measuring boot times ** Problem areas *** Some gory details ** Ideas for improvements * Android boo…')
 
m (Add category)
 
(7 intermediate revisions by one other user not shown)
Line 1: Line 1:
 
Here is the outline for my talk:
 
Here is the outline for my talk:
 
* Title
 
* Title
* Outline
+
----
** Android boot overview
+
* Outline  
 +
** Android boot sequence overview
 
** Measuring boot times
 
** Measuring boot times
 
** Problem areas
 
** Problem areas
 
*** Some gory details
 
*** Some gory details
 
** Ideas for improvements
 
** Ideas for improvements
* Android boot overview
+
----
 +
* Boot Sequence Overview
 
** bootloader
 
** bootloader
 
** kernel
 
** kernel
 
** init
 
** init
 +
*** loads several daemons and services, including zygote
 +
*** see /init.rc and init.''<platform>''.rc
 
** zygote
 
** zygote
*** building preload heap
+
*** preloads classes
*** start package manager
+
*** starts package manager
 
** service manager
 
** service manager
 
*** start services
 
*** start services
* measuring bootup time
+
[ diagram would be nice ]
** systems measured: adp1, n1, evm
+
----
*** adp1 with donut
+
* Measuring Boot Time
*** n1 with eclair
+
* Systems measured:
*** evm with eclair
+
*** Android Developer Phone (same hardware as G1), running Donut (1.6)
**** NOTE: used nfs root filesystem (file IO timings might be bogus)
+
*** Nexus 1, running Eclair (2.1)
 +
*** OMAP Evaluation Module (EVM), running  Eclair
 +
**** NOTE: Used NFS root filesystem (which means file IO timings might be off)
 +
* Tools and techniques used (next slide)
 +
----
 
* Tools for measuring and tracing boot time
 
* Tools for measuring and tracing boot time
 
** stopwatch
 
** stopwatch
** grabserial
+
*** It's kind of sad that it takes so long that you can use a stopwatch
** printk times
+
** Message loggers
 +
*** grabserial
 +
*** printk times
 +
*** logcat
 
** bootchart
 
** bootchart
 
** strace
 
** strace
** logcat
 
 
** method tracer*
 
** method tracer*
 
** ftrace*
 
** ftrace*
 +
----
 +
* Grabserial
 +
** Tool for measuring time of printouts on a serial port, from a host machine
 +
*** Only useful with EVM board, which has serial console
 +
** Shows timestamp for each line received over serial console
 +
** See [[Grabserial]]
 +
----
 +
* Printk Times
 +
** Kernel option for adding time stamp to each printk
 +
*** Set CONFIG_PRINTK_TIME=y
 +
**** Option is on "Kernel hacking" menu, "Show timing information on printks"
 +
** Can save from terminal on serial console, on host
 +
** Can also show after boot with 'dmesg'
 +
** Useful to use 'initcall_debug' on kernel command line
 +
** See [[Printk Times]]
 +
** Can change loglevel of 'init' program
 +
*** Change "loglevel 3" to "loglevel 8" in /init.rc
 +
----
 +
* Bootchart
 +
** 'init' gathers data on startup
 +
*** Must re-compile 'init' with support for bootchart data collection
 +
** A tool on the host produces a nice graphic
 +
** See [[Bootchart]] and [[Using Bootchart on Android]]
 +
----
 +
* strace
 +
** Shows system calls for a process (or set of processes)
 +
** Is part of AOSP since Eclair
 +
** Can add to init.rc to trace initialization.
 +
*** For example, to trace zygote startup, in /init.rc change:
 +
    service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
 +
to
 +
    service zygote /system/xbin/strace -tt -o/data/boot.strace /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
 +
----
 +
* Android system log
 +
** Android has a built-in logging system
 +
** Use logcat to see messages
 +
** I added extra instrumentation for class preloading and package scanning
 +
*** PARSE_CHATTY flag for package scanning
 +
** I built my own tool (logdelta) to scan log and produce 'delta' times
 +
----
 +
* method tracer*
 +
** method traces is built in to
 +
* ftrace*
 +
** I could have really used ftrace for some things
 +
** (especially to see page faults intermingled with system calls)
 +
** kernel version for my primary development board (2.6.29) didn't support it
 +
** should be usable in future versions of Android (Froyo is at 2.6.32)
 +
** NOTE: ARM is missing function graph tracing - see [[Ftrace Function Graph ARM]]
 +
----
 +
* Measurement results
 +
** stopwatch
 +
** grabserial
 +
** printk times
 +
** strace
 +
** bootchart
 +
** logcat
 +
---
 
* stopwatch
 
* stopwatch
 +
** ADP1: 4, 32, 57 | 39 apps
 +
** Nexus 1: 3.5, 20, 36 | 79 apps
 +
** EVM: 17, 37, 62 | 45 apps (very small)
 +
---
 
* grabserial
 
* grabserial
 +
---
 
* printk times
 
* printk times
* bootchart
+
---
 
* strace
 
* strace
 +
---
 +
* bootchart
 +
---
 
* logcat
 
* logcat
** extra instrumentation for preloading classes
+
---
** PARSE_CHATTY flag for package scanning
+
** mention my own tool 'logdelta'
+
* method tracer*
+
** method traces is built in to
+
** ftrace?? (no)
+
 
* Problem Areas
 
* Problem Areas
 
** First, a bootchart for EVM board
 
** First, a bootchart for EVM board
Line 50: Line 120:
 
** package scanning
 
** package scanning
 
** service initialization
 
** service initialization
 +
----
 
* bootloader init
 
* bootloader init
 
** outside scope of this talk
 
** outside scope of this talk
 
** didn't measure commercial bootloader, only development one (U-boot)
 
** didn't measure commercial bootloader, only development one (U-boot)
 +
----
 
* kernel init
 
* kernel init
 
** is mostly the usual suspects
 
** is mostly the usual suspects
 
** (initcall_debug results)
 
** (initcall_debug results)
 
** USB
 
** USB
 +
----
 
* zygote class preloading
 
* zygote class preloading
 
** zygote pre-loads just under 2000 classes, and instantiates them in its heap
 
** zygote pre-loads just under 2000 classes, and instantiates them in its heap
 
** controlled by file:
 
** controlled by file:
 +
----
 
* package manager package scan
 
* package manager package scan
** exact purpose is not known
 
*** validation of certificates, permissions, capabilities and dependencies, etc.?
 
 
** EVERY package is scanned at boot time
 
** EVERY package is scanned at boot time
* ideas for enhancements
+
** Very deep nesting, with abstraction
 +
** Not sure of exact set of purposes
 +
*** But I see validation of certificates, permissions, capabilities and dependencies, etc.
 +
** Very difficult to trace
 +
*** It bounces between java, c++ and kernel
 +
*** And uses mmaped files (meaning accesses cause page faults)!!
 +
**** So it's not even using syscalls for reading the data
 +
----
 +
* Package scan call tree
 +
[put call tree here]
 +
 
 +
----
 +
* parseZipArchive()
 +
** evil routine that builds an in-memory data structure for accessing a package file
 +
** scans the entire package, checking the content headers
 +
*** caused read of almost entire package
 +
*** touches every page in mmaped file, even if a sub-file in the archive won't be read later
 +
*** e.g. entire package is scanned, when only the AndroidManifest.xml file is requested
 +
----
 +
* new Resources()
 +
** reads all resources in a file
 +
** is this really necessary?
 +
----
 +
* Ideas for Enhancements
 +
** First, a side note on toothpaste..
 
** kernel speedups
 
** kernel speedups
 
** optimize package scan
 
** optimize package scan
Line 70: Line 166:
 
** miscellaneous optimizations
 
** miscellaneous optimizations
 
** sreadahead??
 
** sreadahead??
 +
----
 +
* Toothpaste
 +
** The problem with optimizations is that a reduction in one area causes some other problem (either in speed or size) in some other area
 +
** i.e. when you squeeze the tube of toothpaste, it just moves somewhere else, and doesn't actually come out the end
 +
** This is demonstrated with class preloading and page cache effects
 +
*** I tried to improve things, but the I/O delays just moved somewhere else in the system, sometimes making things worse
 +
*** e.g. AutoText - eliminated and gained 4 seconds during class preloading, but /system/frameworks/frameworks-res.apk was just loaded later in the boot
 +
*** e.g. Contacts.apk - moved AndroidManifest.xml to its own package, to avoid reading the entire (1.6M) package to read this one file, but next reference to contents of Contacts.apk caused the index rebuild again (costing the entire page cache load)
 +
----
 
* kernel speedups
 
* kernel speedups
 
** outside the scope of this presentation
 
** outside the scope of this presentation
 
** see http://elinux.org/Boot_Time
 
** see http://elinux.org/Boot_Time
* optimize package scan
+
** should really be able to get kernel up in 1 second
*** most definitely
+
*** modulo network delays
*** need to continue analysis
+
----
**** most likely, should switch to a compressed flash file system
+
* Optimize Class Preloading
** use pre-constructed dalvik heap? (difficult?)
+
** Preload less, and let apps pay penalty for shared class and resource use
*** thread the heap construction?
+
*** move some classes to services, and have preloaded class be an accessor stub
* sreadahead??
+
*** figure out how to share heap back with zygote
 +
*** This needs a lot of analysis - Google Android devs know that the whole process of selecting what classes to preload is a black art
 +
** Thread the heap construction
 +
*** There is some evidence that class preloading has I/O waits, and would benefit from threading
 +
*** Don't know if this is possible
 +
*** NOTE: all threads need to terminate before spawning Android apps
 +
** Use pre-constructed dalvik heap
 +
*** (more on next slide)
 +
----
 +
* Use pre-constructed dalvik heap
 +
** Basic operation:
 +
*** Snapshot the heap at end of preloading
 +
*** Check for modifications to any class in preload list, and do regular preload
 +
*** Otherwise, load pre-constructed heap
 +
** Issues:
 +
*** Don't know if this is possible
 +
**** Need to find parts of heap that are identical on each boot
 +
**** Probably need separate "always-the-same" and "may-change" classes
 +
*** Needs careful analysis, and knowledge of each class
 +
----
 +
* Optimize Package Scan
 +
** Most definitely! should be first thing attacked
 +
** Need to continue analysis
 +
*** Most likely, should switch to a compressed flash file system
 +
----
 +
* Miscellaneous
 +
** zoneinfo inefficiencies
 +
*** discovered with strace
 +
*** routines that do read syscall for 40 bytes, 8 bytes, 8 bytes (hundreds of times)
 +
**** no buffering at user level, sloppy loop coding
 +
*** linear scan of timezone file
 +
**** for a file not present!!
 +
*** probably only a few hundred milliseconds, but worth changing
 +
----
 +
* readahead??
 +
** Interesting result from just pre-filling the page cache
 +
[put page cache sequence here]
 
** could use sreadahead to pre-fill page cache
 
** could use sreadahead to pre-fill page cache
 
** however, this just masks bad behavior
 
** however, this just masks bad behavior
 
*** Contacts.apk (half of 1.6M) is read 4 times! during boot
 
*** Contacts.apk (half of 1.6M) is read 4 times! during boot
*** filling page cache makes reads after first one fast, but it would be better to avoid all these reads altogether
+
*** filling page cache makes reads after first one fast, but it would be better to avoid (most of) the reads altogether
 
**** better to just optimize or eliminate parseZipArchive()
 
**** better to just optimize or eliminate parseZipArchive()
 
** sreadahead should be used dead last (after all other enhancements)
 
** sreadahead should be used dead last (after all other enhancements)
 +
----
 +
* Conclusions
 +
** Sorry - no speedups yet
 +
** But, have a good foundation and set of tools for improving things going forward
 +
----
 +
* Observations
 +
** Premature optimization is the root of all evil
 +
*** Be very careful of optimizing wasteful operations
 +
*** Better to improve or eliminate the operations, than hide the wasteful operations with caching
 +
** Beware of systemic or architectural problems
 +
*** Package management basically builds a persistent container and compression architecture in user space
 +
*** Except, it does it poorly.  (It rebuilds the in-memory data structure for indexing an archive over and over.)
 +
*** Just use a file system, for heaven's sake!
 +
----
 +
* Resources
 +
** Wiki page for this talk: http://elinux.org/Improving_Android_Boot_Time
 +
** Android Porting, Android Platform, and Android Kernel mailing lists, depending on where your issue is
 +
*** See http://elinux.org/Android_Web_Resources#Mailing_Lists
 +
** My e-mail: tim(dot)bird (at) am(dot)sony(dot)com
  
      - don't know how to preconstruct heap
+
[[Category:Android]]
      - need to analyze heap
+
        - FIXTHIS - is heap identical on every boot?
+
          - how to dump heap memory
+
          - how to dump any memory in Android??
+
        - are parts identical??
+
    - zoneinfo inefficiencies
+
        - measure without strace, and see if time is a problem
+
  - remaining questions
+
    - future directions
+

Latest revision as of 21:36, 27 October 2011

Here is the outline for my talk:

  • Title

  • Outline
    • Android boot sequence overview
    • Measuring boot times
    • Problem areas
      • Some gory details
    • Ideas for improvements

  • Boot Sequence Overview
    • bootloader
    • kernel
    • init
      • loads several daemons and services, including zygote
      • see /init.rc and init.<platform>.rc
    • zygote
      • preloads classes
      • starts package manager
    • service manager
      • start services

[ diagram would be nice ]


  • Measuring Boot Time
  • Systems measured:
      • Android Developer Phone (same hardware as G1), running Donut (1.6)
      • Nexus 1, running Eclair (2.1)
      • OMAP Evaluation Module (EVM), running Eclair
        • NOTE: Used NFS root filesystem (which means file IO timings might be off)
  • Tools and techniques used (next slide)

  • Tools for measuring and tracing boot time
    • stopwatch
      • It's kind of sad that it takes so long that you can use a stopwatch
    • Message loggers
      • grabserial
      • printk times
      • logcat
    • bootchart
    • strace
    • method tracer*
    • ftrace*

  • Grabserial
    • Tool for measuring time of printouts on a serial port, from a host machine
      • Only useful with EVM board, which has serial console
    • Shows timestamp for each line received over serial console
    • See Grabserial
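
The idea behind per-line serial timestamps can be sketched in a few lines of Python. This is not grabserial itself, only a minimal illustration of the technique; the function name `timestamp_lines` and the injectable `clock` parameter are assumptions made for this example, and an in-memory stream stands in for the serial console.

```python
import io
import time


def timestamp_lines(stream, clock=time.monotonic):
    """Yield (elapsed_seconds, delta_seconds, line) for each line read from stream."""
    start = clock()
    previous = start
    for raw in stream:
        now = clock()
        # elapsed: time since we started listening; delta: time since the previous line
        yield now - start, now - previous, raw.rstrip("\r\n")
        previous = now


if __name__ == "__main__":
    # Stand-in for a serial port; a real setup would pass an open serial device.
    fake_console = io.StringIO("Uncompressing Linux...\nBooting the kernel.\n")
    for elapsed, delta, line in timestamp_lines(fake_console):
        print(f"[{elapsed:9.6f} {delta:9.6f}] {line}")
```

Large deltas point at the boot phases worth a closer look; the timestamps are taken on the host, so they include serial transmission delay.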

  • Printk Times
    • Kernel option for adding time stamp to each printk
      • Set CONFIG_PRINTK_TIME=y
        • Option is on "Kernel hacking" menu, "Show timing information on printks"
    • Can save from terminal on serial console, on host
    • Can also show after boot with 'dmesg'
    • Useful to use 'initcall_debug' on kernel command line
    • See Printk Times
    • Can change loglevel of 'init' program
      • Change "loglevel 3" to "loglevel 8" in /init.rc
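
Once 'initcall_debug' output has been captured (from the serial console or 'dmesg'), a small script can rank the initcalls by duration. The sketch below assumes lines shaped like "initcall usb_init+0x0/0x120 returned 0 after 4321 usecs"; the exact wording and units vary between kernel versions, so the pattern and the helper name `slowest_initcalls` are assumptions to adjust for the kernel in use.

```python
import re

# Assumed line shape (varies by kernel version):
#   [    0.512345] initcall usb_init+0x0/0x120 returned 0 after 4321 usecs
INITCALL_RE = re.compile(
    r"initcall\s+(?P<name>[^\s+]+)\S*(?:\s+\[\S+\])?\s+returned\s+-?\d+"
    r"\s+after\s+(?P<time>\d+)\s+(?P<unit>usecs|msecs)"
)


def slowest_initcalls(log_lines, top=5):
    """Return up to `top` (name, microseconds) pairs, slowest first."""
    durations = []
    for line in log_lines:
        match = INITCALL_RE.search(line)
        if not match:
            continue  # "calling ..." lines and ordinary printks are ignored
        usecs = int(match.group("time"))
        if match.group("unit") == "msecs":
            usecs *= 1000
        durations.append((match.group("name"), usecs))
    return sorted(durations, key=lambda item: item[1], reverse=True)[:top]
```

Feeding it the saved console log, for example `slowest_initcalls(open("boot.log"))` with a hypothetical file name, gives a short list of candidates for kernel init work.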

  • Bootchart
    • 'init' gathers data on startup
      • Must re-compile 'init' with support for bootchart data collection
    • A tool on the host produces a nice graphic
    • See Bootchart and Using Bootchart on Android

  • strace
    • Shows system calls for a process (or set of processes)
    • Is part of AOSP since Eclair
    • Can add to init.rc to trace initialization.
      • For example, to trace zygote startup, in /init.rc change:
   service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
to
   service zygote /system/xbin/strace -tt -o/data/boot.strace /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server

  • Android system log
    • Android has a built-in logging system
    • Use logcat to see messages
    • I added extra instrumentation for class preloading and package scanning
      • PARSE_CHATTY flag for package scanning
    • I built my own tool (logdelta) to scan log and produce 'delta' times
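
The 'delta' idea can be illustrated with a short script. This is not the logdelta tool mentioned above, only a hypothetical sketch of the same approach; it assumes log lines that begin with a "MM-DD HH:MM:SS.mmm" timestamp, as produced by `logcat -v time`, and the name `log_deltas` is made up for this example.

```python
import re
from datetime import datetime

# Assumed prefix: "MM-DD HH:MM:SS.mmm", followed by the rest of the log line.
TIME_RE = re.compile(r"^(\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(.*)$")


def log_deltas(lines):
    """Yield (seconds_since_previous_line, message) for each timestamped line."""
    previous = None
    for line in lines:
        match = TIME_RE.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        # The log carries no year; a fixed leap year is prepended so parsing is unambiguous.
        stamp = datetime.strptime("2000-" + match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        delta = 0.0 if previous is None else (stamp - previous).total_seconds()
        previous = stamp
        yield delta, match.group(2)
```

Sorting the output by delta shows where the log goes quiet for a long time, which is usually where the boot is waiting on something. A log that crosses a year boundary would produce a negative delta with this simple approach.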

  • method tracer*
    • method traces is built in to
  • ftrace*
    • I could have really used ftrace for some things
    • (especially to see page faults intermingled with system calls)
    • kernel version for my primary development board (2.6.29) didn't support it
    • should be usable in future versions of Android (Froyo is at 2.6.32)
    • NOTE: ARM is missing function graph tracing - see Ftrace Function Graph ARM

  • Measurement results
    • stopwatch
    • grabserial
    • printk times
    • strace
    • bootchart
    • logcat

---

  • stopwatch
    • ADP1: 4, 32, 57 | 39 apps
    • Nexus 1: 3.5, 20, 36 | 79 apps
    • EVM: 17, 37, 62 | 45 apps (very small)

---

  • grabserial

---

  • printk times

---

  • strace

---

  • bootchart

---

  • logcat

---

  • Problem Areas
    • First, a bootchart for EVM board
    • bootloader init
    • kernel init
    • zygote class preloading
    • package scanning
    • service initialization

  • bootloader init
    • outside scope of this talk
    • didn't measure a commercial bootloader, only a development one (U-Boot)

  • kernel init
    • is mostly the usual suspects
    • (initcall_debug results)
    • USB

  • zygote class preloading
    • zygote pre-loads just under 2000 classes, and instantiates them in its heap
    • controlled by file:

  • package manager package scan
    • EVERY package is scanned at boot time
    • Very deep nesting, with abstraction
    • Not sure of exact set of purposes
      • But I see validation of certificates, permissions, capabilities and dependencies, etc.
    • Very difficult to trace
      • It bounces between Java, C++ and kernel
      • And uses mmapped files (meaning accesses cause page faults)!!
        • So it's not even using syscalls for reading the data

  • Package scan call tree

[put call tree here]


  • parseZipArchive()
    • evil routine that builds an in-memory data structure for accessing a package file
    • scans the entire package, checking the content headers
      • caused read of almost entire package
      • touches every page in mmapped file, even if a sub-file in the archive won't be read later
      • e.g. entire package is scanned, when only the AndroidManifest.xml file is requested
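
For contrast, a zip archive can in principle serve a single entry by reading only the central directory at the end of the file plus that one entry. The sketch below uses Python's standard `zipfile` module, not Android's own code, to show how few bytes are needed to fetch a small manifest from a package dominated by one large entry. The archive contents, `CountingBytesIO`, `build_fake_package` and `read_manifest` are all made up for this illustration.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that records how many bytes are actually read."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_fake_package():
    """Build a stand-in archive: one large entry plus a small manifest."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("classes.dex", b"\0" * 1_000_000)
        archive.writestr("AndroidManifest.xml", b"<manifest/>")
    return buffer.getvalue()


def read_manifest(package_bytes):
    """Read only the manifest; return (manifest_bytes, bytes_read_from_archive)."""
    source = CountingBytesIO(package_bytes)
    with zipfile.ZipFile(source) as archive:
        manifest = archive.read("AndroidManifest.xml")
    return manifest, source.bytes_read


if __name__ == "__main__":
    package = build_fake_package()
    manifest, touched = read_manifest(package)
    print(f"archive is {len(package)} bytes, read {touched} bytes to get the manifest")
```

The bytes touched stay a small fraction of the archive size, which is the behavior one would hope for when only AndroidManifest.xml is requested.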

  • new Resources()
    • reads all resources in a file
    • is this really necessary?

  • Ideas for Enhancements
    • First, a side note on toothpaste..
    • kernel speedups
    • optimize package scan
    • optimize class preloading
    • miscellaneous optimizations
    • sreadahead??

  • Toothpaste
    • The problem with optimizations is that a reduction in one area causes some other problem (either in speed or size) in some other area
    • i.e. when you squeeze the tube of toothpaste, it just moves somewhere else, and doesn't actually come out the end
    • This is demonstrated with class preloading and page cache effects
      • I tried to improve things, but the I/O delays just moved somewhere else in the system, sometimes making things worse
      • e.g. AutoText - eliminated and gained 4 seconds during class preloading, but /system/frameworks/frameworks-res.apk was just loaded later in the boot
      • e.g. Contacts.apk - moved AndroidManifest.xml to its own package, to avoid reading the entire (1.6M) package to read this one file, but next reference to contents of Contacts.apk caused the index rebuild again (costing the entire page cache load)

  • kernel speedups
    • outside the scope of this presentation
    • see http://elinux.org/Boot_Time
    • should really be able to get kernel up in 1 second
      • modulo network delays

  • Optimize Class Preloading
    • Preload less, and let apps pay penalty for shared class and resource use
      • move some classes to services, and have preloaded class be an accessor stub
      • figure out how to share heap back with zygote
      • This needs a lot of analysis - Google Android devs know that the whole process of selecting what classes to preload is a black art
    • Thread the heap construction
      • There is some evidence that class preloading has I/O waits, and would benefit from threading
      • Don't know if this is possible
      • NOTE: all threads need to terminate before spawning Android apps
    • Use pre-constructed dalvik heap
      • (more on next slide)

  • Use pre-constructed dalvik heap
    • Basic operation:
      • Snapshot the heap at end of preloading
      • At boot, check for modifications to any class in the preload list; if anything changed, do a regular preload
      • Otherwise, load the pre-constructed heap
    • Issues:
      • Don't know if this is possible
        • Need to find parts of heap that are identical on each boot
        • Probably need separate "always-the-same" and "may-change" classes
      • Needs careful analysis, and knowledge of each class
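
The "check for modifications" step could look something like the sketch below. It is purely hypothetical: it fingerprints the files that back the preload list using their size and modification time, and falls back to a regular preload when the fingerprint no longer matches the one saved with the snapshot. The file name `framework.jar`, the strategy strings and all function names are assumptions for illustration; how to snapshot and reload a Dalvik heap is the open question and is not addressed here.

```python
import hashlib
import os
import tempfile


def preload_fingerprint(paths):
    """Summarize path, size and mtime of every file backing the preload list."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        info = os.stat(path)
        record = f"{path}:{info.st_size}:{info.st_mtime_ns}\n"
        digest.update(record.encode("utf-8"))
    return digest.hexdigest()


def choose_preload_strategy(paths, saved_fingerprint):
    """Return 'load-snapshot' when nothing changed, else 'regular-preload'."""
    if saved_fingerprint == preload_fingerprint(paths):
        return "load-snapshot"
    return "regular-preload"


def demo():
    """Show both outcomes using a temporary stand-in file."""
    with tempfile.TemporaryDirectory() as workdir:
        jar = os.path.join(workdir, "framework.jar")  # hypothetical file name
        with open(jar, "wb") as handle:
            handle.write(b"version 1")
        saved = preload_fingerprint([jar])
        unchanged = choose_preload_strategy([jar], saved)
        with open(jar, "wb") as handle:
            handle.write(b"version 2, now longer")  # simulate an update
        changed = choose_preload_strategy([jar], saved)
    return unchanged, changed
```

Using only metadata keeps the check cheap, since hashing file contents at boot would reintroduce the I/O the snapshot is meant to avoid.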

  • Optimize Package Scan
    • Most definitely! should be first thing attacked
    • Need to continue analysis
      • Most likely, should switch to a compressed flash file system

  • Miscellaneous
    • zoneinfo inefficiencies
      • discovered with strace
      • routines that do read syscall for 40 bytes, 8 bytes, 8 bytes (hundreds of times)
        • no buffering at user level, sloppy loop coding
      • linear scan of timezone file
        • for a file not present!!
      • probably only a few hundred milliseconds, but worth changing
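
Patterns like these are easy to spot by counting small reads in the strace output. The sketch below assumes lines in the usual strace form, for example `read(5, "..."..., 8) = 8`, optionally preceded by a `-tt` timestamp; the helper name `small_read_histogram` and the 64-byte threshold are arbitrary choices for this example.

```python
import re
from collections import Counter

# Matches read(fd, <buffer>, requested_size) = result; pread() and friends are not matched.
READ_RE = re.compile(r"\bread\((\d+),.*,\s*(\d+)\)\s*=\s*(-?\d+)")


def small_read_histogram(strace_lines, threshold=64):
    """Count read() calls whose requested size is at or below `threshold` bytes.

    Keys are (file_descriptor, requested_size) pairs.
    """
    counts = Counter()
    for line in strace_lines:
        match = READ_RE.search(line)
        if match and int(match.group(2)) <= threshold:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts
```

A descriptor that shows hundreds of 8-byte reads is a strong hint that user-level buffering is missing. Descriptor numbers are reused over a process lifetime, so the matching open calls in the trace are needed to map a count back to a file.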

  • readahead??
    • Interesting result from just pre-filling the page cache

[put page cache sequence here]

    • could use sreadahead to pre-fill page cache
    • however, this just masks bad behavior
      • Contacts.apk (half of 1.6M) is read 4 times! during boot
      • filling page cache makes reads after first one fast, but it would be better to avoid (most of) the reads altogether
        • better to just optimize or eliminate parseZipArchive()
    • sreadahead should be used dead last (after all other enhancements)
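
Pre-filling the page cache amounts to reading the files once before they are needed. The sketch below is not sreadahead, which works from a recorded list of files; it is a minimal assumed illustration that reads each listed file sequentially and discards the data. The function names and the demo file names are hypothetical.

```python
import os
import tempfile


def prefill_page_cache(paths, chunk_size=1 << 20):
    """Read each file sequentially and discard the data; return total bytes read."""
    total = 0
    for path in paths:
        try:
            with open(path, "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
        except OSError:
            continue  # a missing or unreadable file should not abort the warm-up
    return total


def demo():
    """Warm one temporary file and skip one that does not exist."""
    with tempfile.TemporaryDirectory() as workdir:
        present = os.path.join(workdir, "Contacts.apk")  # stand-in file
        with open(present, "wb") as handle:
            handle.write(b"x" * 3000)
        missing = os.path.join(workdir, "missing.apk")
        return prefill_page_cache([present, missing], chunk_size=1024)
```

As the outline notes, this only hides repeated reads behind the cache; it does nothing about why the same package is read several times.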

  • Conclusions
    • Sorry - no speedups yet
    • But, have a good foundation and set of tools for improving things going forward

  • Observations
    • Premature optimization is the root of all evil
      • Be very careful of optimizing wasteful operations
      • Better to improve or eliminate the operations, than hide the wasteful operations with caching
    • Beware of systemic or architectural problems
      • Package management basically builds a persistent container and compression architecture in user space
      • Except, it does it poorly. (It rebuilds the in-memory data structure for indexing an archive over and over.)
      • Just use a file system, for heaven's sake!