System Size Auto-Reduction

This page has notes and an outline for Tim Bird's Linux Auto-Reduction research.

Title
Advanced size optimization of the Linux kernel

Abstract
This presentation will cover recent research by Tim on aggressive size reduction of the Linux kernel. This will include results from using gcc link-time optimization (LTO) with the ARM architecture (using Andi Kleen's out-of-tree patches), as well as results and discussion of other optimization techniques (including whole-system optimization for embedded devices).

This talk is directed at kernel developers interested in reducing the size of their Linux systems (and possibly improving their performance in the process). The talk will be highly technical.

LTO

 * What is it?
 * what was required to get it to work?
 * Andi Kleen's patch set
 * what do they do?
 * how big are they?
 * mainline status?
 * what is the size gain (see ELC poster)
 * what can be done with it?
 * long-term possibilities for LTO

global constraints

 * overall idea: create constraints external to code, and use for optimization
 * rationale: can't maintain in-tree - too many config items
 * make the application of constraints automatic
 * use existing constraints to generate new constraints
 * constraints can flow between user-space and kernel


 * example: uid=0
 * constraint language
 * application by commenting out references (replace with 0 constant)
 * use compiler to find code references (via error messages)
 * eliminates problem with duplicate names (uid in different structure)
 * constant propagation (by, e.g. LTO) reduces code
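
The "use compiler to find code references" step can be sketched as follows. This is only an illustration, assuming the constrained field (here uid) has been removed or renamed in its structure, so that the compiler reports an error at every reference to it. Because the compiler resolves the structure type, a uid member of an unrelated structure is not reported. The helper name find_references, the file names and the line numbers are invented for the example, and the message pattern is modeled on typical gcc diagnostics and may need adjusting for a given compiler version or locale.

```python
import re

# Assumed shape of a gcc "no member named" diagnostic (not taken from the talk).
ERROR_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?:\d+:)?\s*error:.*"
    r"has no member named\s+\W?(?P<member>\w+)"
)


def find_references(compiler_output, member):
    """Return sorted (file, line) pairs where the compiler reported `member`."""
    hits = set()
    for text in compiler_output.splitlines():
        match = ERROR_RE.match(text)
        if match and match.group("member") == member:
            hits.add((match.group("file"), int(match.group("line"))))
    return sorted(hits)


# Invented build log for illustration only.
sample_output = """\
kernel/sys.c:412:18: error: 'struct cred' has no member named 'uid'
kernel/sys.c:431:11: error: 'struct cred' has no member named 'uid'
fs/open.c:88:9: error: 'struct cred' has no member named 'gid'
fs/open.c:90:2: warning: unused variable 'tmp'
"""

references = find_references(sample_output, "uid")
for path, line_number in references:
    print(f"{path}:{line_number}: replace reference to uid with constant 0")
```

Each reported location is a candidate for replacement with the constant 0, after which constant propagation can remove the code that depended on it.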

syscall elimination

 * scan file system
 * create report of used and unused system calls
 * mark syscalls unused in kernel
 * arch/arm/kernel/calls.S (and arch/arm/kernel/entry-common.S)
 * make sure unused syscalls are not __attribute__((externally_visible))
 * technique of asmlinkage_
 * use LTO to eliminate calls
 * results: 50K-90K
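
The report step above might look something like the following sketch. It assumes a separate scanner (not shown, and not described in these notes) has already produced, for each binary on the file system, the set of system call names it can invoke. The function name syscall_report, the syscall list and the binary paths are made up for illustration.

```python
def syscall_report(all_syscalls, used_by_binary):
    """Split the kernel's syscall list into used and unused sets.

    all_syscalls: names the kernel provides.
    used_by_binary: mapping of binary path -> set of syscall names found in it.
    """
    used = set()
    for names in used_by_binary.values():
        used.update(names)
    known = set(all_syscalls)
    return {
        "used": sorted(used & known),
        "unused": sorted(known - used),      # candidates to mark unused in the kernel
        "unknown": sorted(used - known),     # scanner output the kernel list lacks
    }


# Invented input data for illustration only.
all_syscalls = ["read", "write", "open", "close", "ptrace", "swapon"]
used_by_binary = {
    "/bin/busybox": {"read", "write", "open", "close"},
    "/sbin/init": {"read", "write"},
}

report = syscall_report(all_syscalls, used_by_binary)
for category, names in report.items():
    print(category, ":", ", ".join(names) or "(none)")
```

The "unused" list is what would feed the next step, marking those entries unused in the kernel's syscall table so that LTO can drop them.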

ARM stack reduction

 * 4k stacks
 * stack extensions

cold code compression

 * D. Chanet did cold code compression
 * consists of:
 * profiling the kernel
 * marking code regions as cold or frozen
 * replacing them with stubs
 * compressing them
 * At execution time:
 * if a stub is called, it decompresses the code and calls it
 * stub is fixed up to directly call decompressed code in future
 * code is left decompressed forever
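
The run-time behavior described above can be modeled in a few lines. This is a user-space analogy only, not Chanet's implementation: it assumes zlib as the compressor, stores Python source text in place of machine code, and uses a dictionary as the call table. The names ColdTable, add_cold and rarely_used are hypothetical.

```python
import zlib


class ColdTable:
    """Call table whose cold entries start as decompress-on-first-call stubs."""

    def __init__(self):
        self.entries = {}
        self.decompress_count = 0

    def add_cold(self, name, source):
        # Build time: compress the cold code and install a stub in its place.
        blob = zlib.compress(source.encode("utf-8"))

        def stub(*args):
            # First call: decompress the code and make it callable.
            namespace = {}
            exec(zlib.decompress(blob).decode("utf-8"), namespace)
            self.decompress_count += 1
            real = namespace[name]
            # Fix up the table so later calls go straight to the real code.
            self.entries[name] = real
            return real(*args)

        self.entries[name] = stub

    def call(self, name, *args):
        return self.entries[name](*args)


table = ColdTable()
table.add_cold("rarely_used", "def rarely_used(x):\n    return x * 2 + 1\n")

first = table.call("rarely_used", 20)   # goes through the stub
second = table.call("rarely_used", 1)   # goes directly to the decompressed code
print(first, second, table.decompress_count)
```

As in the scheme outlined above, the code is decompressed once and stays decompressed, so only the first call pays the cost.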

cold code compression (cont.)
Results:
 * MUST see paper for details (it's quite complicated)
 * on 2.4.25 kernel
 * cold code compression resulted in a 7% reduction for the i386 kernel and an 11.7% reduction for the ARM kernel

Talk outline
This talk will be presented at LinuxCon Japan 2013:

Title

 * Advanced size optimization of the Linux kernel
 * by Tim Bird, Sony Mobile Communications

Self-Introduction

 * I am Tim Bird
 * Now working at Sony Mobile
 * Researching system size for many years
 * Long background in extremely small systems
 * pre-professional: first program on TRS-80, in BASIC, 8K RAM
 * NetWare Lite - file and print server in 50K (in 1991)

The problem of Bloat

 * Software bloat occurs because systems are built with more software than is really needed for a given task
 * Open Source software meets the needs of thousands of different systems
 * Linux scales from tiny sensors to supercomputers (extreme SMP and high-end clusters)
 * Linux supports many, many features, only some of which are configurable
 * Software must be generalized for many use cases
 * bloat problem is:
 * How to re-specialize the software, eliminating unused features and dead code?

Bloat (cont.)

 * Software gets more generalized over time
 * Can't use strategy of manual tuning (config options)
 * It gets harder and harder to remove things over time
 * About 13,000 config items now (2.6.12 had 4700)
 * You have to be an expert in too many things to reduce the kernel
 * Must rely on automated methods of reduction
 * Should use an additive, rather than subtractive method of building a system
 * ultimate vision: indicate what you want/need, and build up system to support it

Bloat (cont. 2)

 * In desktop or server, virtual memory makes bloat issue less important for user-space programs
 * Only working set of program is loaded - pages are loaded on demand
 * For kernel, all pages are always loaded

Tiny Distribution

 * poky-tiny distribution (yocto project)
 * see https://wiki.yoctoproject.org/wiki/Poky-Tiny
 * Good for testing and further research

Materials

 * [[File:0001-ARM-LTO-avoid-errors-on-unified-assembly-macros.patch]]