OMAP Power Management

PM branch
The PM branch is a development branch of the linux-omap kernel, used to develop and stabilize the PM infrastructure for OMAP and to submit it upstream.

The maintainer of the PM branch is Kevin Hilman.

Features

 * full-chip retention in idle and suspend
 * full-chip OFF in idle and suspend
 * idle PM via CPUidle
 * active PM via DVFS using CPUfreq
 * support for multiple OMAP3 boards

The latest, tested PM branch is available as a branch named 'pm' from the linux-omap-pm repository. This branch is also sync'd daily as the 'pm' branch of the main linux-omap repository.



Important recent changes

 * rebased to latest omap/master (currently based on 2.6.36-rc3)
 * using completely re-written SmartReflex/Voltage layer from Thara
 * (some missing PMIC functionality compared to previous base)

Supported platforms (OMAP3 only)
The following platforms have been tested using omap3_pm_defconfig with a busybox-based initramfs. Full-chip RET and OFF were tested in both idle and suspend:


 * 3430SDP
 * OMAP3EVM
 * Beagle
 * Overo (Water + Tobi)
 * Nokia N900 (a.k.a RX51)
 * Zoom2
 * KwikByte KBOC

What makes up the PM branch?
The PM branch is actually a collection of sub branches for the various features that are being worked on.

pm-core
The pm-core branch contains features/patches that are "ready": they are already merged, awaiting merge, or essentially done and awaiting final approval.


 * pm-backports: backports from already merged or queued upstream development needed for PM (Kevin)
 * pm-fixes: PM-related fixes queued for the next -rc series (Kevin: queued for Tony)
 * pm-next: misc. PM patches being queued for next merge window (Kevin: queued for Tony)
 * pm-hwmods: misc. fixes/enhancements queued for omap_hwmod layer (Kevin, Benoit: queued for Paul)
 * pm-opp: OMAP OPP Layer (Nishanth, Kevin: queued for Tony)

Work-in-Progress (WIP)

 * pm-cpufreq: base CPUfreq driver for OMAP (Kevin)
 * pm-sr: Smart Reflex and voltage layer: in process of rewrite (Thara)
 * pm-gpio: GPIO off-mode support: needs rework on top of GPIO hwmod conversion (Kevin, obsoleted)
 * pm-otg-reset: MUSB/OTG PM work: needs rework on top of MUSB hwmod conversion (Kevin, obsoleted)
 * pm-t2: T2/PMIC board support: needs rework after voltage layer (Kevin, obsoleted)
 * pm-debug: misc. PM debug patches (Kevin)
 * pm-defconfig: defconfig for omap3_pm_defconfig (Kevin)
 * pm-debobs: dropped: old support for debug observability (debobs) lines (obsolete)

Features
By default, the OMAP is configured to hit full-chip retention in suspend.

Suspend/Resume

# echo mem > /sys/power/state

Serial console activity or other configured wakeup sources (keypad, touchscreen) will trigger resume.

Optionally you can use a wake-up timer:
# echo 4 > /debug/pm_debug/wakeup_timer_seconds

Upon resume, you can use the powerdomain state statistics to check whether all powerdomains hit the desired state (cf. 'Debug info'):


# cat /debug/pm_debug/count

In addition, if any powerdomains did not hit the desired state, you will see a message on the console.
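
The check described above can also be scripted. The following is a minimal sketch, assuming each powerdomain line in pm_debug/count has the comma-separated form "core_pwrdm (ON),OFF:0,RET:5,..." (the layout implied by the cut fields in the verification script on this page; verify it against your kernel). The function name pwrdm_count is illustrative and not part of any kernel interface.

```shell
# pwrdm_count FILE DOMAIN STATE
# Print how many times DOMAIN entered STATE (e.g. OFF, RET, ON),
# reading a file laid out like /debug/pm_debug/count.
# Assumed line layout: "<domain> (<current state>),NAME:COUNT,NAME:COUNT,..."
pwrdm_count() {
    awk -F',' -v dom="$2" -v st="$3" '
        # Field 1 is "<domain> (<state>)"; match the domain name exactly.
        index($1, dom " ") == 1 {
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == st) { print kv[2]; exit }
            }
        }' "$1"
}
```

For example, saving `pwrdm_count /debug/pm_debug/count core_pwrdm RET` before and after a suspend cycle and comparing the two values shows whether CORE actually reached retention.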

Enabling system for hitting retention during idle
By default, the kernel will not try to hit retention or off while idle. To enable the idle path to attempt deeper sleep states:


# echo 1 > /debug/pm_debug/sleep_while_idle

Then, wait for any inactivity timers to expire (such as the 5-second UART timer) and check the powerdomain transition statistics to see that transitions are happening:

# cat /debug/pm_debug/count

Enabling system for hitting OFF
By default, retention is the deepest sleep state attempted. To enable powerdomain transitions to off-mode:


# echo 1 > /debug/pm_debug/enable_off_mode

Once again, after a suspend or after some idle time, use the powerdomain transition stats to check that transitions to off-mode are happening:

# cat /debug/pm_debug/count

DVFS: Dynamic Voltage and Frequency Scaling
By default, no DVFS transitions will occur because the CPUfreq 'userspace' governor is the default governor. This means that any DVFS transitions must be triggered manually by a userspace application or through the CPUfreq sysfs interface (cf. 'CPUfreq kernel interface'). The 'ondemand' governor enables DVFS transitions based on CPU load.

Usage of the CPUfreq utils:

To show the current CPUfreq info (current governor, possible OPPs, current OPP, ...):
# cpufreq-info

To change the current governor to e.g. 'userspace' or 'ondemand' (note: the corresponding governor support must be compiled into the kernel or built as a module):
# cpufreq-set -g userspace
# cpufreq-set -g ondemand

To change the frequency (with 'userspace' as the current governor; the frequency is in kHz, as shown by cpufreq-info):
# cpufreq-set -f 550000

Known Problems
 * Zoom2: serial console wakeups not working
   * Problem: on suspend, by default the serial driver will disable serial interrupts, thus disabling the GPIO IRQ needed for wakeup.
   * Fix: enable the wakeup feature for the tty used as console:
     * Kernel <= 2.6.36: # echo enabled > /sys/devices/platform/serial8250.3/tty/ttyS3/power/wakeup
     * Kernel >= 2.6.37-rc1 (please verify and edit: the quad UART port still uses the 8250 driver, while OMAP uses ttyO (omap-serial driver)): # echo enabled > /sys/devices/platform/serial8250.0/tty/ttyS0/power/wakeup
 * Root filesystem on MMC leads to crash when using off-mode.
   * There is currently no support for off-mode in the MMC driver.
 * GPIO module-level wakeups not always working
   * Background: GPIO wakeups can happen either via the GPIO module itself (module-level wakeups) or via IO pad wakeups if the CORE powerdomain is inactive, in retention or off.
   * If the IO pad wakeups are not enabled (either because CORE remains on, or because the IO pad is not armed), GPIO wakeups may not happen unless the GPIO module-level wakeups are programmed correctly.
   * To ensure GPIO module wakeups are programmed correctly:
     * Enable the GPIO IRQ for the wakeup GPIO, including an ISR. Use request_irq
     * Ensure the GPIO is edge-triggered. Only edge-triggered GPIOs are wakeup capable (cf. omap34xx TRM Sec. 25.5.3.1)
       * the flags argument of request_irq should have either IRQF_TRIGGER_FALLING, IRQF_TRIGGER_RISING or both.
     * Enable the GPIO IRQ as a wakeup source using enable_irq_wake(gpio_to_irq)
   * NOTE: It is very important that an interrupt handler be configured for the GPIO IRQ, even if it does nothing but return IRQ_HANDLED. Without an interrupt handler, the GPIO IRQ event will never be properly cleared, and this can prevent the GPIO module from hitting retention or off on the next idle request (cf. omap34xx TRM Sec. 25.5.3.1).
 * GPIO wakeup works once, but prevents future retention
   * See NOTE just above

Debug info
First, mount the debug filesystem (debugfs):

# mount -t debugfs debugfs /debug

Show powerdomain state statistics and clockdomain active clocks


# cat /debug/pm_debug/count

Dump current PRCM registers


# cat /debug/pm_debug/registers/current

Dump PRCM register snapshot taken just before suspend (just before jump into SRAM idle code)


# cat /debug/pm_debug/registers/1

Dump PRCM register snapshot taken immediately after resume


# cat /debug/pm_debug/registers/2
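
When a powerdomain fails to reach its target state, it can help to compare the two snapshots instead of reading each dump separately. The following is a minimal sketch that treats the dumps as opaque text lines (the exact register dump format is not described on this page, so nothing is assumed about it). The function name prcm_changed is illustrative.

```shell
# prcm_changed DIR
# Print the lines that differ between the pre-suspend snapshot (DIR/1)
# and the post-resume snapshot (DIR/2).
# "<" marks the value before suspend, ">" the value after resume.
prcm_changed() {
    diff "$1/1" "$1/2" | grep '^[<>]'
}
```

On a target this would be called as `prcm_changed /debug/pm_debug/registers`. It prints nothing (and returns non-zero) when the two snapshots are identical.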

UART wakeup and timeout options
By default, each of the on-chip OMAP UARTs is enabled as a wakeup source. In addition, each has a configurable inactivity timer (default 5 seconds) after which the UART clocks are allowed to be gated during idle or suspend.

For example, to disable the wakeup capability of UART1 (a.k.a. ttyS0):

Kernel <= 2.6.36 (uses 8250 driver)
# echo disabled > /sys/devices/platform/serial8250.0/power/wakeup

Kernel >= 2.6.37-rc1 (uses omap-serial driver, ttyOx)
# echo disabled > /sys/devices/platform/omap/omap-hsuart.0/power/wakeup

And to change the inactivity timer to 10 seconds, instead of the default 5:

Kernel <= 2.6.36 (uses 8250 driver)
# echo 10 > /sys/devices/platform/serial8250.0/sleep_timeout

Kernel >= 2.6.37-rc1 (uses omap-serial driver, ttyOx)
# echo 10 > /sys/devices/platform/omap/omap-hsuart.0/sleep_timeout

Note that you can cat these files under /sys as well to see the current values.
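
Scripts that must run on both sides of the 8250 to omap-serial switch can pick the sysfs directory from the kernel version. The following is a minimal sketch using the two paths shown above. The helper names uses_omap_serial and uart_sysfs_dir are hypothetical, and the version test assumes a release string of the form x.y.z with an optional suffix, as printed by uname -r.

```shell
# uses_omap_serial KVER
# Succeed if KVER (e.g. "2.6.37-rc1") is 2.6.37 or newer, i.e. a kernel
# where the OMAP UARTs are driven by omap-serial rather than 8250.
uses_omap_serial() {
    echo "$1" | awk -F'[.-]' '{
        newer = ($1 + 0 > 2) ||
                ($1 + 0 == 2 && $2 + 0 > 6) ||
                ($1 + 0 == 2 && $2 + 0 == 6 && $3 + 0 >= 37)
        exit !newer
    }'
}

# uart_sysfs_dir KVER INDEX
# Print the sysfs directory of the UART with zero-based INDEX.
uart_sysfs_dir() {
    if uses_omap_serial "$1"; then
        echo "/sys/devices/platform/omap/omap-hsuart.$2"
    else
        echo "/sys/devices/platform/serial8250.$2"
    fi
}
```

With these helpers, the inactivity timer of UART1 could be set with `echo 10 > "$(uart_sysfs_dir "$(uname -r)" 0)/sleep_timeout"` on either kernel generation.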

UART PM Debugging Techniques
Debugging problems with the OMAP UART driver wakeup and data transfer when Power Management is enabled can be quite tedious, if one does not have a proper HW setup. An example of a setup (including both HW and SW changes) can be found in the OMAP_UART_pm_debugging page.

OPPs control
NOTE: OPP control via sysfs has been removed. Please use CPUfreq interfaces for DVFS.

SmartReflex control
NOTE: detailed information on SmartReflex can be found HERE

To enable SmartReflex autocompensation on VDD1 (note: this feature can only be tested on ES3.1 silicon):
# echo 1 > /sys/power/sr_vdd1_autocomp

To disable SmartReflex autocompensation on VDD1:
# echo 0 > /sys/power/sr_vdd1_autocomp

To enable SmartReflex autocompensation on VDD2 (note: this feature can only be tested on ES3.1 silicon):
# echo 1 > /sys/power/sr_vdd2_autocomp

To disable SmartReflex autocompensation on VDD2:
# echo 0 > /sys/power/sr_vdd2_autocomp

CPUfreq kernel interface
Although the cpufreq utils are the preferred way to use the DVFS feature, the cpufreq kernel interface has some more information available. The main entry point is in /sys/devices/system/cpu/cpu0/cpufreq.

To list the available governors:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

To list the available frequencies:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies

To show the current frequency:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

To change the current governor:
# echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

The 'stats' directory has info about the cpufreq transitions and the time spent in the various OPPs:
# cat /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans
# cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
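
The raw time_in_state counters are easier to read as percentages. The following is a minimal sketch, assuming the usual cpufreq-stats layout of one "<frequency in kHz> <time>" pair per line. The function name time_in_state_pct is illustrative.

```shell
# time_in_state_pct FILE
# Print the share of time spent at each frequency listed in FILE,
# which is expected to look like cpufreq/stats/time_in_state.
time_in_state_pct() {
    awk '{ freq[NR] = $1; t[NR] = $2; total += $2 }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%d kHz: %.1f%%\n", freq[i], (total ? 100 * t[i] / total : 0)
         }' "$1"
}
```

On a target this would be called as `time_in_state_pct /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state`. Because only ratios are printed, the result does not depend on the time unit the kernel uses for the counters.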

PM code in Mainline
Here's a very crude outline of my plans for getting PM branch code to mainline. Unless otherwise stated, this has only been tested on OMAP3, although some care has been taken to not break OMAP2 in the process.

Currently in mainline (2.6.31)

 * clock framework and infrastructure
 * clockdomain and powerdomain core
 * full-chip retention in suspend
 * full-chip retention in idle

Merged in 2.6.32

 * misc. PM driver updates
 * SPI
 * PM debug infrastructure
 * OMAP PM layer
 * omap_hwmod/omap_device
 * twl4030 power support

Merged in 2.6.33

 * off-mode support
 * context save/restore support


 * CPUidle support
 * including off-mode C-states


 * Drivers
 * I2C driver off-mode support: re-init every transaction

Merged in 2.6.34

 * Large set of fixes from Nokia and others
 * core GPIO PM support
 * Misc. fixes

What's left in PM branch
The remaining parts in the PM branch are not ready for upstream in their current form. This is due to code quality, basic design problems, or because they are not ready to scale to OMAP3430 and OMAP4.


 * GPIO off-mode support (waiting GPIO hwmod conversion to rework and push upstream)
 * OPP layer
 * SRF
 * DVFS, CPUfreq
 * SmartReflex driver core (deps: OMAP PM layer, OPP layer)

Dropped (or to be removed) from PM branch

 * debug observability (debobs): has been dropped from PM branch for 2.6.33/34. It needs mux updates and misc. cleanups before going upstream.  Since there are no further dependencies on PM branch, a new branch 'pm-debobs' has been created based on mainline where the author (or anyone interested) can do necessary cleanups.



Future directions
What's next is to get the remaining functionality of the PM branch into mainline. As pointed out above in the 'What's left in PM branch' section, those parts are not yet ready for mainline. To that end, the goal of this section is to lay out a rough plan of how to get those features done in a way that can be submitted upstream.


 * 3630, OMAP4 support
 * new OPP layer
 * device PM control

Device PM control via omap_device + omap_hwmod
Currently, device drivers do power management in a rather ad-hoc way: they use the clock framework API directly, in combination with manually setting device-specific PM registers (e.g. SYSCONFIG for various idle setting bits).

The goal of new device PM control is to have a standard, common, portable interface for device drivers to control PM. From a driver API point of view, there is a new single API: the Run-time PM API. Internally to the OMAP PM core, the implementation of the runtime PM API will use the new omap_device and omap_hwmod layers to implement device PM.

omap_hwmod, omap_device conversion
An important building block for converting to a common framework (runtime PM) for device PM is a common framework for all on-chip hardware blocks. This is available as the omap_device and omap_hwmod layers. These layers provide an abstraction so that all hardware IP blocks can be controlled using the same API. The runtime PM layer is then implemented as a thin layer on top of the omap_device API.

This implies that in order to have runtime PM support for a device, the underlying HW IP must be represented by an omap_hwmod and have a corresponding omap_device built for it. Then, using the runtime PM API from the driver will result in omap_device API calls to control the IP.


 * Current Status: http://omappedia.org/wiki/HWMOD

Run-time PM
Run-time PM is a recent development in the upstream kernel community. It provides an architecture-independent framework for doing runtime power management of IO devices. It also extends the platform_bus/platform_device infrastructure to allow arch-specific extensions of the platform_device.


 * LWN article: http://lwn.net/Articles/347573/
 * Kevin Hilman's talk from ELC 2010 in San Francisco: http://elinux.org/images/0/08/ELC-2010-Hilman-Runtime-PM.pdf


 * Key features
 * architecture independent
 * only a framework, does nothing without platform specific hooks


 * Plans for use in linux-omap
 * OMAP-specific extension of platform_device: contains an omap_device
 * implement platform specific runtime PM hooks for OMAP
 * runtime PM API used by all OMAP drivers


 * Current status
 * Proposed runtime PM implementation for OMAP available in the pm-wip/runtime branch of Kevin's linux-omap-pm repository.

Public Power management test framework
Some commonly used power management utilities are listed here which make sense from an OMAP perspective

Cpufreq utils
cpufreq utils for testing dynamic voltage and frequency scaling.

Maemo pm_test
The pm-test plugin for Maemo is described as a utility that tests whether the kernel and kernel modules work correctly with respect to power management. It can be used to sanity-test the power management impact on a system for suspend/restore and basic power features.

Quick verification of suspend-idle functionality
The following script may be used with a userspace as simple as busybox:

#!/bin/ash
# Quick script to verify SUSPEND Resume behavior without human intervention
# Refer: http://elinux.org/OMAP_Power_Management for details

# Some params that might change based on the environment
SYS=/sys
DEBUG=$SYS/kernel/debug
PROC=/proc
PMDEBUG=$DEBUG/pm_debug
VOLTAGE_OFF=$PMDEBUG/voltage_off_mode
kver=`uname -r`
# Compare the version numerically: kernels newer than 2.6.36 use omap-serial
kmaj=`echo "$kver" | cut -d. -f1`
kmin=`echo "$kver" | cut -d. -f2`
krel=`echo "$kver" | cut -d. -f3 | sed 's/[^0-9].*//'`
if [ "$kmaj" -gt 2 ] || [ "$kmin" -gt 6 ] || [ "${krel:-0}" -gt 36 ]; then
    UART="$SYS/devices/platform/omap/omap-hsuart"
else
    UART="$SYS/devices/platform/serial8250"
fi
UART1=$UART.0/sleep_timeout
UART2=$UART.1/sleep_timeout
UART3=$UART.2/sleep_timeout

# Setup cpu idle
cpu_idle() {
    echo -n "$1" > $PMDEBUG/sleep_while_idle
}

# setup off mode
off_mode() {
    echo -n "$1" > $PMDEBUG/enable_off_mode
}

# Do a suspend
suspend_me() {
    echo -n "mem" > $SYS/power/state
}

# get my core data (This is the last domain to hit lowest power state)
core_count() {
    cat $PMDEBUG/count | grep "^core_pwrdm"
}

# get my retention counter
core_ret_count() {
    core_count | cut -d ',' -f3 | cut -d ':' -f2
}

# get my off counter
core_off_count() {
    core_count | cut -d ',' -f2 | cut -d ':' -f2
}

# setup wakeup timer - automated testing
wakeup_timer() {
    echo -n "$1" > $PMDEBUG/wakeup_timer_seconds
    echo -n "$2" > $PMDEBUG/wakeup_timer_milliseconds
}

# Setup our uart to be inactivity timer
setup_tty_sleep_timeout() {
    if [ -f $UART1 ]; then
        echo -n "$1" > $UART1
    fi
    if [ -f $UART2 ]; then
        echo -n "$1" > $UART2
    fi
    if [ -f $UART3 ]; then
        echo -n "$1" > $UART3
    fi
}

# Measurement Start
measure_start() {
    OFF_START=`core_off_count`
    RET_START=`core_ret_count`
    TIME_START=`date "+%s"`
}

# Measurement End
measure_end() {
    OFF_END=`core_off_count`
    RET_END=`core_ret_count`
    TIME_END=`date "+%s"`
}

# Common formatted print
measure_print() {
    DUR=`expr $TIME_END - $TIME_START`
    echo "$1 | $2 | OFF: $OFF_START->$OFF_END| RET:$RET_START->$RET_END ($DUR sec)"
}

# verify function
check_core_off() {
    RESULT=FAIL
    if [ $OFF_START -lt $OFF_END ]; then
        RESULT=PASS
    fi
}

check_core_ret() {
    RESULT=FAIL
    if [ $RET_START -lt $RET_END ]; then
        RESULT=PASS
    fi
}

# Disable everything
disable_all() {
    # disable voltage off
    if [ -f $VOLTAGE_OFF ]; then
        echo -n "0" > $VOLTAGE_OFF
    fi
    setup_tty_sleep_timeout 0
    wakeup_timer 0 0
    off_mode 0
    cpu_idle 0
}

# test idle - core ret
test_idle_ret() {
    disable_all
    measure_start
    setup_tty_sleep_timeout 5
    cpu_idle 1
    sleep 20
    disable_all
    sleep 1; sync
    measure_end
    check_core_ret
    measure_print "IDLE:RET test" $RESULT
}

# test idle - core off
test_idle_off() {
    disable_all
    measure_start
    setup_tty_sleep_timeout 5
    off_mode 1
    cpu_idle 1
    sleep 20
    disable_all
    sleep 1; sync
    measure_end
    check_core_off
    measure_print "IDLE:OFF test" $RESULT
}

# test suspend - core ret
test_suspend_ret() {
    disable_all
    measure_start
    wakeup_timer 5 0
    suspend_me
    disable_all
    sleep 1; sync
    measure_end
    check_core_ret
    measure_print "SUSPEND:RET test" $RESULT
}

# test suspend - core off
test_suspend_off() {
    disable_all
    measure_start
    off_mode 1
    wakeup_timer 5 0
    suspend_me
    disable_all
    sleep 1; sync
    measure_end
    check_core_off
    measure_print "SUSPEND:OFF test" $RESULT
}

# mount up the basic fs
already_mntd=`mount | grep $PROC`
if [ x = x"$already_mntd" ]; then
    mount -t proc none $PROC
fi
already_mntd=`mount | grep $SYS`
if [ x = x"$already_mntd" ]; then
    mount -t sysfs none $SYS
fi
already_mntd=`mount | grep $DEBUG`
if [ x = x"$already_mntd" ]; then
    mount -t debugfs none $DEBUG
fi

# Lets run the tests one by one..
NR=""
R=`test_suspend_off`
echo $R
NR="$NR\n$R"
R=`test_suspend_ret`
echo $R
NR="$NR\n$R"
R=`test_idle_off`
echo $R
NR="$NR\n$R"
R=`test_idle_ret`
echo $R
NR="$NR\n$R"

# Print End result summary
cat $PMDEBUG/count

# Print test summary
echo -e "$NR"