Pm Sub System

From eLinux.org

This document describes the OLD and NEW PM models in Linux kernel 2.6.8.


OLD PM model

This interface set cannot appropriately handle power dependencies imposed by the topology of device connections. For example, PCI devices have to be turned off before the PCI bus is turned off. New interfaces that work with the LDM (Linux Device Model) have therefore been introduced in 2.6.x to replace the following OLD PM interfaces, which are now in "kernel/power/pm.c". (In 2.4.x these routines were in the file kernel/pm.c.)


pm_register();
pm_unregister();
pm_unregister_all();
pm_send();
pm_send_all();
pm_find();


Also, device drivers were requested to call pm_access() when they accessed their devices, and pm_dev_idle() when a device was not being used, so the system could keep track of device and system idleness.


pm_access();
pm_dev_idle();


Register handlers

Device drivers may register a callback handler for each device using pm_register(). The callback handler is required to handle both the suspend and resume operations. pm_register() places the handler on the doubly linked list "pm_devs".

Call handlers

The PM subsystem sends PM messages to the registered handlers by calling the routines pm_send() and pm_send_all(). For example, the APM routines under "arch/i386/kernel/apm.c" demonstrate the use of pm_send_all().

The routine pm_send_all() calls all the handlers registered with pm_register(), in the order they were registered. Please refer to "kernel/power/pm.c" for details.

Note that in 2.4, apm.c::suspend() only called pm_send_all(), but in 2.6 it now calls pm_send_all(), followed by calls to device_suspend(3) and device_power_down(3).

TRB

API of handler

int suspend_resume_callback_handler(struct pm_dev *, pm_request_t, void *)


  1st arg: target device
  2nd arg: request type
    PM_SUSPEND
    PM_RESUME
  3rd arg: place holder for extra data

  return values:
    0 ... success
    otherwise ... fail
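
As a rough illustration of the OLD model, the following user-space sketch mimics registering callback handlers and notifying them in registration order. The struct layout, the request enum and the helpers sketch_pm_register(), sketch_pm_send_all() and sketch_handler() are simplified stand-ins invented for this example, not the kernel's definitions. The sketch keeps a singly linked list for brevity, and stopping at the first failing handler is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in request type; the real pm_request_t lives in the kernel headers. */
typedef enum { PM_SUSPEND, PM_RESUME } pm_request_t;

struct pm_dev;
typedef int (*pm_callback)(struct pm_dev *dev, pm_request_t rqst, void *data);

/* Hypothetical, minimal device record kept on a singly linked list. */
struct pm_dev {
    const char *name;
    pm_callback callback;
    int suspended;              /* bookkeeping for this sketch only */
    struct pm_dev *next;
};

static struct pm_dev *sketch_devs;                  /* first registered device */
static struct pm_dev **sketch_tail = &sketch_devs;  /* where to append next */

/* Append a device so that handlers later run in registration order. */
static void sketch_pm_register(struct pm_dev *dev, pm_callback cb)
{
    assert(dev != NULL && cb != NULL);
    dev->callback = cb;
    dev->suspended = 0;
    dev->next = NULL;
    *sketch_tail = dev;
    sketch_tail = &dev->next;
}

/* Call every handler in order; stop at the first failure (assumption). */
static int sketch_pm_send_all(pm_request_t rqst, void *data)
{
    struct pm_dev *dev;

    for (dev = sketch_devs; dev != NULL; dev = dev->next) {
        int ret = dev->callback(dev, rqst, data);
        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Example handler: one callback serves both suspend and resume. */
static int sketch_handler(struct pm_dev *dev, pm_request_t rqst, void *data)
{
    (void)data;                 /* placeholder argument, unused here */
    switch (rqst) {
    case PM_SUSPEND:
        dev->suspended = 1;
        return 0;
    case PM_RESUME:
        dev->suspended = 0;
        return 0;
    }
    return -1;                  /* unknown request type */
}
```

A caller would register each device once with sketch_pm_register(&dev, sketch_handler) and then broadcast with sketch_pm_send_all(PM_SUSPEND, NULL) or sketch_pm_send_all(PM_RESUME, NULL).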


NEW PM model

The PM model has been modified in 2.6 to work with the LDM (Linux Device Model). Most devices are connected to the system through a bus, and power management operations on such devices need to work in conjunction with the code which manages the related bus.

Register handlers

The LDM records separate suspend and resume handler routines for each bus type as part of that bus type's own structure, of type "struct bus_type", which is registered using bus_register().

E.g. drivers/pci/pci-driver.c:

    struct bus_type pci_bus_type = {
            .name           = "pci",
            .match          = pci_bus_match,
            .hotplug        = pci_hotplug,
            .suspend        = pci_device_suspend,
            .resume         = pci_device_resume,
    };

    static int __init pci_driver_init(void)
    {
            return bus_register(&pci_bus_type);
    }


Each bus driver has a device register function, like platform_device_register() in drivers/base/platform.c. (platform_device_register() registers devices under "Platform". "Platform" abstracts unstructured device connections and legacy device connections.)

Each device driver can register its own suspend and resume handlers through the bus-specific device register function described above. Each bus driver can have its own bus-specific device-suspend handler, device-resume handler and handler-register function. That means the interfaces for those functions can be different for each bus type.

E.g. For PCI, drivers/pci/pci-driver.c:

    static int pci_device_suspend(struct device * dev, u32 state)
    {
            struct pci_dev * pci_dev = to_pci_dev(dev);
            struct pci_driver * drv = pci_dev->driver;
    
            if (drv && drv->suspend)
                    return drv->suspend(pci_dev,state);
            return 0;
    }


For "Platform", drivers/base/platform.c:

    static int platform_suspend(struct device * dev, u32 state)
    {
            int ret = 0;
    
            if (dev->driver && dev->driver->suspend) {
                    ret = dev->driver->suspend(dev, state, SUSPEND_DISABLE);
                    if (ret == 0)
                            ret = dev->driver->suspend(dev, state, SUSPEND_SAVE_STATE);
                    if (ret == 0)
                            ret = dev->driver->suspend(dev, state, SUSPEND_POWER_DOWN);
            }
            return ret;
    }
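
Seen from the driver's side, the platform code above means that one suspend request arrives as three calls, one per level. The sketch below models that in user space. struct sketch_device, sketch_driver_suspend() and sketch_platform_suspend() are hypothetical stand-ins, and the numeric values given to the SUSPEND_* levels are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Level names follow the platform code above; the values are assumptions. */
enum sketch_level { SUSPEND_DISABLE, SUSPEND_SAVE_STATE, SUSPEND_POWER_DOWN };

/* Hypothetical device record used only by this sketch. */
struct sketch_device {
    int enabled;        /* 1 while the device accepts new requests */
    int saved_reg;      /* copy of the register taken before power-off */
    int reg;            /* pretend hardware register */
    int powered;        /* 1 while the device has power */
};

/* Driver handler: called once per level, returns 0 on success. */
static int sketch_driver_suspend(struct sketch_device *dev, unsigned state,
                                 enum sketch_level level)
{
    (void)state;        /* target power state, unused in this sketch */
    switch (level) {
    case SUSPEND_DISABLE:
        dev->enabled = 0;
        return 0;
    case SUSPEND_SAVE_STATE:
        if (dev->enabled)
            return -1;  /* refuse to save state while still active */
        dev->saved_reg = dev->reg;
        return 0;
    case SUSPEND_POWER_DOWN:
        dev->powered = 0;
        return 0;
    }
    return -1;          /* unknown level */
}

/* Bus-side helper mirroring the three-step sequence shown above. */
static int sketch_platform_suspend(struct sketch_device *dev, unsigned state)
{
    int ret;

    assert(dev != NULL);
    ret = sketch_driver_suspend(dev, state, SUSPEND_DISABLE);
    if (ret == 0)
        ret = sketch_driver_suspend(dev, state, SUSPEND_SAVE_STATE);
    if (ret == 0)
        ret = sketch_driver_suspend(dev, state, SUSPEND_POWER_DOWN);
    return ret;
}
```

Because each step runs only if the previous one returned 0, a driver can veto the suspend at any level by returning a non-zero value.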


Call handlers

The PM code in the 2.6 kernel is based on the Software Suspend/Resume open source project, referred to as "swsusp". The design of "swsusp" is deeply influenced by ACPI. Most code assumes that the power state value "0" means power-on, corresponding to the D0 state of ACPI, and that "3" means power-off, corresponding to the D3 state of ACPI.

There are two ways of controlling device power. One is Device Power Management, called Device PM or DPM for short. The other is System Level Power Management, which we refer to as System PM.


Internal Sequence of Device PM

With Device PM, the user can now turn individual devices on and off through the /sys interface.

For example,

% echo 3 > /sys/bus/usb/devices/1-1/power/state

turns off the specified device.


% echo  n > /sys/bus/usb/devices/1-1/power/state
    • when n != 0
    state_store()            [drivers/base/power/sysfs.c]
     -> dpm_runtime_suspend()    [drivers/base/power/runtime.c]
         -> suspend_device()    // suspend specified device
                    [drivers/base/power/suspend.c]
             -> call suspend handler of the bus which the specified
                device belongs to
                 -> call suspend handler for the specified device
    • when n == 0
    state_store()            [drivers/base/power/sysfs.c]
     -> dpm_runtime_resume()    [drivers/base/power/runtime.c]
         -> runtime_resume()    [drivers/base/power/runtime.c]
             -> resume_device()    // resume specified device
                    [drivers/base/power/resume.c]
                 -> call resume handler of the bus which the specified
                    device belongs to
                     -> call resume handler for the specified device
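
The same request can be made from a program instead of the shell. The helper below is a minimal sketch that writes a numeric state to a sysfs-style attribute file; the function name sketch_write_power_state() is made up for this example, and it assumes the attribute accepts a decimal number followed by a newline, as the echo commands above suggest.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a numeric power state to a sysfs-style attribute file.
 * Returns 0 on success and -1 on any I/O error.
 */
static int sketch_write_power_state(const char *path, int state)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;              /* no such file, or no permission */
    ok = fprintf(f, "%d\n", state) > 0;
    if (fclose(f) != 0)         /* buffered write errors show up here */
        ok = 0;
    return ok ? 0 : -1;
}
```

For instance, sketch_write_power_state("/sys/bus/usb/devices/1-1/power/state", 3) would correspond to the "echo 3" example, and passing 0 to the resume case.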


Internal Sequence of System PM

With System Level PM, the user can turn the entire system off and on. This interface is implemented as /sys/power/state and pm_suspend(). There is no mechanism for a system-level resume, because there is no way for a suspended system to initiate a resume from software.


% echo "standby" > /sys/power/state 


    state_store()            [kernel/power/main.c]
    -> enter_state()        [kernel/power/main.c]
           :
     follows the same route as when pm_suspend() is called


pm_suspend()                [kernel/power/main.c]


    enter_state()            [kernel/power/main.c]
    -> suspend_prepare()        [kernel/power/main.c]
       ->
        pm_prepare_console()    [kernel/power/console.c]
        freeze_processes()        [kernel/power/process.c]
        pm_ops->prepare()
        device_suspend()        // called with "irq on"
                    [drivers/base/power/suspend.c]
        ->
          suspend_device(), foreach device on the "dpm_active" list
                    [drivers/base/power/suspend.c]
          move the entry from "dpm_active" to "dpm_off",
          or to "dpm_off_irq" if the suspend handler returns -EAGAIN

    -> suspend_enter()        [kernel/power/main.c]
       ->
        local_irq_save()
        device_power_down()     // called with "irq off"
                    [drivers/base/power/suspend.c]
        ->
          suspend_device(), foreach device on the "dpm_off_irq" list
                    [drivers/base/power/suspend.c]
          sysdev_suspend()        [drivers/base/sys.c]
        pm_ops->enter()        // system goes down here
        device_power_up()          // called with "irq off"
                    [drivers/base/power/resume.c]
        ->
          sysdev_resume()        [drivers/base/sys.c]
          dpm_power_up()        [drivers/base/power/resume.c]
          ->
            resume_device(), foreach device on the "dpm_off_irq" list
                    [drivers/base/power/resume.c]
            move the entry from "dpm_off_irq" to "dpm_active"
        local_irq_restore()
    -> suspend_finish()        [kernel/power/main.c]
       ->
        device_resume()        // called with "irq on"
                    [drivers/base/power/resume.c]
        ->
          dpm_resume()        [drivers/base/power/resume.c]
          ->
            resume_device(), foreach device on the "dpm_off" list
            move the entry from "dpm_off" to "dpm_active"
        pm_ops->finish()        // target PM specific
        thaw_processes()        [kernel/power/process.c]
        pm_restore_console()    [kernel/power/console.c]
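
The two-pass handling of -EAGAIN in the sequence above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions: struct sketch_dev, the enum standing in for the "dpm_active", "dpm_off" and "dpm_off_irq" lists, and all sketch_* functions are invented for illustration. List ordering, sysdev handling and undoing a partially completed suspend are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Which list a device currently sits on (stand-in for the kernel lists). */
enum sketch_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

struct sketch_dev {
    const char *name;
    int needs_irq_off;          /* 1: can only suspend with interrupts off */
    enum sketch_list list;
    int suspended;
};

/* Device handler: returns -EAGAIN to ask for a retry with interrupts off. */
static int sketch_suspend_device(struct sketch_dev *dev, int irqs_on)
{
    if (dev->needs_irq_off && irqs_on)
        return -EAGAIN;
    dev->suspended = 1;
    return 0;
}

/* First pass, interrupts on: sort devices into "off" or "off_irq". */
static int sketch_device_suspend(struct sketch_dev *devs, size_t n)
{
    size_t i;

    assert(devs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int ret = sketch_suspend_device(&devs[i], 1);
        if (ret == 0)
            devs[i].list = DPM_OFF;
        else if (ret == -EAGAIN)
            devs[i].list = DPM_OFF_IRQ;
        else
            return ret;         /* real failure: give up on the suspend */
    }
    return 0;
}

/* Second pass, interrupts off: retry only the deferred devices. */
static int sketch_device_power_down(struct sketch_dev *devs, size_t n)
{
    size_t i;

    assert(devs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (devs[i].list == DPM_OFF_IRQ) {
            int ret = sketch_suspend_device(&devs[i], 0);
            if (ret != 0)
                return ret;
        }
    }
    return 0;
}
```

A device that returns 0 in the first pass is finished; one that returns -EAGAIN is left untouched until sketch_device_power_down() runs, which corresponds to the point after local_irq_save() in the sequence above.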



suspend_device()        [drivers/base/power/suspend.c]
    ->
      call suspend handler of the bus which the specified
          device belongs to
      ->
        call suspend handler for the specified device



sysdev_suspend()          [drivers/base/sys.c]
      for each kernel subsys and class
        call a suspend handler for each driver on "global_drivers",
          registered with sysdev_register().
        call a suspend handler for each driver under the class.
        call a suspend handler for the class


sysdev_resume()        [drivers/base/sys.c]
      for each kernel subsys and class
        call a resume handler for the class
        call a resume handler for each driver under the class.
        call a resume handler for each driver on "global_drivers",
          registered with sysdev_register().


resume_device()        [drivers/base/power/resume.c]
    ->
      call resume handler of the bus which the specified 
          device belongs to.
      ->
        call resume handler for the specified device


API of handlers

The following two methods are provided in "struct bus_type" in include/linux/device.h to hold bus-specific suspend/resume handlers, which adapt device suspend/resume to the needs of each bus.

First, the PM subsystem looks up the bus to which the specified device belongs. Then it calls the bus-specific suspend/resume handler to suspend/resume the specified device. In most cases, the bus-specific suspend/resume handler calls the device-specific suspend/resume handler.



struct bus_type {
                            :
                            :
      int        (*suspend)(struct device * dev, u32 state);
      int        (*resume)(struct device * dev);
};

      (*suspend)()
        1st arg: target device
        2nd arg: PM state to be entered
            PM_SUSPEND_STANDBY,
            PM_SUSPEND_MEM,
            PM_SUSPEND_DISK

        return values:
            0 ... success
            -EAGAIN ... try again later with "irq off"
            otherwise ... fail

      (*resume)()
        1st arg: target device

        return values:
            0 ... success
            otherwise ... fail


PM_OPS

The "target PM subsystem" is the instance of the PM subsystem present on the target system, adapted to the underlying hardware and software requirements, such as ACPI or APM.

PM_OPS provides target PM subsystem specific methods to prepare, enter and finish suspend.

For example, the ACPI PM subsystem uses the following methods, in drivers/acpi/sleep/main.c:

static struct pm_ops acpi_pm_ops = {
        .prepare        = acpi_pm_prepare,
        .enter          = acpi_pm_enter,
        .finish         = acpi_pm_finish,
};


- prepare()
 prepare() performs target PM subsystem specific preparation before suspend.
 prepare() usually checks the state to be entered. prepare() has to return
 a non-zero value if the operation failed.
- enter()
 enter() provides a target PM subsystem specific method to suspend.
 For example, acpi_pm_enter() calls an ACPI BIOS service.
 The platform will stop/sleep in enter(). enter() has to return a non-zero value
 if the operation failed.
- finish()
 finish() performs target PM subsystem specific post-processing after the platform
 comes back or prepare() failed. finish() has to return a non-zero value
 if the operation failed.
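
To make the calling order concrete, here is a small user-space model of the three hooks. It is a sketch only: struct sketch_pm_ops, the sketch_* functions and the trace buffer are invented for this example, the rule that states above 3 are rejected is arbitrary, and running finish() after a failed prepare() simply follows the description above rather than any particular kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct pm_ops; only the three hooks are modeled. */
struct sketch_pm_ops {
    int (*prepare)(unsigned state);
    int (*enter)(unsigned state);
    int (*finish)(unsigned state);
};

static char sketch_trace[8];    /* records the order of hook calls */
static size_t sketch_trace_len;

/* Append one character to the trace, keeping it NUL-terminated. */
static void sketch_note(char c)
{
    if (sketch_trace_len + 1 < sizeof sketch_trace)
        sketch_trace[sketch_trace_len++] = c;
    sketch_trace[sketch_trace_len] = '\0';
}

/* Reject states above 3 (arbitrary rule chosen for this sketch). */
static int sketch_prepare(unsigned state) { sketch_note('p'); return state > 3; }
static int sketch_enter(unsigned state)   { (void)state; sketch_note('e'); return 0; }
static int sketch_finish(unsigned state)  { (void)state; sketch_note('f'); return 0; }

static struct sketch_pm_ops sketch_ops = {
    .prepare = sketch_prepare,
    .enter   = sketch_enter,
    .finish  = sketch_finish,
};

/* Simplified flow: prepare, then enter, with finish run on both paths. */
static int sketch_enter_state(const struct sketch_pm_ops *ops, unsigned state)
{
    int error;

    assert(ops != NULL && ops->enter != NULL);
    if (ops->prepare && (error = ops->prepare(state)) != 0) {
        if (ops->finish)
            ops->finish(state); /* clean up after a failed prepare() */
        return error;
    }
    error = ops->enter(state);  /* the platform would sleep in here */
    if (ops->finish)
        ops->finish(state);     /* post-processing after wake-up */
    return error;
}
```

Calling sketch_enter_state(&sketch_ops, 1) runs the hooks in the order prepare, enter, finish, while a rejected state skips enter entirely.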

Resources

- I found the following paper while I was researching this information, however
  it is a little old and seems to be out of date.
  - Linux Kernel Power Management - Patrick Mochel, Open Source Development Labs

Questions

- What are the functions in pm_ops for?
  • PM_OPS provides target PM subsystem specific methods to prepare, enter and finish suspend.
- What is the relationship between LDM and kobjects?
  • The "Kobject" is an abstraction of a hierarchically structured instance in the kernel. Each object has its own name and reference count, may have children and a parent, and can be put into a hierarchical tree and removed from it. "Kset" represents a set of "Kobject"s. Sysfs is designed to provide methods to access "Kobject"s. In other words, a user-space program can communicate with each object through the sysfs name space. LDM utilizes "Kobject" to maintain the representation of devices.