Power management provides the ability to control and manage the electrical power usage of a computer system or device. Power management enables systems to conserve energy by using less power when idle and by shutting down completely when not in use. For example, desktop computer systems can use a significant amount of power and often are left idle, particularly at night. Power management software can detect that the system is not being used. Accordingly, power management can power down the system or some of its components.
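This chapter provides information on the following subjects:
Power Management Framework
Device Power Management Model
System Power Management Model
Power Management Device Access Example
Power Management Flow of Control
Changes to Power Management Interfaces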
12.1. Power Management Framework
The illumos Power Management framework depends on device drivers to implement device-specific power management functions. The framework is implemented in two parts:
Device power management – Automatically turns off unused devices to reduce power consumption
System power management – Automatically turns off the computer when the entire system is idle
12.1.1. Device Power Management
The framework enables devices to reduce their energy consumption after a specified idle time interval. As part of power management, system software checks for idle devices. The Power Management framework exports interfaces that enable communication between the system software and the device driver.
The illumos Power Management framework provides the following features for device power management:
A device-independent model for power-manageable devices.
A set of DDI interfaces for notifying the framework about power management compatibility and idleness state.
12.1.2. System Power Management
System power management involves saving the state of the system prior to powering the system down. Thus, the system can be returned to the same state immediately when the system is turned back on.
To shut down an entire system with return to the state prior to the shutdown, take the following steps:
Stop kernel threads and user processes. Restart these threads and processes later.
Save the hardware state of all devices on the system to disk. Restore the state later.
System power management is currently implemented only on some SPARC systems supported by illumos. See the power.conf(4) man page for more information.
The System Power Management framework in illumos provides the following features for system power management:
A platform-independent model of system idleness.
A set of interfaces for the device driver to override the method for determining which drivers have hardware state.
A set of interfaces to enable the framework to call into the driver to save and restore the device state.
A mechanism for notifying processes that a resume operation has occurred.
12.2. Device Power Management Model
The following sections describe the details of the device power management model. This model includes the following elements:
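Power management components
Power management states
Power levels
Power management dependencies
Automatic power management for devices
Device power management interfaces
Power management entry points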
12.2.1. Power Management Components
A device is power manageable if the power consumption of the device can be reduced when the device is idle. Conceptually, a power-manageable device consists of a number of power-manageable hardware units that are called components.
The device driver notifies the system about device components and their associated power levels. Accordingly, the driver creates a pm-components(9P) property in the driver's attach(9E) entry point as part of driver initialization.
Most devices that are power manageable implement only a single component. An example of a single-component, power-manageable device is a disk whose spindle motor can be stopped to save power when the disk is idle.
If a device has multiple power-manageable units that are separately controllable, the device should implement multiple components.
An example of a two-component, power-manageable device is a frame buffer card with a monitor. Frame buffer electronics is the first component [component 0]. The frame buffer's power consumption can be reduced when not in use. The monitor is the second component [component 1]. The monitor can also enter a lower power mode when the monitor is not in use. The frame buffer electronics and monitor are considered by the system as one device with two components.
Multiple Power Management Components
To the power management framework, all components are considered equal and completely independent of each other. If the component states are not completely compatible, the device driver must ensure that undesirable state combinations do not occur. For example, a frame buffer/monitor card has the following possible states: D0 through D3. The monitor attached to the card has potential states that range from On to Off. These states are not necessarily compatible with each other. For example, if the monitor is On, then the frame buffer must be at D0, that is, full on.
If the frame buffer driver gets a request to power up the monitor to On while the frame buffer is at D3, the driver must call pm_raise_power(9F) to bring the frame buffer up before setting the monitor On. Requests to lower the power of the frame buffer while the monitor is On must be refused by the driver.
12.2.2. Power Management States
Each component of a device can be in one of two states: busy or idle. The device driver notifies the framework of changes in the device state by calling pm_busy_component(9F) and pm_idle_component(9F). When components are initially created, the components are considered idle.
12.2.3. Power Levels
From the pm-components property exported by the device, the Device Power Management framework knows what power levels the device supports. Power-level values must be non-negative integers. The interpretation of power levels is determined by the device driver writer. Power levels must be listed in monotonically increasing order in the pm-components property. A power level of 0 is interpreted by the framework to mean off. When the framework must power up a device due to a dependency, the framework sets each component at its highest power level.
The following example shows a pm-components entry from the .conf file of a driver that implements a single power-managed component consisting of a disk spindle motor. The disk spindle motor is component 0. The spindle motor supports two power levels. These levels represent “stopped” and “spinning at full speed.”
The following example shows how the sample pm-components entry could be implemented in the attach routine of the driver.
The following example shows a frame buffer that implements two components. Component 0 is the frame buffer electronics that support four different power levels. Component 1 represents the state of power management of the attached monitor.
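The property layouts described above can be sketched as C string arrays of the kind a driver would pass to ddi_prop_update_string_array(9F) from attach(9E). This is a minimal userland sketch, not kernel code. The component names, the level names, and the helper pm_components_count() are assumptions made for illustration. The helper only checks that each level entry follows a NAME= entry and that the levels of a component are non-negative and increasing.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative pm-components string arrays. The component and level
 * names are assumptions chosen for this sketch.
 */
static const char *disk_pmcomps[] = {
	"NAME=Spindle Motor",
	"0=Stopped",
	"1=Full Speed"
};

static const char *fb_pmcomps[] = {
	"NAME=Frame Buffer",
	"0=Off", "1=Suspend", "2=Standby", "3=On",
	"NAME=Monitor",
	"0=Off", "1=Suspend", "2=Standby", "3=On"
};

/*
 * Hypothetical helper: returns the number of components described by
 * the array, or -1 if a level entry appears before the first NAME=
 * entry, is not of the form "integer=name", is negative, or is not
 * greater than the previous level of the same component.
 */
static int
pm_components_count(const char **entries, size_t n)
{
	int ncomps = 0;
	long prev = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		char *end;
		long level;

		if (strncmp(entries[i], "NAME=", 5) == 0) {
			ncomps++;
			prev = -1;	/* levels restart for each component */
			continue;
		}
		if (ncomps == 0)
			return (-1);	/* level listed before any component */
		level = strtol(entries[i], &end, 10);
		if (end == entries[i] || *end != '=' || level <= prev)
			return (-1);
		prev = level;
	}
	return (ncomps);
}
```

With these arrays, the disk describes one component with two levels, and the frame buffer describes two components with four levels each.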
When a device driver is first attached, the framework does not know the power level of the device. A power transition can occur when:
The framework has lowered the power level of a component because a time threshold has been exceeded.
Another device has changed power and a dependency exists between the two devices. See Power Management Dependencies.
After a power transition, the framework begins tracking the power level of each component of the device. Tracking also occurs if the driver has informed the framework of the power level. The driver informs the framework of a power level change by calling pm_power_has_changed(9F).
The system calculates a default threshold for each potential power transition. These thresholds are based on the system idleness threshold. The default thresholds can be overridden using pmconfig or power.conf(4). Another default threshold based on the system idleness threshold is used when the component power level is unknown.
12.2.4. Power Management Dependencies
Some devices should be powered down only when other devices are also powered down. For example, if a CD-ROM drive is allowed to power down, necessary functions, such as the ability to eject a CD, might be lost.
To prevent a device from powering down independently, you can make that device dependent on another device that is likely to remain powered on. Typically, a device is made dependent upon a frame buffer, because a monitor is generally on whenever a user is utilizing a system.
The power.conf(4) file specifies the dependencies among devices. (A parent node in the device tree implicitly depends upon its children. This dependency is handled automatically by the power management framework.) You can specify a particular dependency with a power.conf(4) entry of this form:
device-dependency dependent-phys-path phys-path
Where dependent-phys-path is the device that is kept powered up, such as the CD-ROM drive. phys-path represents the device whose power state is to be depended on, such as the frame buffer.
Adding an entry to power.conf for every new device that is plugged into the system would be burdensome. The following syntax enables you to indicate dependency in a more general fashion:
device-dependency-property property phys-path
Such an entry mandates that any device that exports the property property must be dependent upon the device named by phys-path. Because this dependency applies especially to removable-media devices, /etc/power.conf includes the following line by default:
device-dependency-property removable-media /dev/fb
With this syntax, no device that exports the removable-media property can be powered down unless the console frame buffer is also powered down.
For more information, see the power.conf(4) and removable-media(9P) man pages.
12.2.5. Automatic Power Management for Devices
If automatic power management is enabled by pmconfig or power.conf(4), then all devices with a pm-components(9P) property automatically use power management. After a component has been idle for a default period, the component is automatically lowered to the next lower power level. The default period is calculated by the power management framework to set the entire device to its lowest power state within the system idleness threshold.
By default, automatic power management is enabled on all SPARC desktop systems first shipped after July 1, 1999. This feature is disabled by default for all other systems. To determine whether automatic power management is enabled on your machine, refer to the power.conf(4) man page for instructions.
power.conf(4) can be used to override the defaults calculated by the framework.
12.2.6. Device Power Management Interfaces
A device driver that supports a device with power-manageable components must create a pm-components(9P) property. This property indicates to the system that the device has power-manageable components. The pm-components property also tells the system which power levels are available. The driver typically informs the system by calling ddi_prop_update_string_array(9F) from the driver's attach(9E) entry point. An alternative means of informing the system is from a driver.conf(4) file. See the pm-components(9P) man page for details.
Busy-Idle State Transitions
The driver must keep the framework informed of device state transitions from idle to busy or busy to idle. Where these transitions happen is entirely device-specific. The transitions between the busy and idle states depend on the nature of the device and the abstraction represented by the specific component. For example, SCSI disk target drivers typically export a single component, which represents whether the SCSI target disk drive is spun up or not. The component is marked busy whenever an outstanding request to the drive exists. The component is marked idle when the last queued request finishes. Some components are created and never marked busy. For example, components created by pm-components(9P) are created in an idle state.
The pm_busy_component(9F) and pm_idle_component(9F) interfaces notify the power management framework of busy-idle state transitions. The pm_busy_component(9F) call has the following syntax:
int pm_busy_component(dev_info_t *dip, int component);
pm_busy_component(9F) marks component as busy. While the component is busy, that component should not be powered off. If the component is already powered off, then marking that component busy does not change the power level. The driver needs to call pm_raise_power(9F) for this purpose. Calls to pm_busy_component(9F) are cumulative and require a corresponding number of calls to pm_idle_component(9F) to idle the component.
The pm_idle_component(9F) routine has the following syntax:
int pm_idle_component(dev_info_t *dip, int component);
pm_idle_component(9F) marks component as idle. An idle component is subject to being powered off. pm_idle_component(9F) must be called once for each call to pm_busy_component(9F) in order to idle the component.
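The cumulative nature of these calls can be pictured with a small userland model. This is a sketch under stated assumptions, not the framework's implementation: the struct and the model_* functions are hypothetical names. Reporting an unmatched idle call as an error is a choice of this sketch rather than a statement about pm_idle_component(9F).

```c
/*
 * Userland model of the cumulative busy/idle bookkeeping. The names
 * are hypothetical and are not the framework's data structures.
 */
struct pm_comp_model {
	int busy_count;		/* outstanding busy holds */
};

/* Models pm_busy_component(9F): every call adds one busy hold. */
static void
model_busy(struct pm_comp_model *c)
{
	c->busy_count++;
}

/*
 * Models pm_idle_component(9F): releases one hold. Returns -1 when no
 * hold is outstanding (a choice made by this sketch), otherwise 0.
 */
static int
model_idle(struct pm_comp_model *c)
{
	if (c->busy_count == 0)
		return (-1);
	c->busy_count--;
	return (0);
}

/* A component is subject to power-off only when no holds remain. */
static int
model_is_idle(const struct pm_comp_model *c)
{
	return (c->busy_count == 0);
}
```

Two calls to model_busy() followed by one call to model_idle() leave the component busy. A second call to model_idle() is needed before the component counts as idle again.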
Device Power State Transitions
A device driver can call pm_raise_power(9F) to request that a component be set to at least a given power level. Setting the power level in this manner is necessary before using a component that has been powered off. For example, the read(9E) routine of a SCSI disk target driver might need to spin up the disk, if the disk has been powered off. The pm_raise_power(9F) function requests the power management framework to initiate a device power state transition to a higher power level. Normally, reductions in component power levels are initiated by the framework. However, a device driver should call pm_lower_power(9F) when detaching, in order to reduce the power consumption of unused devices as much as possible.
Powering down can pose risks for some devices. For example, some tape drives damage tapes when power is removed. Similarly, some disk drives have a limited tolerance for power cycles, because each cycle results in a head landing. Use the no-involuntary-power-cycles(9P) property to notify the system that the device driver should control all power cycles for the device. This approach prevents power from being removed from a device while the device driver is detached unless the device was powered off by a driver's call to pm_lower_power(9F) from its detach(9E) entry point.
The pm_raise_power(9F) function is called when the driver discovers that a component needed for some operation is at an insufficient power level. This interface causes the driver to raise the current power level of the component to the needed level. All the devices that depend on this device are also brought back to full power by this call.
Call the pm_lower_power(9F) function when the device is detaching once access to the device is no longer needed. Call pm_lower_power(9F) to set each component at the lowest power so that the device uses as little power as possible while not in use. The pm_lower_power function must be called from the detach entry point. The pm_lower_power function has no effect if it is called from any other part of the driver.
The pm_power_has_changed(9F) function is called to notify the framework about a power transition. The transition might be due to the device changing its own power level. The transition might also be due to an operation such as suspend-resume. The syntax for pm_power_has_changed(9F) is the same as the syntax for pm_raise_power(9F).
12.2.7. power Entry Point
The power management framework uses the power(9E) entry point.
power uses the following syntax:
int power(dev_info_t *dip, int component, int level);
When a component's power level needs to be changed, the system calls the power(9E) entry point. The action taken by this entry point is device driver-specific. In the example of the SCSI target disk driver mentioned previously, setting the power level to 0 results in sending a SCSI command to spin down the disk, while setting the power level to the full power level results in sending a SCSI command to spin up the disk.
If a power transition can cause the device to lose state, the driver must save any necessary state in memory for later restoration. If a power transition requires the saved state to be restored before the device can be used again, then the driver must restore that state. The framework makes no assumptions about what power transitions cause the loss of state or require the restoration of state for automatically power-managed devices. The following example shows a sample power(9E) routine.
The following example is a power routine for a device with two components, where component 0 must be on when component 1 is on.
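The decision logic of such a routine can be sketched in userland C. This is a model under stated assumptions, not a kernel power(9E) implementation: the xx_model names, the MODEL_* return codes, and the value of MAX_LEVEL are hypothetical. Where a real driver would call pm_raise_power(9F) to bring component 0 up, the model sets the level directly.

```c
#define	MODEL_SUCCESS	0	/* stands in for DDI_SUCCESS */
#define	MODEL_FAILURE	(-1)	/* stands in for DDI_FAILURE */
#define	NCOMPS		2
#define	MAX_LEVEL	3	/* assumed highest level of both components */

struct xx_model {
	int busy[NCOMPS];	/* driver's own busy counts */
	int level[NCOMPS];	/* current power level of each component */
};

/*
 * Model of a power routine for a two-component device in which
 * component 0 must be on whenever component 1 is on.
 */
static int
xx_model_power(struct xx_model *xsp, int component, int level)
{
	/* Make sure the request is valid. */
	if (component < 0 || component >= NCOMPS ||
	    level < 0 || level > MAX_LEVEL)
		return (MODEL_FAILURE);

	/* Refuse to lower the power of a busy component. */
	if (xsp->busy[component] > 0 && level < xsp->level[component])
		return (MODEL_FAILURE);

	/* Refuse to turn component 0 off while component 1 is on. */
	if (component == 0 && level == 0 && xsp->level[1] != 0)
		return (MODEL_FAILURE);

	/*
	 * Bring component 0 up before component 1. A real driver would
	 * call pm_raise_power(9F) here; the model sets the level directly.
	 */
	if (component == 1 && level > 0 && xsp->level[0] == 0)
		xsp->level[0] = MAX_LEVEL;

	/* Device-specific register writes would go here. */
	xsp->level[component] = level;
	return (MODEL_SUCCESS);
}
```

Starting with both components off, a request to raise component 1 also raises component 0, and a later request to turn component 0 off is refused until component 1 is off.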
12.3. System Power Management Model
This section describes the details of the System Power Management model. The model includes the following components:
Autoshutdown threshold
Busy state
Hardware state
Automatic power management for systems
Power management entry points
12.3.1. Autoshutdown Threshold
The system can be shut down, that is, powered off, automatically after a configurable period of idleness. This period is known as the autoshutdown threshold. This behavior is enabled by default for SPARC desktop systems first shipped after October 1, 1995 and before July 1, 1999. See the power.conf(4) man page for more information. Autoshutdown can be overridden using dtpower(1M) or power.conf(4).
12.3.2. Busy State
The busy state of the system can be measured in several ways. The currently
supported built-in metric items are keyboard characters, mouse activity,
tty characters, load average, disk reads, and NFS requests. Any
one of these items can make the system busy. In addition to the built-in metrics,
an interface is defined for running a user-specified process that can indicate
that the system is busy.
12.3.3. Hardware State
Devices that export a reg property are considered to have hardware state that must be saved prior to shutting down the system. A device without the reg property is considered to be stateless. However, this consideration can be overridden by the device driver.
A device with hardware state but no reg property, such as a SCSI driver, must be called to save and restore the state if the driver exports a pm-hardware-state property with the value needs-suspend-resume. Otherwise, the lack of a reg property is taken to mean that the device has no hardware state. For information on device properties, see Properties.
A device with a reg property and no hardware state can export a pm-hardware-state property with the value no-suspend-resume. Using no-suspend-resume with the pm-hardware-state property keeps the framework from calling the driver to save and restore that state. For more information on power management properties, see the pm-components(9P) man page.
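The rules in this section reduce to a small decision function. The following userland sketch is illustrative only. The function name and its parameters are hypothetical. It models just the two pm-hardware-state values needs-suspend-resume and no-suspend-resume, and any other value is treated here as if the property were absent.

```c
#include <string.h>

/*
 * Hypothetical helper that models the rules above. has_reg is nonzero
 * when the device exports a reg property. hw_state is the value of the
 * pm-hardware-state property, or NULL when the property is absent.
 * Returns 1 when the driver is called to save and restore hardware
 * state, and 0 otherwise.
 */
static int
model_needs_suspend_resume(int has_reg, const char *hw_state)
{
	if (hw_state != NULL) {
		if (strcmp(hw_state, "needs-suspend-resume") == 0)
			return (1);
		if (strcmp(hw_state, "no-suspend-resume") == 0)
			return (0);
	}
	/* No modeled override: fall back to the reg property. */
	return (has_reg != 0);
}
```

A device with no reg property that exports needs-suspend-resume is treated as having state, while a device with a reg property that exports no-suspend-resume is treated as stateless.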
12.3.4. Automatic Power Management for Systems
The system is shut down if the following conditions apply:
The system has been idle for autoshutdown threshold minutes.
All of the metrics that are specified in power.conf have been satisfied.
12.3.5. Entry Points Used by System Power Management
System power management passes the command DDI_SUSPEND to the detach(9E) driver entry point to request the driver to save the device hardware state. System power management passes the command DDI_RESUME to the attach(9E) driver entry point to request the driver to restore the device hardware state.
detach Entry Point
The syntax for detach(9E) is as follows:
int detach(dev_info_t *dip, ddi_detach_cmd_t cmd);
A device with a pm-hardware-state property set to needs-suspend-resume must be able to save the hardware state of the device. The framework calls into the driver's detach(9E) entry point to enable the driver to save the state for restoration after the system power returns. To process the DDI_SUSPEND command, detach(9E) must perform the following operations:
Block further operations from being initiated until the device is resumed, except for dump(9E) requests.
Wait until outstanding operations have completed. If an outstanding operation can be restarted, you can abort that operation.
Cancel any timeouts and callbacks that are pending.
Save any volatile hardware state to memory. The state includes the contents of device registers, and can also include downloaded firmware.
If the driver is unable to suspend the device and save its state to
memory, then the driver must return
DDI_FAILURE. The framework
then aborts the system power management operation.
In some cases, powering down a device involves certain risks. For example, if a tape drive is powered off with a tape inside, the tape can be damaged. In such a case, detach(9E) should do the following:
Call ddi_removing_power(9F) to determine whether a DDI_SUSPEND command can cause power to be removed from the device.
Determine whether power removal can cause problems.
If both cases are true, the DDI_SUSPEND request should be rejected. detach(9E) Routine Implementing DDI_SUSPEND shows a detach(9E) routine using ddi_removing_power(9F) to check whether the DDI_SUSPEND command causes problems.
Dump requests must be honored. The framework uses the dump(9E) entry point to write out the state file that contains the contents of memory. See the dump(9E) man page for the restrictions that are imposed on the device driver when using this entry point.
Calling the detach(9E) entry point of a power-manageable component with the DDI_SUSPEND command should save the state when the device is powered off. The driver should cancel pending timeouts. The driver should also suppress any calls to pm_raise_power(9F) except for dump(9E) requests. When the device is resumed by a call to attach(9E) with a command of DDI_RESUME, timeouts and calls to pm_raise_power can be resumed. The driver must keep sufficient track of its state to be able to deal appropriately with this possibility. The following example shows a detach(9E) routine with the DDI_SUSPEND command implemented.
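The order of the suspend steps can be modeled in userland C. This is a sketch under stated assumptions, not a kernel detach(9E) routine: the struct, its fields, and the MODEL_* codes are hypothetical. The removing_power field stands in for the result of ddi_removing_power(9F). A real driver would wait for outstanding operations to finish, whereas the model fails the request instead of waiting.

```c
#define	MODEL_SUCCESS	0	/* stands in for DDI_SUCCESS */
#define	MODEL_FAILURE	(-1)	/* stands in for DDI_FAILURE */

struct xx_suspend_model {
	int removing_power;	/* models the ddi_removing_power(9F) result */
	int power_loss_harmful;	/* for example, a tape is loaded */
	int outstanding;	/* operations still in flight */
	int timeout_pending;	/* nonzero while a timeout is scheduled */
	int suspended;		/* set once new operations are blocked */
	unsigned int hw_regs;	/* stands in for volatile device registers */
	unsigned int saved_regs; /* in-memory copy used at resume time */
};

/* Model of the DDI_SUSPEND branch of a detach routine. */
static int
xx_model_suspend(struct xx_suspend_model *xsp)
{
	/* Reject the request if removing power would cause damage. */
	if (xsp->removing_power && xsp->power_loss_harmful)
		return (MODEL_FAILURE);

	/* Block further operations from being initiated. */
	xsp->suspended = 1;

	/*
	 * A real driver would wait here, or abort restartable work.
	 * The model gives up if work is still in flight.
	 */
	if (xsp->outstanding > 0) {
		xsp->suspended = 0;
		return (MODEL_FAILURE);
	}

	/* Cancel pending timeouts, then save volatile state to memory. */
	xsp->timeout_pending = 0;
	xsp->saved_regs = xsp->hw_regs;
	return (MODEL_SUCCESS);
}
```

Returning the failure code corresponds to the driver returning DDI_FAILURE, which causes the framework to abort the system power management operation.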
attach Entry Point
The syntax for attach(9E) is as follows:
int attach(dev_info_t *dip, ddi_attach_cmd_t cmd);
When power is restored to the system,
each device with a
reg property or with a
pm-hardware-state property of value
needs-suspend-resume has its
attach(9E) entry point
called with a command value of
DDI_RESUME. If the system
shutdown is aborted, each suspended driver is called to resume even though
the power has not been shut off. Consequently, the resume code in attach(9E) must make no assumptions
about whether the system actually lost power.
The power management framework considers the power level of the components
to be unknown at
DDI_RESUME time. Depending on the nature
of the device, the driver writer has two choices:
If the driver can determine the actual power level of the components of the device without powering the components up, such as by reading a register, then the driver should notify the framework of the power level of each component by calling pm_power_has_changed(9F).
If the driver cannot determine the power levels of the components, then the driver should mark each component internally as unknown and call pm_raise_power(9F) before the first access to each component.
The following example shows an attach(9E) routine with the DDI_RESUME command.
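The two choices for handling unknown power levels at resume time can be modeled in userland C. This is a sketch under stated assumptions, not a kernel attach(9E) routine: the struct, its fields, and LEVEL_UNKNOWN are hypothetical. The reported array stands in for what the driver would tell the framework through pm_power_has_changed(9F).

```c
#define	NCOMPS		2
#define	LEVEL_UNKNOWN	(-1)	/* hypothetical marker for "not known" */

struct xx_resume_model {
	int suspended;		/* set by the earlier suspend */
	unsigned int hw_regs;	/* stands in for device registers */
	unsigned int saved_regs; /* state saved at suspend time */
	int can_probe[NCOMPS];	/* nonzero if the level can be read back */
	int probed[NCOMPS];	/* level that a register read would report */
	int level[NCOMPS];	/* driver's view of each component */
	int reported[NCOMPS];	/* levels the framework was told about */
};

/* Model of the DDI_RESUME branch of an attach routine. */
static void
xx_model_resume(struct xx_resume_model *xsp)
{
	int i;

	/* Restore saved state whether or not power was actually lost. */
	xsp->hw_regs = xsp->saved_regs;

	for (i = 0; i < NCOMPS; i++) {
		if (xsp->can_probe[i]) {
			/*
			 * First choice: read the level and report it. A
			 * real driver calls pm_power_has_changed(9F).
			 */
			xsp->level[i] = xsp->probed[i];
			xsp->reported[i] = xsp->probed[i];
		} else {
			/*
			 * Second choice: mark the level unknown so that
			 * power is raised before the first access.
			 */
			xsp->level[i] = LEVEL_UNKNOWN;
		}
	}

	/* Allow new operations again. */
	xsp->suspended = 0;
}
```

The model restores state unconditionally because the resume code cannot assume that the system actually lost power.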
The detach(9E) and attach(9E) interfaces can also be used to resume a system that has been quiesced.
12.4. Power Management Device Access Example
If power management is supported, and detach(9E) and attach(9E) are used as in detach(9E) Routine Implementing DDI_SUSPEND and attach(9E) Routine Implementing DDI_RESUME, then access to the device can be made from user context, for example, from read(2), write(2), and ioctl(2).
The following example demonstrates this approach. The example assumes that the operation about to be performed requires that a given component is operating at a given power level.
The code fragment in the following example can be used when device operation completes, for example, in the device's interrupt handler.
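The pattern around a device operation, mark busy and raise power before the operation and mark idle on completion, can be modeled in userland C. This is a sketch under stated assumptions, not driver code: the struct and function names are hypothetical. A real driver would hold a mutex, wait on a condition variable while suspended, and call pm_busy_component(9F), pm_raise_power(9F), and pm_idle_component(9F). The model fails fast instead of waiting and updates its fields directly.

```c
#define	MODEL_SUCCESS	0	/* stands in for DDI_SUCCESS */
#define	MODEL_FAILURE	(-1)	/* stands in for DDI_FAILURE */

struct xx_access_model {
	int suspended;	/* set between DDI_SUSPEND and DDI_RESUME */
	int busy;	/* driver's busy count for the component */
	int level;	/* current power level of the component */
};

/*
 * Before an operation: mark the component busy so that power is not
 * lowered, then make sure the component is at least at needed_level.
 */
static int
xx_model_start_op(struct xx_access_model *xsp, int needed_level)
{
	/* A real driver would block here until the device is resumed. */
	if (xsp->suspended)
		return (MODEL_FAILURE);

	xsp->busy++;			/* models pm_busy_component(9F) */
	if (xsp->level < needed_level)
		xsp->level = needed_level; /* models pm_raise_power(9F) */
	return (MODEL_SUCCESS);
}

/*
 * On completion, for example from an interrupt handler: drop the
 * busy hold so that the component can become idle again.
 */
static void
xx_model_finish_op(struct xx_access_model *xsp)
{
	if (xsp->busy > 0)
		xsp->busy--;		/* models pm_idle_component(9F) */
}
```

The busy hold is taken before the power level is checked so that the component cannot be powered down between the check and the start of the operation.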
12.5. Power Management Flow of Control
Power Management Conceptual State Diagram illustrates the flow of control in the power management framework.
When a component's activity is complete, a driver can call pm_idle_component(9F) to mark the component as idle. When the component has been idle for its threshold time, the framework can lower the power of the component to its next lower level. The framework calls the power(9E) function to set the component's power to the next lower supported power level, if a lower level exists. The driver's power(9E) function should reject any attempt to lower the power level of a component when that component is busy. The power(9E) function should save any state that could be lost in a transition to a lower level prior to making that transition.
When the component is needed at a higher level, the driver calls pm_busy_component(9F). This call keeps the framework from lowering the power still further. The driver then calls pm_raise_power(9F) on the component. The framework next calls power(9E) to raise the power of the component before the call to pm_raise_power(9F) returns. The driver's power(9E) code must restore any state that was lost in the lower level but that is needed in the higher level.
When a driver is detaching, the driver should call pm_lower_power(9F) for each component to lower its power to its lowest level. The framework can then call the driver's power(9E) routine to lower the power of the component before the call to pm_lower_power(9F) returns.
12.6. Changes to Power Management Interfaces
Prior to the Solaris 8 release, power management of devices was not automatic. Developers had to add an entry to /etc/power.conf for each device that was to be power-managed. The framework assumed that all devices supported only two power levels: 0 and standard power.
Power management assumed an implied dependency of all other components on component 0. When component 0 changed to level 0, a call was made into the driver's detach(9E) with the DDI_PM_SUSPEND command to save the hardware state. When component 0 changed from level 0, a call was made to the attach(9E) routine with the command DDI_PM_RESUME to restore hardware state.
The following interfaces and commands are obsolete, although they are still supported for binary compatibility:
Since the Solaris 8 release, devices that export the
pm-components property automatically use power management if
autopm is enabled.
The framework now knows from the
pm-components property which power levels are supported by each device.
The framework makes no assumptions about dependencies among the different components of a device. The device driver is responsible for saving and restoring hardware state as needed when changing power levels.
These changes enable the power management framework to deal with emerging device technology. Power management now results in greater power savings. The framework can detect automatically which devices can save power. The framework can use intermediate power states of the devices. A system can now meet energy consumption goals without powering down the entire system and without losing any functions.