Alexander Motin writes:
I would like to summarize some of my knowledge on reducing FreeBSD power consumption and describe some new things implemented in FreeBSD 8.x/9.x. The main character of this story is my 12" Acer TravelMate 6292 laptop with a C2D T7700 2.4GHz CPU, a 965GM chipset and a SATA HDD.
Modern systems, especially laptops, implement a large number of power-saving technologies. Some of them work automatically; others have significant requirements and need special system tuning or trade-offs to be used effectively.
So here are the steps:
1. CPU
The CPU is the most power-hungry part of the system. Under full load it alone may consume more than 40W, but for real laptop usage idle consumption matters most. The Core2Duo T7700 has 2 cores, runs at 2.4GHz, supports EIST with P-states at 2400, 2000, 1600, 1200 and 800MHz, and supports the C1, C2 and C3 idle C-states, plus throttling. So how can we use it:
P-states and throttling
Enabling powerd lets the system control CPU frequency and voltage effectively depending on CPU load. On recent systems powerd handles this quite transparently. By default, frequency is controlled via a mix of EIST and throttling. The first controls both core frequency and voltage, the second only core frequency. Both save power, but the effect of throttling is small and can be completely hidden by using the C2 state; that is why I recommend disabling throttling control by adding to /boot/loader.conf:
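The loader.conf lines themselves are missing from this copy of the text. A minimal sketch of what they probably looked like, assuming the standard cpufreq(4) device hints for the two throttling drivers (p4tcc and acpi_throttle):

```
# /boot/loader.conf
# Assumed reconstruction: disable both throttling drivers so only EIST remains.
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
```

powerd itself is enabled separately, with powerd_enable="YES" in /etc/rc.conf.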
After this sysctl reports only EIST frequencies:
dev.cpu.0.freq_levels: 2400/35000 2000/28000 1600/22000 1200/16000 800/14000
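Each entry in that list is a frequency/power pair, which cpufreq(4) documents as MHz and milliwatts. As a small illustration, the following sh sketch splits such a string into one level per line; the function name freq_levels_table is made up for this example.

```shell
# Print each "MHz/mW" pair of a freq_levels string on its own line.
# freq_levels_table is a hypothetical helper, not part of FreeBSD.
freq_levels_table() {
    for pair in $1; do
        # ${pair%/*} is the part before the slash, ${pair#*/} the part after it
        printf '%s MHz %s mW\n' "${pair%/*}" "${pair#*/}"
    done
}

# On a live system: freq_levels_table "$(sysctl -n dev.cpu.0.freq_levels)"
freq_levels_table "2400/35000 2000/28000 1600/22000 1200/16000 800/14000"
```

The power figures are whatever ACPI reports for each level, so they are best read as relative estimates rather than measurements.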
ACPI may report an extra performance level with a frequency 1MHz above the nominal one to control Intel Turbo Boost operation. For example, on a Core i7-870 you may see:
dev.cpu.0.freq_levels: 2934/106000 2933/95000 2800/82000 ...
Here the value 2933 means 2.93GHz, while 2934 means 3.2-3.6GHz, depending on the situation.
In my case frequency/voltage control saves about 5W of idle power.
C-states
C1 stops the clock on some parts of the CPU core during inactivity. It is safe, cheap and has been supported by CPUs for ages. The system uses the C1 state by default.
The C2 state allows the CPU to turn off all core clocks when idle. It is also cheap, but requires correct ACPI-chipset-CPU interoperation. Use of the C2 state can be enabled by adding to /etc/rc.conf:
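The rc.conf lines are not present in this copy. A likely form, assuming the power_profile variables described in rc.conf(5):

```
# /etc/rc.conf
# Assumed reconstruction: lowest C-state allowed on AC power and on battery.
performance_cx_lowest="C2"
economy_cx_lowest="C2"
```

The same limit can be tried at runtime through the hw.acpi.cpu.cx_lowest sysctl before making it permanent.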
The effect of this state is not so big when powerd is used, but it is still noticeable.
The C3 state allows the CPU to stop all internal clocks completely, reduce voltage and disconnect from the system bus. This state gives an additional power saving effect, but it is not cheap and requires trade-offs. Since the CPU is completely stopped in the C3 state, the local APIC timers in each CPU core, used by FreeBSD as event sources on SMP, do not function. This stops system time and breaks scheduling, which leaves the system close to dead. The only solution to this problem is to use some external timers.
There is also a pseudo-state known as C1E. It is a workaround in modern CPUs to work better with an OS without C-state support. When enabled in the BIOS, it makes the CPU enter some deeper C-state when the C1 state is requested by the OS.
It is typical for BIOSes on AMD CPUs not to expose real C-states to the OS, but to use only the C1E mechanism instead. For example, it may work this way: when the OS requests C1 for some CPU core, the core enters C2, but when all cores of a CPU package are in C2, the whole package goes into C3. Unfortunately, that functionality is completely hidden from the OS.
Originally, before the SMP era, FreeBSD used the i8254 (for HZ) and RTC (for stats) chipset timers. FreeBSD 8.x resurrects them for SMP systems. To use them, you can disable the local APIC timers by adding to /boot/loader.conf:
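The tunable is missing from this copy. On FreeBSD 8.x it was most likely the following device hint; treat it as an assumed reconstruction:

```
# /boot/loader.conf (FreeBSD 8.x)
# Assumed reconstruction: do not use the local APIC timer as a clock source.
hint.apic.0.clock=0
```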
Also, to drop/raise voltage on C3, the CPU needs time (57us on my system). This means the C3 state can't be used effectively when the system wakes up often. To lengthen inactivity periods we should reduce the interrupt rate as much as possible by adding to loader.conf:
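The setting is not shown in this copy. Given the figure of 100 interrupts per second per core quoted in the results, it was presumably:

```
# /boot/loader.conf
# Assumed reconstruction: lower the system timer tick rate to 100 Hz.
kern.hz=100
```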
It may increase system response time a bit, but that is not significant for a laptop. We may also avoid an additional 128 interrupts per second per core, at the cost of scheduling precision, by using the i8254 timer for statistics collection as well, instead of the RTC clock, with another newly added option:
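The option is missing here as well. On FreeBSD 8.x the RTC clock interrupt was most likely switched off with this hint; treat it as an assumed reconstruction:

```
# /boot/loader.conf (FreeBSD 8.x)
# Assumed reconstruction: do not use the RTC as the statistics clock.
hint.atrtc.0.clock=0
```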
As a result, the system has only 100 interrupts per second per core and the CPUs use C3 with high efficiency:
%sysctl dev.cpu | grep cx
dev.cpu.0.cx_supported: C1/1 C2/1 C3/57
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_usage: 0.00% 0.00% 100.00% last 7150us
dev.cpu.1.cx_supported: C1/1 C2/1 C3/57
dev.cpu.1.cx_lowest: C3
dev.cpu.1.cx_usage: 0.00% 0.00% 100.00% last 2235us
The gain from effective C3 state usage, compared to C2+powerd, is about 2W.
Since entering C1E on AMD CPUs may result in unexpected and uncontrolled entry into C3, and thus in a local APIC timer stop, FreeBSD 8.x blocks C1E functionality completely.
FreeBSD 9.x includes a new event timer subsystem, eventtimers(4), which supports more types of timer hardware, including HPET, which is present in most modern chipsets and is unaffected by CPU power management. The system automatically chooses the timer it considers best, but you may check and dynamically change the timer in use via sysctl.
eventtimers(4) also adds support for the one-shot timer operation mode, in which interrupts are generated only when there is some work to do. This makes it unnecessary to reduce the kern.hz variable: even a multicore system should have only about 50-100 interrupts per second in total when idle. You may still want to reduce it, though, to limit the effect of applications written without regard to power use.
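As a rough illustration of the sysctls involved, here is a sketch using the names from eventtimers(4). The timer name HPET is only an example value and must match an entry your system lists in kern.eventtimer.choice.

```
# /etc/sysctl.conf (FreeBSD 9.x)
# Inspect first with: sysctl kern.eventtimer.choice kern.eventtimer.timer
# Example value only: select the HPET as the event timer.
kern.eventtimer.timer=HPET
# 0 selects one-shot mode, 1 forces periodic mode.
kern.eventtimer.periodic=0
```

The same names can be set at runtime with sysctl(8) to try a timer before making the change permanent.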
FreeBSD 9.x adds a check of whether it is safe to use a specific C-state with the present event timer and may automatically block the C2/C3 states, which makes them mostly safe. Nevertheless, due to possible performance degradation on some workloads, C-states are not enabled by default for now; you should enable them manually. At the same time, on newer CPUs, enabling deeper C-states allows TurboBoost to be used, which may increase the performance of single-threaded applications.
On AMD CPUs FreeBSD 9.x blocks C1E only when the local APIC timer is used. If the local APIC timer was ever used since boot, C1E stays blocked until the next reboot. You may want to force some other timer to be used in order to allow C1E to work.
2. Screen / Video
The screen backlight can consume a lot of power: from 1.5W at minimal up to 4W at maximal brightness on my laptop. So you should find a way (hardware or software) to control it and tune it to the minimum level required in the specific conditions. In my case it is controlled via hardware buttons. Some other laptops allow brightness to be controlled via the hw.acpi.video.lcd0.brightness sysctl, supported by acpi_video(4).
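For laptops where the sysctl route works, a sketch could look as follows. The level 50 is an arbitrary example; the values your panel accepts are listed in hw.acpi.video.lcd0.levels.

```
# /boot/loader.conf: load the acpi_video(4) driver at boot
acpi_video_load="YES"

# /etc/sysctl.conf: example brightness level, pick one from hw.acpi.video.lcd0.levels
hw.acpi.video.lcd0.brightness=50
```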
The graphics chip may consume a significant amount of power, which may depend on the driver used and its settings. On laptops with SandyBridge/IvyBridge CPUs, graphics using the new KMS-based "intel" driver may increase power consumption by 3W compared to the "vesa" driver. Adding to /boot/loader.conf the line:
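The line itself is missing from this copy. It was probably the i915 RC6 tunable below; both the name and the value 7 are an assumed reconstruction, so check them against your driver version.

```
# /boot/loader.conf
# Assumed reconstruction: enable the RC6 GPU idle states in the i915 KMS driver.
drm.i915.enable_rc6=7
```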
enables using power-saving idle states of the GPU and reduces power consumption.
3. Memory
This laptop has two 1GB DDR2-667 SODIMM memory modules installed. Removing one of them saves about 1W. Replacing the two 1GB modules with a single 2GB module also saves about 0.5W.
4. PCI devices
The PCI bus provides a method to control device power. For example, I have no use at all for my FireWire controller and, most of the time, for the EHCI USB controller. Disabling them saves me about 3W. To disable all unneeded PCI devices you should build a kernel without their drivers and add to loader.conf:
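The tunable is not shown in this copy. The pci(4) knob that powers down devices with no attached driver is hw.pci.do_power_nodriver; the most aggressive value, 3, is assumed here.

```
# /boot/loader.conf
# Assumed reconstruction: power down every PCI device that has no driver attached.
hw.pci.do_power_nodriver=3
```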
To enable the devices again, all you need to do is load their drivers as modules. The new EHCI USB driver in 8.x consumes much less power than the previous one.
6. HDA modem
I was surprised, but the integrated HDA modem consumed about 1W even when not used. I chose the most radical solution and removed it mechanically from its socket. The case surface in that area became much cooler.
7. HDA sound
To reduce the number of sound-generated interrupts I have added to loader.conf:
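The setting is missing from this copy. The sound(4) knob that trades latency for fewer interrupts is hw.snd.latency (0 to 10, higher means fewer interrupts); the value 7 is an assumption.

```
# /boot/loader.conf
# Assumed reconstruction: larger buffering latency, fewer sound interrupts.
hw.snd.latency=7
```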
On FreeBSD before 9-STABLE of 2012-03-10 it may also be useful to increase the maximum buffer sizes:
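These lines are missing too. The sketch below is an assumed reconstruction: the tunable names, the unit numbers 0 and 1 and the 64KB size are not confirmed by the surrounding text and should be checked against sound(4) on your release.

```
# /boot/loader.conf (FreeBSD before 9-STABLE of 2012-03-10)
# Assumed reconstruction: raise per-device and feeder buffer sizes to 64KB.
hint.pcm.0.buffersize=65536
hint.pcm.1.buffersize=65536
hw.snd.feeder_buffersize=65536
```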
8. HDD
The first common recommendation is to use tmpfs for temporary files. RAM is cheap, fast and always with you anyway. You may also try to set up automatic idle drive spin-down, but if it is the only system drive you should be careful, as every spin-up reduces the drive's lifetime. For several months (until I bought a SATA SSD) I successfully used an SDHC card in the built-in PCI sdhci card reader as the main file system. On random read requests it is much faster than the HDD, but it is very slow on random writes. At the same time it consumes almost nothing. USB drives could also be used, but the effect is much smaller, as the EHCI USB controller consumes a lot of power. Spinning down my 2.5" Hitachi SATA HDD saves about 1W. Removing it completely saves 2W.
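As an illustration of the tmpfs recommendation, a typical fstab entry for /tmp might look like this; the mount point and options are an example, not taken from the original setup.

```
# /etc/fstab
# Device   Mountpoint   FStype   Options        Dump   Pass#
tmpfs      /tmp         tmpfs    rw,mode=1777   0      0
```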
9. SATA
Unlike PATA, the SATA interface uses differential signaling for data transfer. To work properly it has to transmit a pseudo-random scrambled sequence even when idle. As you can imagine, that requires power. But SATA implements two power saving modes: PARTIAL and SLUMBER. These modes can be activated by either the host or the device if both sides support them. PARTIAL mode just stops scrambling but keeps the link in a neutral state; resume time is 50-100us. SLUMBER mode powers down the interface completely, but the respective resume time is 3-10ms.
The ata(4) driver supports SATA power management, controlled by the hint.ata.X.pm_level loader tunables. Setting it to 1 allows the drive itself to initiate power saving when it wishes. Values 2 and 3 make an AHCI-compatible controller initiate PARTIAL and SLUMBER transitions after every command completion. The new ahci(4) driver also has a hint.ahcich.X.pm_level tunable. It also supports modes 4 and 5 for minimal performance degradation. Note that SATA power saving complicates drive hot-swap, as the controller may be unable to detect drive presence when the link is powered down.
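A sketch for the ahci(4) case follows. The channel number 0 is an assumption (use the ahcichX your drive is attached to), and mode 5 is chosen because ahci(4) describes it as a SLUMBER transition started by the driver after the port has been idle for a while.

```
# /boot/loader.conf
# Example: driver-initiated SLUMBER on AHCI channel 0 after an idle delay.
hint.ahcich.0.pm_level=5
```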
In my case PARTIAL mode saves 0.5W and SLUMBER - 0.8W of power.
10. USB
USB devices can individually be switched to and from power save mode by running the following commands:
# This command will enable automatic suspend of the USB device when no data traffic is pending.
usbconfig -d X.Y power_save
# This command will disable USB power save for the given device.
usbconfig -d X.Y power_on
The default for all devices except USB HUBs is power on. Before enabling power save on a random USB device, you should check in its configuration descriptor that the "bmAttributes" field indicates remote wakeup support. It is not recommended to set the system timer tick rate below 250 HZ and enable USB power save, due to some USB suspend and resume delays which must comply with the USB specification. The power save feature also applies in the same way to USB device/gadget mode.
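The remote wakeup check can be sketched in sh. In the USB configuration descriptor, bit 5 (0x20) of bmAttributes flags remote wakeup support. The helper name has_remote_wakeup is made up for this example, and the awk usage in the comment assumes usbconfig prints the field as "bmAttributes = 0x....".

```shell
# Succeed if the given bmAttributes value has the remote wakeup bit (0x20) set.
# has_remote_wakeup is a hypothetical helper, not part of usbconfig.
has_remote_wakeup() {
    [ $(( $1 & 0x20 )) -ne 0 ]
}

# Possible use on a live system (output format is an assumption):
#   attr=$(usbconfig -d X.Y dump_curr_config_desc | awk '/bmAttributes/ { print $3; exit }')
#   has_remote_wakeup "$attr" && usbconfig -d X.Y power_save
if has_remote_wakeup 0x00a0; then
    echo "remote wakeup supported"
fi
```

The awk command stops at the first match because endpoint descriptors later in the dump carry their own bmAttributes fields with a different meaning.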
So what have I got? To monitor real system power consumption I use the information provided by the ACPI battery via the acpiconf -i0 command. With default settings:
%acpiconf -i0
Design capacity: 4800 mAh
Last full capacity: 4190 mAh
Technology: secondary (rechargeable)
Design voltage: 11100 mV
Capacity (warn): 300 mAh
Capacity (low): 167 mAh
Low/warn granularity: 32 mAh
Warn/full granularity: 32 mAh
Model number: Victoria
Serial number: 292
Type: LION
OEM info: SIMPLO
State: discharging
Remaining capacity: 93%
Remaining time: 2:24
Present rate: 1621 mA
Voltage: 12033 mV
After tuning:
%acpiconf -i0
Design capacity: 4800 mAh
Last full capacity: 4190 mAh
Technology: secondary (rechargeable)
Design voltage: 11100 mV
Capacity (warn): 300 mAh
Capacity (low): 167 mAh
Low/warn granularity: 32 mAh
Warn/full granularity: 32 mAh
Model number: Victoria
Serial number: 292
Type: LION
OEM info: SIMPLO
State: discharging
Remaining capacity: 94%
Remaining time: 4:47
Present rate: 826 mA
Voltage: 12231 mV
So I have really doubled my on-battery time with this tuning: 4:47 hours instead of 2:24 with default settings. The cooling fan, previously running all the time, is now idle most of the time when the system is idle. The preinstalled vendor-tuned Windows XP on the same system provides 3:20 hours at most.
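The battery readings can also be turned into watts, since power is simply current times voltage. The sh sketch below does that arithmetic for the two "Present rate" and "Voltage" pairs reported by acpiconf; the function name battery_watts is made up for this example.

```shell
# Convert a discharge rate in mA and a voltage in mV into watts.
# battery_watts is a hypothetical helper, not part of acpiconf.
battery_watts() {
    awk -v ma="$1" -v mv="$2" 'BEGIN { printf "%.1f\n", ma * mv / 1000000 }'
}

battery_watts 1621 12033   # default settings: prints 19.5
battery_watts 826 12231    # tuned system: prints 10.1
```

So the whole laptop went from roughly 19.5W to roughly 10.1W at idle, which matches the doubling of the remaining time.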
-- Alexander Motin