BSDCam2016 - energy awareness (Monday 15th August, 13:00)
- power management in Linux is more mature; the work could be transferred to FreeBSD, but some of it is mobile-oriented and so less relevant to FreeBSD
- optimal energy-aware scheduling needs application metadata about needs of individual applications
- powerd doesn't handle imbalanced loads very well. It can be better to migrate a compute-intensive single-threaded task around CPUs than to pin it to a single CPU.
- need better ways to test and benchmark power management, e.g. realistic workloads, Xen to simulate hotplug etc.
- topology detection might not be working correctly on ARM. Look at hwloc tools.
- ARM is working on a new standard interface between OS and firmware that will include power management.
- Linux applications needing in-band energy measurement tend to use RAPL counters directly (Intel specific). Would prefer to use architecture-neutral energy measurement API instead.
** Rough notes **
Aggregate statistics about CPU utilization are not enough for energy-aware scheduling - they don't prevent processes bouncing between CPUs. Need more fine-grained statistics. This was done in Linux 4-5 years ago.
FreeBSD's powerd was written about 10 years ago and has not been updated much since. powerd evidently doesn't handle imbalanced load scenarios very well and could do with more fine-grained per-task load awareness.
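To illustrate why aggregate utilization is not enough, here is a minimal sketch of per-task load tracking with an exponentially decayed average. It is not the Linux implementation; the decay factor and the function names (`update_load`, `track`) are assumptions made up for this example.

```python
from collections import defaultdict

DECAY = 0.5  # hypothetical per-period decay factor, not taken from any kernel


def update_load(prev_load, ran_fraction, decay=DECAY):
    """Blend the old estimate with the fraction of the last period the task ran."""
    return decay * prev_load + (1.0 - decay) * ran_fraction


def track(samples):
    """samples: list of periods, each a dict of task -> fraction of the period spent running.

    Returns (per-task decayed load, aggregate utilization of the final period).
    """
    load = defaultdict(float)
    aggregate = 0.0
    for period in samples:
        # Update every known task, including ones that did not run this period.
        for task in set(load) | set(period):
            load[task] = update_load(load[task], period.get(task, 0.0))
        aggregate = sum(period.values())
    return dict(load), aggregate
```

One task running flat out and four tasks each running a quarter of the time give the same aggregate figure, but very different per-task loads, which is the information a scheduler would need to place them sensibly.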
David C has done some experiments pinning workloads to cores and found this hurts performance on Intel, because the CPU gets too hot. It's better to let the process migrate and let it snoop its working set out of the previous core's L1.
Robin R: there's lots of power management work in Linux/Android that is mobile-oriented and could be transferred to FreeBSD. It is not necessarily all relevant to FreeBSD, because phones aren't running FreeBSD.
Difficult to do power-aware scheduling in Linux because the code is spread around many places. Easier to do in FreeBSD because it's starting from a clean slate.
What APIs are there for power management in FreeBSD? powerd uses sysctl to ask the kernel to make ACPI requests.
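As a rough illustration of the sysctl interface, the sketch below reads and parses the cpufreq level list. It assumes the `dev.cpu.0.freq_levels` sysctl is present and returns space-separated `MHz/mW` pairs (with `-1` where the power figure is unknown); the helper names are made up for this example.

```python
import subprocess


def read_sysctl(name):
    """Return the value of a sysctl as a string, using the sysctl(8) command."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def parse_freq_levels(text):
    """Parse 'MHz/mW MHz/mW ...' into a list of (mhz, milliwatts) tuples."""
    levels = []
    for pair in text.split():
        mhz, _, milliwatts = pair.partition("/")
        levels.append((int(mhz), int(milliwatts)))
    return levels
```

On a FreeBSD machine with cpufreq attached, `parse_freq_levels(read_sysctl("dev.cpu.0.freq_levels"))` would list the available operating points, and `read_sysctl("dev.cpu.0.freq")` the current one.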
Linux looks at events which are due to happen in future and then works out what C-state the system needs to be in to meet these requirements. FreeBSD does nothing like that.
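A much simplified sketch of that idea: given the time until the next expected event and a latency limit, pick the deepest C-state that still fits. The state table and its numbers are hypothetical, and this is not the actual Linux governor logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    exit_latency_us: int      # time needed to wake up from this state
    target_residency_us: int  # minimum idle time for the state to pay off


# Hypothetical values, ordered from shallowest to deepest.
STATES = [
    CState("C1", 2, 2),
    CState("C3", 80, 200),
    CState("C6", 150, 600),
]


def pick_cstate(next_event_us, latency_limit_us, states=STATES):
    """Return the deepest state whose wake-up cost and residency both fit.

    Falls back to the shallowest state if nothing fits.
    """
    chosen = states[0]
    for state in states:
        if (state.exit_latency_us <= latency_limit_us
                and state.target_residency_us <= next_event_us):
            chosen = state
    return chosen
```

With a timer due in 300 us, for example, the deepest state is skipped because the CPU would not stay in it long enough to recover the cost of entering it.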
Need to understand how the applications are behaving (e.g. are the processes independent), to understand whether it's best to schedule them simultaneously on multiple cores. No easy way to understand a task's latency requirements - this would need metadata for the application. In Android, the middleware knows what tasks need special treatment, and then it handles them specially.
FreeBSD has lots of problems scaling beyond 16 cores - problems with multiple cores or with on-package NUMA, and problems with lock scalability. Intel does the scalability work on Linux. There is no unified effort on FreeBSD scalability - Netflix is doing some work. RCU has patent encumbrances: the patent grant only covers LGPL code (most of the patents have expired now).
No use of FreeBSD in HPC (HPC community does lots of work around topology awareness and energy awareness) - it would be throwing away 10-15% of performance.
How to test hotplug? Run it under Xen, add/remove CPUs.
In Linux there are multiple governors e.g. interactive, performance.
OpenBSD people were criticising userspace power management mechanisms. The reason powerd lives in userspace is that the developer doesn't want to work in the kernel.
Robin: For energy-aware scheduling you need a scoring system for applications and machinery to request that each application gets the schedule it needs. To develop energy-aware scheduling you need a useful development platform, e.g. 64-bit, big.LITTLE, and with a simple (hardware) power control story. An example would be ARM's Juno development platform (Cortex-A57+Cortex-A53). It has separate on-chip energy meters for the big and little clusters.
Does powertop work on FreeBSD? Various tools using RAPL on Intel. RAPL exposed in Linux kernel via powercap sysfs and perf (recent kernels) as well as low-level register access via msr.ko kernel driver. Used by HPC job managers (SLURM, GEOPM), monitoring tools (PAPI, likwid) etc. Anyone doing similar on FreeBSD? Can't see any evidence of this.
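For reference, a minimal sketch of how a Linux tool might read RAPL through powercap sysfs. It assumes the package-0 zone lives at `/sys/class/powercap/intel-rapl:0` and that `energy_uj` wraps at `max_energy_range_uj`; the function names are made up for this example, and reading the files may need elevated privileges.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL zone; varies by machine.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name, base=RAPL_DIR):
    """Read one integer microjoule value from a powercap file."""
    return int((base / name).read_text())


def energy_delta_uj(before, after, max_range):
    """Energy used between two counter readings, allowing for one wraparound."""
    if after >= before:
        return after - before
    return after + max_range - before


def average_power_w(delta_uj, seconds):
    """Convert an energy delta in microjoules to average power in watts."""
    return delta_uj / 1e6 / seconds


def measure(seconds=1.0, base=RAPL_DIR):
    """Sample the counter twice and return average power over the interval."""
    max_range = read_uj("max_energy_range_uj", base)
    before = read_uj("energy_uj", base)
    time.sleep(seconds)
    after = read_uj("energy_uj", base)
    return average_power_w(energy_delta_uj(before, after, max_range), seconds)
```

An architecture-neutral API would hide the path and the wraparound handling behind something like `measure()`, which is the part applications actually care about.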
Lack of standard workloads to evaluate quality of energy management, especially outside of mobile.
Topology detection on ARM: use of MPIDR is not well enough standardized (e.g. meaning of affinity levels is not standardized). More information is being provided in the CPU nodes in the DT to describe the topology.
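The affinity fields themselves are easy to extract; the problem is knowing what each level means on a given system. A small sketch of the decoding for a 64-bit MPIDR_EL1 value (the function name is made up for this example):

```python
def decode_mpidr(mpidr):
    """Split an MPIDR_EL1 value into its affinity fields and the MT bit."""
    return {
        "aff0": mpidr & 0xFF,          # bits [7:0]
        "aff1": (mpidr >> 8) & 0xFF,   # bits [15:8]
        "aff2": (mpidr >> 16) & 0xFF,  # bits [23:16]
        "aff3": (mpidr >> 32) & 0xFF,  # bits [39:32]
        "mt": (mpidr >> 24) & 1,       # bit 24: multithreading-style affinity
    }
```

Whether `aff0` identifies a thread or a core, and whether `aff1` identifies a core or a cluster, depends on the implementation, which is why the extra topology description in the DT is needed.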
Unclear what's the FreeBSD equivalent of Linux's /sys/devices topology description for userspace. Look at how topology is being exposed to userspace, and see if it can support hwloc (https://www.open-mpi.org/projects/hwloc/).
Ruslan: power-down on ARM beyond WFI, i.e. turning off reception of interrupts. ARM is working on a standard interface between kernel and firmware (a follow-on to PSCI) - ARM to provide any information that's publicly available (Robin).