Performance Visualization Tools
The features offered by the PmcTools toolset can be used in combination with FreeBSD's existing facilities to build a visual performance analysis tool.
This page looks at some of the issues involved in building such a tool.
pmcgui: a Summer of Code project by Mathieu Prevot.
The questions that the tool would help answer are:
What (hardware resource) is the current bottleneck?
Performance could be affected by many factors:
- Application or kernel code could be exhibiting poor cache behaviour.
- Mispredicted branches could be forcing the processor to flush its pipelines.
- One CPU in an SMP system could be getting overloaded.
- A driver could be keeping interrupts turned off too long, affecting other parts of the system.
Where in the code is the bottleneck?
Once a bottleneck is identified, the next task (for a developer) would be to find the code causing the bottleneck. Since the bottleneck could be in userland or in the kernel, the tool should offer "whole system" measurement capabilities.
A useful visualizer tool would combine existing performance measurement facilities in FreeBSD with the additional features offered by PmcTools.
Useful HWPMC Features
Next we look at some of the features of HWPMC that a performance visualization tool could use. (See also: the HWPMC(4) manual page).
HWPMC allows PMCs to be allocated in system-wide mode, meaning hardware events seen by the system as a whole are measured. PMCs can also be allocated in process mode; such PMCs have to be attached to a set of processes and will only measure hardware events that occur while threads belonging to these processes are scheduled on a CPU.
System-wide modes are useful to track the behaviour of the system as a whole. Process-specific modes are useful when debugging the performance characteristics of specific processes.
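As a rough illustration of how process-mode counting can be confined to the target process, the following C sketch models a virtual counter that is credited only between the moment a target thread is switched onto a CPU and the moment it is switched off. This is a simplified model, not HWPMC's implementation: the names struct vpmc, vpmc_switch_in() and vpmc_switch_out() are hypothetical, and the raw hardware readings are assumed to be supplied by the caller.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a process-mode counting PMC.
 * The hardware counter runs continuously; the per-process "virtual"
 * count only accumulates events seen while a target thread is on a CPU. */
struct vpmc {
    uint64_t saved_hw;      /* hardware reading at the last switch-in */
    uint64_t virtual_count; /* events attributed to the process so far */
    int      running;       /* 1 while a target thread is scheduled */
};

/* Called when a thread of the target process is switched onto the CPU. */
void vpmc_switch_in(struct vpmc *v, uint64_t hw_now)
{
    v->saved_hw = hw_now;
    v->running = 1;
}

/* Called when that thread is switched off the CPU: credit the events
 * that occurred while it was running. */
void vpmc_switch_out(struct vpmc *v, uint64_t hw_now)
{
    if (v->running) {
        v->virtual_count += hw_now - v->saved_hw;
        v->running = 0;
    }
}
```

Events counted while other processes run (between a switch-out and the next switch-in) are simply never credited, which is what makes the count process-specific.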
Independently of whether a PMC is system-wide or process-specific, a PMC can either count or sample.
Counting-mode PMCs count hardware events; their values have to be read periodically using the pmc_read() API.
Sampling-mode PMCs are set up to interrupt the processor after a set number of hardware events have been seen. The interrupt handler captures the desired information (the sampled PC value, or the call stack).
Counting PMCs have variants that offer different levels of counting granularity (and thus different overheads).
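The sampling behaviour described above can be modelled in software. In this sketch the struct spmc type and its functions are hypothetical (not part of HWPMC or libpmc): the counter is armed with a reload value, each hardware event decrements it, and its expiry stands in for the interrupt whose handler records the PC before re-arming the counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a sampling-mode PMC. */
struct spmc {
    uint64_t   reload;    /* events between samples (the sampling rate) */
    uint64_t   remaining; /* events left before the next "interrupt" */
    uintptr_t *samples;   /* buffer for captured PC values */
    size_t     nsamples;
    size_t     capacity;
};

void spmc_init(struct spmc *s, uint64_t reload, uintptr_t *buf, size_t cap)
{
    s->reload = reload;
    s->remaining = reload;
    s->samples = buf;
    s->nsamples = 0;
    s->capacity = cap;
}

/* Account for `nevents` hardware events that occurred while the CPU was
 * executing at program counter `pc`. */
void spmc_events(struct spmc *s, uint64_t nevents, uintptr_t pc)
{
    if (s->reload == 0)
        return; /* an unarmed counter never fires */
    while (nevents > 0) {
        if (nevents < s->remaining) {
            s->remaining -= nevents;
            return;
        }
        nevents -= s->remaining;
        /* Counter expired: the "interrupt handler" records the PC. */
        if (s->nsamples < s->capacity)
            s->samples[s->nsamples++] = pc;
        s->remaining = s->reload; /* re-arm for the next sample */
    }
}
```

With a reload value of 1000, a burst of 2500 events at one PC yields two samples and leaves 500 events to go before the next one.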
Attaching to live processes
HWPMC(4) allows PMCs to be "attached" to running processes provided the user has sufficient privilege to do so. This operation is completely transparent to the target process and can be used to analyse the performance of long-running processes.
Separation of Data Collection and Analysis
HWPMC(4) can be configured to send captured PMC data over the network. The performance analyzer tool therefore need not run on the machine undergoing performance measurement. This feature is useful in embedded contexts, where the target system may lack the resources to run the analyzer itself.
Separation of Data Collection and Analysis in Time
HWPMC(4)'s data log can be analysed after the fact. This feature can be used to compare performance runs, or to capture data in contexts where connecting a visualizer tool is not feasible (e.g., collecting performance data from within a customer deployment).
Tool Usage Scenarios
CPU behaviour over time
The tool would allocate a system-mode counting PMC on each CPU of an SMP system, and periodically read and log its value. The data could be presented as a graph, or the tool could permit day-on-day comparisons.
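A minimal sketch of the graphing step, assuming each CPU's log is simply an array of cumulative pmc_read() values taken at a fixed interval: successive readings are differenced to get per-interval event counts, which can then be scaled to a rate. The function names interval_deltas() and events_per_sec() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Turn a series of cumulative readings from one CPU's counting PMC into
 * per-interval event counts. Returns the number of deltas written
 * (n - 1), or 0 if fewer than two readings are available. */
size_t interval_deltas(const uint64_t *readings, size_t n, uint64_t *deltas)
{
    if (n < 2)
        return 0;
    for (size_t i = 1; i < n; i++)
        deltas[i - 1] = readings[i] - readings[i - 1];
    return n - 1;
}

/* Events per second for one interval, given its length in milliseconds.
 * Returns 0 for a zero-length interval. */
uint64_t events_per_sec(uint64_t delta, uint64_t interval_ms)
{
    return interval_ms ? delta * 1000 / interval_ms : 0;
}
```

Running the same computation over each CPU's series gives one line per CPU on the graph; a day-on-day comparison could then difference two such series interval by interval.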
Finding code that is the bottleneck
Allocate a system-wide sampling PMC and collect the resulting samples. Map these samples to locations inside executables (programs, shared objects, kernel and kernel modules) in the system.
If the executables have line number information, the tool should allow drilling down to the corresponding source code.
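Mapping a sample to an executable amounts to an interval lookup. The sketch below assumes a hypothetical table (struct image_map) of non-overlapping address ranges sorted by start address, built from whatever mapping information the tool has for the sampled address space; find_image(), also hypothetical, binary-searches it for the image containing a sampled PC.

```c
#include <stddef.h>
#include <stdint.h>

/* One executable image mapped into an address space: a program, a
 * shared object, the kernel or a kernel module (hypothetical layout). */
struct image_map {
    uintptr_t   start; /* first address of the mapping */
    uintptr_t   end;   /* one past the last address */
    const char *name;  /* path of the executable image */
};

/* Binary-search a table sorted by start address (ranges must not
 * overlap) for the image containing pc. Returns NULL if pc falls
 * outside every mapping. */
const struct image_map *find_image(const struct image_map *maps, size_t n,
                                   uintptr_t pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < maps[mid].start)
            hi = mid;
        else if (pc >= maps[mid].end)
            lo = mid + 1;
        else
            return &maps[mid];
    }
    return NULL;
}
```

Samples that return NULL (for example, PCs in anonymous or JIT-generated memory) would need to be reported separately rather than silently dropped.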
Profiling a particular process
Allocate a process-mode sampling PMC and attach it to the process in question. Analyse the resulting log to determine hotspots.
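Once samples are attributed to an executable, hotspots can be found by charging each PC to the nearest preceding symbol and ranking symbols by sample count. This sketch assumes a hypothetical symbol table (struct func) sorted by address and ignores symbol sizes, so a PC past the end of the last function is charged to that function; attribute_samples() and rank_hotspots() are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A function's entry in a hypothetical symbol table sorted by address. */
struct func {
    uintptr_t     addr; /* start address of the function */
    const char   *name;
    unsigned long hits; /* samples attributed to this function */
};

/* Charge each sampled PC to the function with the greatest start address
 * not exceeding it; PCs below the first symbol are dropped. */
void attribute_samples(struct func *funcs, size_t nfuncs,
                       const uintptr_t *pcs, size_t npcs)
{
    for (size_t i = 0; i < npcs; i++) {
        size_t lo = 0, hi = nfuncs; /* find first func with addr > pc */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (funcs[mid].addr <= pcs[i])
                lo = mid + 1;
            else
                hi = mid;
        }
        if (lo > 0)
            funcs[lo - 1].hits++;
    }
}

static int by_hits_desc(const void *a, const void *b)
{
    const struct func *fa = a, *fb = b;
    return (fb->hits > fa->hits) - (fb->hits < fa->hits);
}

/* Sort so the hottest functions come first. After this call the table
 * is no longer sorted by address. */
void rank_hotspots(struct func *funcs, size_t nfuncs)
{
    qsort(funcs, nfuncs, sizeof funcs[0], by_hits_desc);
}
```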
What's that process doing?
The visualizer tool could use KTRACE to show the system call activity of a given process or set of processes. This activity could be correlated with the PMC data being collected for the process.
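One simple way to correlate the two data sources is by timestamp. The sketch assumes hypothetical, already-decoded records: non-overlapping system call intervals for a single thread, reconstructed from a KTRACE dump, and sorted PMC sample timestamps taken on the same clock (an assumption a real tool would have to verify). correlate() counts how many samples fell inside each call in one merge-style pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one system call interval of a single thread. */
struct syscall_span {
    uint64_t      enter_ns;
    uint64_t      exit_ns;
    unsigned long samples; /* PMC samples that fell inside the call */
};

/* For each span, count the samples whose timestamps lie in
 * [enter_ns, exit_ns). Both inputs must be sorted by time and the spans
 * must not overlap. Returns the number of samples outside every call. */
size_t correlate(struct syscall_span *spans, size_t nspans,
                 const uint64_t *sample_ns, size_t nsamples)
{
    size_t s = 0, outside = 0;
    for (size_t i = 0; i < nsamples; i++) {
        uint64_t t = sample_ns[i];
        while (s < nspans && spans[s].exit_ns <= t)
            s++; /* skip calls that ended before this sample */
        if (s < nspans && spans[s].enter_ns <= t)
            spans[s].samples++;
        else
            outside++;
    }
    return outside;
}
```

A large per-call sample count for a call that should be cheap is the kind of anomaly the visualizer could highlight.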
Selecting From Large Numbers of Possible Events
Modern PMCs can measure a very large number of hardware events. Most events also support additional modifiers, and some events have constraints on their use.
The UI used for selecting a PMC event thus needs to assist the user in creating a PMC event specification:
- Common aliases could be presented first.
- The UI should record previously used event+modifier combinations for easy reuse.
- PMC events could be grouped into logical sections. For example, all cache-related events in one pane, bus-related events in another, branch-related measurements in a third, etc.
- The UI should disallow events that violate constraints.
- Sometimes the user may not know the exact name of an event supported by the hardware. The UI should allow searching for hardware events (e.g., "show me all cache-related measurements possible") and should suggest additional related PMC events that could be useful.
PMC selection logic is highly dependent on the specific PMC hardware.
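The grouping, searching and constraint-checking ideas above could rest on a small event catalogue. In this sketch the event names, groups and counter masks are invented for illustration (real ones are CPU-specific), and struct event_desc, search_events() and pick_counter() are hypothetical. search_events() filters by group and name substring; pick_counter() returns -1 when no permitted counter is free, which the UI could use to disallow the selection.

```c
#include <stddef.h>
#include <string.h>

enum event_group { GRP_CACHE, GRP_BUS, GRP_BRANCH };

/* Hypothetical catalogue entry. */
struct event_desc {
    const char      *name;
    enum event_group group;
    unsigned         counter_mask; /* bit i set: may run on counter i */
};

/* Collect events in `group` (ignored if negative) whose name contains
 * `needle` (ignored if NULL). Returns the number stored in `out`. */
size_t search_events(const struct event_desc *cat, size_t n, int group,
                     const char *needle, const struct event_desc **out,
                     size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        if (group >= 0 && (int)cat[i].group != group)
            continue;
        if (needle != NULL && strstr(cat[i].name, needle) == NULL)
            continue;
        out[found++] = &cat[i];
    }
    return found;
}

/* Pick a free counter allowed by the event's constraint. `busy_mask`
 * has bit i set if counter i is already in use. Returns -1 if none. */
int pick_counter(const struct event_desc *ev, unsigned busy_mask,
                 int ncounters)
{
    for (int i = 0; i < ncounters; i++) {
        unsigned bit = 1u << i;
        if ((ev->counter_mask & bit) && !(busy_mask & bit))
            return i;
    }
    return -1;
}
```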
Efficient Keyboard Use
Attaching and detaching PMCs, and zooming into process trees or areas of code, need to be quick and efficient operations from the keyboard.
When viewing a log of samples (say collected from a system-wide sampling PMC) it would be useful to restrict analyses to:
- Processes associated with a given executable (e.g., "make" or "mozilla") or a specific pid or family of processes.
- Samples that fell between a specified start and end time.
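Restricting an analysis to a subset of samples is a filter over the decoded log. The sketch assumes a hypothetical decoded sample record (struct sample) carrying a timestamp, pid and executable name; in the hypothetical struct sample_filter, a zero or NULL field means "match anything". Matching a whole family of processes would additionally need parent/child information, which is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded sample from a system-wide sampling log. */
struct sample {
    uint64_t    time_ns;
    int         pid;
    const char *comm; /* executable name, e.g. "make" */
    uintptr_t   pc;
};

/* Filter criteria; zero/NULL fields mean "don't care". */
struct sample_filter {
    const char *comm;     /* match this executable name */
    int         pid;      /* match this pid (0 = any) */
    uint64_t    start_ns; /* inclusive lower bound */
    uint64_t    end_ns;   /* exclusive upper bound (0 = none) */
};

static int sample_matches(const struct sample *s,
                          const struct sample_filter *f)
{
    if (f->comm != NULL && strcmp(s->comm, f->comm) != 0)
        return 0;
    if (f->pid != 0 && s->pid != f->pid)
        return 0;
    if (s->time_ns < f->start_ns)
        return 0;
    if (f->end_ns != 0 && s->time_ns >= f->end_ns)
        return 0;
    return 1;
}

/* Move matching samples to the front of the array, preserving order;
 * returns how many matched. */
size_t filter_samples(struct sample *v, size_t n,
                      const struct sample_filter *f)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (sample_matches(&v[i], f))
            v[k++] = v[i];
    return k;
}
```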
Relative or Absolute PMC numbers
Conventionally, we would graph the absolute values returned by pmc_read().
However, when correlating numbers from multiple PMCs, it would be useful to compare each PMC's deviation from its own average.
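A minimal sketch of the relative view: each reading x_i of a PMC is expressed as a percentage deviation from that PMC's mean, 100 (x_i - mean) / mean, so that series of very different magnitudes can share one axis. The function name relative_deviation() is hypothetical, and using the arithmetic mean of the displayed window as the baseline is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Express each reading of one PMC as a percentage deviation from that
 * PMC's mean. Writes n values to `out`; returns 0 on success, or -1 if
 * there are no readings or the mean is zero. */
int relative_deviation(const uint64_t *x, size_t n, double *out)
{
    double sum = 0.0, mean;

    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        sum += (double)x[i];
    mean = sum / (double)n;
    if (mean == 0.0)
        return -1;
    for (size_t i = 0; i < n; i++)
        out[i] = 100.0 * ((double)x[i] - mean) / mean;
    return 0;
}
```

For readings 80, 100 and 120 the mean is 100, so the plotted values are -20%, 0% and +20%; a second PMC whose readings are a thousand times larger but vary in the same proportions would produce the same curve.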