Performance Visualization Tools

The features offered by the PmcTools toolset can be used in combination with FreeBSD's existing facilities to build a visual performance analysis tool.

This page looks at some of the issues involved in building such a tool.

Introduction

The questions that the tool would help answer are:

  1. What (hardware resource) is the current bottleneck?
    Performance could be impacted by many factors:

    • Application or kernel code could be exhibiting poor cache behaviour.
    • Mispredicted branches may be interfering with the processor's pipelines.
    • One CPU in an SMP system could be getting overloaded.
    • A driver could be keeping interrupts disabled for too long, affecting other parts of the system.
  2. Where in the code is the bottleneck?
    Once a bottleneck is identified, the next task (for a developer) is to find the code causing it. Since the bottleneck could be in userland or in the kernel, the tool should offer "whole system" measurement capabilities.

A useful visualizer tool would combine existing performance measurement facilities in FreeBSD with the additional features offered by PmcTools.

Useful HWPMC Features

Next we look at some of the features of HWPMC that a performance visualization tool could use. (See also: the HWPMC(4) manual page).

PMC Modes

HWPMC allows PMCs to be allocated in system-wide mode, meaning hardware events seen by the system as a whole are measured. PMCs can also be allocated in process mode; such PMCs have to be attached to a set of processes and will count only hardware events that occur when threads belonging to these processes are scheduled on a CPU.

System-wide modes are useful for tracking the behaviour of the system as a whole. Process modes are useful when debugging the performance characteristics of specific processes.
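The difference between the two modes can be illustrated with a small simulation. This is a sketch only: it does not use libpmc, and the `struct slice` trace and both counting functions are hypothetical. It takes one CPU's schedule, annotated with how many hardware events occurred in each time slice, and counts it once as a system-wide PMC would and once as a process-mode PMC attached to a chosen set of processes would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical trace of one CPU: which process was scheduled during a
 * time slice and how many hardware events occurred in that slice. */
struct slice {
	pid_t    pid;     /* process whose thread was on the CPU */
	uint64_t events;  /* hardware events seen during the slice */
};

/* System-wide mode: every event seen by the CPU is counted. */
uint64_t
count_system(const struct slice *s, size_t n)
{
	assert(s != NULL || n == 0);
	uint64_t total = 0;
	for (size_t i = 0; i < n; i++)
		total += s[i].events;
	return total;
}

/* Process mode: only events seen while a thread of an attached
 * process is scheduled on the CPU are counted. */
uint64_t
count_process(const struct slice *s, size_t n, const pid_t *attached,
    size_t nattached)
{
	uint64_t total = 0;
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < nattached; j++)
			if (s[i].pid == attached[j]) {
				total += s[i].events;
				break;
			}
	return total;
}
```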

Independently of whether a PMC is system-wide or process-specific, a PMC can either count or sample.

Counting-mode PMCs count hardware events; their values have to be read periodically using the pmc_read() API.
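Values read from a counting PMC are cumulative, so a visualizer would typically plot the differences between successive readings. The sketch below assumes the readings have already been collected (for example by calling pmc_read() at a fixed interval) into an array of 64-bit values; `readings_to_deltas()` is a hypothetical helper, not part of libpmc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert n successive cumulative readings of a
 * counting-mode PMC into n - 1 per-interval event counts stored in out.
 */
void
readings_to_deltas(const uint64_t *readings, size_t n, uint64_t *out)
{
	assert(n >= 1);
	for (size_t i = 1; i < n; i++)
		/* Unsigned subtraction is modulo 2^64, so the delta is still
		 * correct if the value wrapped around once between reads. */
		out[i - 1] = readings[i] - readings[i - 1];
}
```

Dividing each delta by the read interval then gives an event rate suitable for graphing.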

Sampling-mode PMCs are set up to interrupt the processor after a set number of hardware events have been seen. The interrupt handler captures the desired information (the sampled PC value, or the call stack).
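The sampling mechanism can be modelled in software as a counter that is re-armed after every `rate` events. This is a simplified simulation, not hwpmc's implementation: `struct sampler` and its functions are hypothetical, whereas a real sampling PMC is programmed in hardware and raises an interrupt when the counter overflows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated sampling PMC: "interrupts" every `rate` events. */
struct sampler {
	uint64_t   rate;       /* events between samples */
	uint64_t   remaining;  /* events left until the next "interrupt" */
	uintptr_t *samples;    /* buffer for captured PC values */
	size_t     nsamples;
	size_t     capacity;
};

void
sampler_init(struct sampler *s, uint64_t rate, uintptr_t *buf, size_t cap)
{
	assert(rate > 0);
	s->rate = rate;
	s->remaining = rate;
	s->samples = buf;
	s->nsamples = 0;
	s->capacity = cap;
}

/* Called once per hardware event; pc is the instruction that caused it. */
void
sampler_event(struct sampler *s, uintptr_t pc)
{
	if (--s->remaining == 0) {
		/* Simulated interrupt handler: record the PC (if there is room)
		 * and re-arm the counter for the next sample. */
		if (s->nsamples < s->capacity)
			s->samples[s->nsamples++] = pc;
		s->remaining = s->rate;
	}
}
```

A larger `rate` lowers the interrupt overhead at the cost of fewer samples per unit of work.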

Counting PMCs have variants that offer different levels of counting granularity (and thus different overheads).

Attaching to live processes

HWPMC(4) allows PMCs to be "attached" to running processes provided the user has sufficient privilege to do so. This operation is completely transparent to the target process and can be used to analyse the performance of long-running processes.

Separation of Data Collection and Analysis

HWPMC(4) can be configured to send captured PMC data over the network. The performance analyzer tool therefore need not be running on the machine that is undergoing performance measurements. This feature is useful in embedded contexts.

Separation of Data Collection and Analysis in Time

HWPMC(4)'s data log can be analysed post-facto. This feature can be used to compare performance runs or capture data in contexts where connecting a visualizer tool is not feasible (e.g., collecting performance data from within a customer deployment).
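When comparing two saved runs, raw sample counts are not directly comparable if the runs differ in length. One option, sketched below under that assumption, is to compare each function's share of the total samples. `share_change()` is a hypothetical helper, and the per-function and total counts are assumed to have already been extracted from each log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: change, in percentage points, of one function's
 * share of all samples between a baseline run and a later run. Shares
 * rather than raw counts are compared so runs of different length
 * remain comparable.
 */
double
share_change(uint64_t base_samples, uint64_t base_total,
    uint64_t new_samples, uint64_t new_total)
{
	assert(base_total > 0 && new_total > 0);
	double base_share = 100.0 * base_samples / base_total;
	double new_share = 100.0 * new_samples / new_total;
	return new_share - base_share;
}
```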

Tool Usage Scenarios

CPU behaviour over time

The tool would allocate system-mode PMCs on each CPU of an SMP system and would periodically read and log their values. The data could be presented as a graph, or the tool could support day-over-day comparisons.
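As one illustration of how per-CPU readings might be summarized for display, the sketch below takes one interval's event counts (one value per CPU, for example the deltas of successive reads) and reports which CPU saw the most events and its share of the total, one simple way to spot an overloaded CPU. `busiest_cpu()` is a hypothetical helper, not a libpmc function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given per-CPU event counts for one interval,
 * return the index of the CPU with the most events and store its share
 * of the total (in percent) in *share_pct.
 */
size_t
busiest_cpu(const uint64_t *per_cpu, size_t ncpu, double *share_pct)
{
	assert(ncpu > 0);
	uint64_t total = 0;
	size_t max = 0;
	for (size_t i = 0; i < ncpu; i++) {
		total += per_cpu[i];
		if (per_cpu[i] > per_cpu[max])
			max = i;
	}
	*share_pct = total ? 100.0 * per_cpu[max] / total : 0.0;
	return max;
}
```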

Finding code that is the bottleneck

Allocate a system-wide sampling PMC and collect the resulting samples. Map these samples to locations inside executables (programs, shared objects, kernel and kernel modules) in the system.
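Mapping a sample to an executable amounts to finding the address range that contains the sampled PC. The sketch assumes the tool already has a sorted, non-overlapping list of mappings (for a process, or for the kernel and its modules); `struct image_map` and `find_image()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one mapped image (program, shared object,
 * kernel or kernel module). */
struct image_map {
	uintptr_t   start;  /* first address of the mapping */
	uintptr_t   end;    /* one past the last address */
	const char *path;
};

/*
 * Return the image containing pc, or NULL if none does. maps must be
 * sorted by start address and non-overlapping, so binary search works.
 */
const struct image_map *
find_image(const struct image_map *maps, size_t n, uintptr_t pc)
{
	assert(maps != NULL || n == 0);
	size_t lo = 0, hi = n;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (pc < maps[mid].start)
			hi = mid;
		else if (pc >= maps[mid].end)
			lo = mid + 1;
		else
			return &maps[mid];
	}
	return NULL;
}
```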

If the executables have line number information, allow drilling down to views of source code.

Profiling a particular process

Allocate a process-mode sampling PMC and attach it to the process in question. Analyse the resulting log to determine hotspots.
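Once samples are collected, finding hotspots reduces to attributing each sampled PC to a function and counting. The sketch below assumes a sorted array of function start addresses taken from the executable's symbol table; because it ignores symbol sizes, the last function is treated as extending to the end of the address space. `attribute_samples()` and `hottest()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: attribute each sampled PC to the function whose
 * start address is the greatest one <= pc. func_start must be sorted
 * ascending; counts[i] receives the number of samples in function i.
 * PCs below the first function are ignored.
 */
void
attribute_samples(const uintptr_t *func_start, size_t nfunc,
    const uintptr_t *pcs, size_t npcs, uint64_t *counts)
{
	assert(nfunc > 0);
	for (size_t i = 0; i < nfunc; i++)
		counts[i] = 0;
	for (size_t s = 0; s < npcs; s++) {
		/* Binary search for the first start address > pc. */
		size_t lo = 0, hi = nfunc;
		while (lo < hi) {
			size_t mid = lo + (hi - lo) / 2;
			if (func_start[mid] <= pcs[s])
				lo = mid + 1;
			else
				hi = mid;
		}
		if (lo > 0)
			counts[lo - 1]++;
	}
}

/* Hypothetical helper: index of the function with the most samples. */
size_t
hottest(const uint64_t *counts, size_t nfunc)
{
	size_t best = 0;
	for (size_t i = 1; i < nfunc; i++)
		if (counts[i] > counts[best])
			best = i;
	return best;
}
```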

What's that process doing?

The visualizer tool could use KTRACE to show system call activity by a given process or set of processes. These could be correlated with PMC data being collected for the process.
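Correlating the two sources is essentially a merge of two time-ordered streams. The sketch assumes both the system-call records and the PMC samples carry timestamps from a common clock and that each stream is already sorted; `struct timed_event` and `merge_streams()` are hypothetical, and the code does not parse actual ktrace or hwpmc log formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timeline record: a system call or a PMC sample. */
enum ev_kind { EV_SYSCALL, EV_PMC_SAMPLE };

struct timed_event {
	uint64_t     ts;    /* timestamp; both sources use the same clock */
	enum ev_kind kind;
	int          id;    /* syscall number or PMC id */
};

/*
 * Merge two streams, each sorted by timestamp, into one sorted stream so
 * system-call activity can be displayed alongside PMC data. Returns the
 * number of records written to out (na + nb).
 */
size_t
merge_streams(const struct timed_event *a, size_t na,
    const struct timed_event *b, size_t nb, struct timed_event *out)
{
	assert(out != NULL);
	size_t i = 0, j = 0, k = 0;
	while (i < na && j < nb)
		out[k++] = (a[i].ts <= b[j].ts) ? a[i++] : b[j++];
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];
	return k;
}
```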

UI Issues

Selecting From Large Numbers of Possible Events

Modern PMCs can measure a very large number of possible hardware events. Most events also support additional modifiers, and some events have constraints on their use.

The UI used for selecting a PMC event thus needs to assist the user in creating a PMC event specification:

PMC selection logic is highly specific to each type of PMC hardware.
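libpmc event specifiers are typically written as an event name followed by optional comma-separated, CPU-specific qualifiers (such as `usr` or `os`), and a selection UI would need to take such strings apart to display and check each modifier. The parser below is a hypothetical sketch of that splitting step only: it accepts `event[,qualifier[=value]]...`, does no validation against the events a particular PMC supports, and `struct event_spec` and `parse_event_spec()` are not libpmc APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_QUALIFIERS 8   /* arbitrary limit for this sketch */

struct qualifier {
	const char *name;
	const char *value;   /* NULL for flag qualifiers such as "usr" */
};

/* Hypothetical parsed form of an event specifier. */
struct event_spec {
	const char      *event;
	struct qualifier quals[MAX_QUALIFIERS];
	size_t           nquals;
};

/*
 * Split "event[,qual[=value]]..." in place (commas and '=' are replaced
 * by NUL bytes). Returns 0 on success, or -1 if the event name is empty
 * or there are more than MAX_QUALIFIERS qualifiers.
 */
int
parse_event_spec(char *buf, struct event_spec *spec)
{
	assert(buf != NULL && spec != NULL);
	spec->nquals = 0;
	spec->event = buf;
	char *p = strchr(buf, ',');
	if (p != NULL)
		*p++ = '\0';
	if (spec->event[0] == '\0')
		return -1;
	while (p != NULL) {
		char *next = strchr(p, ',');
		if (next != NULL)
			*next++ = '\0';
		if (*p != '\0') {   /* skip empty fields such as "a,,b" */
			if (spec->nquals == MAX_QUALIFIERS)
				return -1;
			struct qualifier *q = &spec->quals[spec->nquals++];
			char *eq = strchr(p, '=');
			if (eq != NULL)
				*eq++ = '\0';
			q->name = p;
			q->value = eq;
		}
		p = next;
	}
	return 0;
}
```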

Efficient Keyboard Use

Attaching and detaching PMCs, and zooming into process trees or areas of code, need to be quick and efficient.

Filters

When viewing a log of samples (say collected from a system-wide sampling PMC) it would be useful to restrict analyses to:

Relative or Absolute PMC numbers

Conventionally we would graph the absolute values returned by pmc_read().

However, when correlating numbers from multiple PMCs, it would be useful to compare each PMC's deviation from its own average.
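One way to compute such deviations is to normalize each reading by its PMC's mean over the window being displayed, so that PMCs of very different magnitude (for example cycles and cache misses) share a common scale. Normalizing by the mean, rather than computing a z-score, is an assumption of this sketch, and `relative_deviation()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: express each of n readings of one PMC as a
 * relative deviation from the mean of those readings, (x - mean) / mean.
 * Returns 0 on success, or -1 if the mean is zero.
 */
int
relative_deviation(const double *x, size_t n, double *out)
{
	assert(n > 0);
	double sum = 0.0;
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	double mean = sum / n;
	if (mean == 0.0)
		return -1;
	for (size_t i = 0; i < n; i++)
		out[i] = (x[i] - mean) / mean;
	return 0;
}
```

A reading of 0.5 then means "50% above this PMC's average", whichever PMC it came from.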

PmcTools/PerformanceVisualizer (last edited 2008-06-17T21:38:07+0000 by anonymous)