PmcTools: Features and Enhancements
This wiki page lists features and enhancements requested for the PMC toolchain and related tools and libraries.
Please see PmcTools/PlanOfWork for the item currently on my plate. The list below is in no particular order. Help is most welcome.
(base system gprof) Teach gprof(1) about the differences between ELF32 and ELF64 executables so that running gprof on an ELF32 executable doesn't result in an immediate coredump. Note: a binary compiled with `cc -pg' on i386 has a 'struct gmon' whose size differs from that on amd64, so gprof on amd64 gets quite confused when it sees a gmon.out file from an i386 binary. Clearly gprof(1)'s design needs to be rethought if we want it to support cross-platform profiling. See also: LibElf.
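As a rough illustration of the ELF32/ELF64 check that a word-size-aware gprof(1) would need, the sketch below reads the class byte from the ELF identification bytes (index 4 of e_ident; 1 is ELFCLASS32, 2 is ELFCLASS64). The function names are hypothetical, and this is not how gprof(1) or LibElf is implemented; it only shows the kind of test that could turn a crash into a clear error message.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4      # index of the class byte within e_ident
ELFCLASS32 = 1
ELFCLASS64 = 2


def elf_class(header: bytes) -> int:
    """Return 32 or 64 for the given ELF header bytes, or raise ValueError."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF object")
    cls = header[EI_CLASS]
    if cls == ELFCLASS32:
        return 32
    if cls == ELFCLASS64:
        return 64
    raise ValueError(f"unknown ELF class {cls}")


def native_word_bits() -> int:
    """Pointer size, in bits, of the interpreter running this script."""
    return struct.calcsize("P") * 8


def matches_native_word_size(header: bytes) -> bool:
    """True when the executable's word size equals the host's (hypothetical helper)."""
    return elf_class(header) == native_word_bits()
```

A tool could call matches_native_word_size() on the first few bytes of the executable and refuse to continue, or switch to a different on-disk layout, when it returns False.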
Generate profiles for 32 bit i386 executables on amd64 systems.
(update 28/Mar/2006) pmcstat generates logs, but since the layout of struct gmon depends on the native word size of the machine, we cannot use an i386 gprof to analyze the log file. An amd64 gprof segfaults trying to read an i386 ELF binary because it, too, does not account for the different binary layouts of ELF32 and ELF64 files.
Should the TSC be made virtualizable? There are various pros and cons involved. (update 16/Mar/2007) PAPI supports a virtualized TSC, so this may be a useful feature.
- (generic issue) Not quite a hwpmc bug, but many single-processor P6-class machines keep the local APIC disabled and hence are unable to use hwpmc for sampling. Figure out a way to enable the local APIC on such machines.
- (requested by Andrew Gallatin) Profile KLDs but generate a single profile. We cannot do this using stock gprof(1), and will need a new tool (possibly using the Python interface or an enhanced version of pmcstat(8)). An alternative is to enhance gprof(1) to deal with multi-object profiles; apparently ups@ had done something similar some time back.
- Implement profiling for Linux/ELF binaries. This implies that pmcstat(8) should track the ELF brand like the kernel does.
- Rethink locking inside of hwpmc(4).
Allow PMCs to drive the sampling clock for 'regular' (i.e., cc -pg) profile-enabled applications. See PmcTools/PmcResearch.
- Allow PMC-based sampling to deliver information usable by PAPI (PAPI presents a full register context to a user handler).
(generic issue) To do "normal" profiles of shared libraries, we need to enhance the kernel to move from one profile buffer per process (i.e., the one set by the monitor() API) to one such buffer per object that requires profiling. We also need changes to many other toolchain components. See: http://yogurt.org/FreeBSD/4x_so_prof.diff
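To make the "one buffer per object" idea concrete, here is a small userland sketch. It assumes each loaded object occupies a known, non-overlapping address range and keeps a separate histogram per object, keyed by offset from the object's load address. The class name, method names, and addresses are all hypothetical; this models the bookkeeping only, not the kernel change itself.

```python
from bisect import bisect_right
from collections import Counter


class ObjectMap:
    """Maps sampled program counters to per-object profile buffers."""

    def __init__(self, objects):
        # objects: iterable of (start, end, name); end is exclusive and
        # the ranges are assumed not to overlap.
        self._objs = sorted(objects)
        self._starts = [start for start, _end, _name in self._objs]
        self.buffers = {name: Counter() for _s, _e, name in self._objs}

    def record(self, pc):
        """Credit one sample to the object containing pc.

        Returns the object's name, or None if pc falls outside every range.
        """
        i = bisect_right(self._starts, pc) - 1
        if i < 0:
            return None
        start, end, name = self._objs[i]
        if pc >= end:
            return None
        # Key by offset so the histogram does not depend on the load address.
        self.buffers[name][pc - start] += 1
        return name


# Example with made-up load addresses for an executable and a shared library.
omap = ObjectMap([
    (0x400000, 0x401000, "a.out"),
    (0x800000000, 0x800010000, "libexample.so"),
])
omap.record(0x400010)
omap.record(0x800000020)
```

Keying each buffer by offset rather than absolute address means a profile for a shared library stays meaningful even when the library is mapped at different addresses in different processes.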
- Investigate a cheaper 'timestamp' (use RDTSC?) for the timestamp of each time entry. The issues here are monotonicity of the count on SMP machines and the ability to correlate the TSC reading with absolute time.
- Implement profiling support for a.out executables. Just for completeness.
- (from an email discussion with "ganbold") hwpmc should log the process maps of all running processes so that a subsequent pmcstat -g run can build full profiles. Currently we only track processes that are exec'ed or fork'ed after logging starts, so long-running processes (e.g., mysqld) that were started before logging began have their samples ignored.
- Finish up the Python API for pmc(3) and pmclog(3).
- (documentation) A short tutorial-style introduction to hwpmc(4) and its userland tools is needed in the base system. This should begin with a simple model of modern CPUs and show how to use PMCs to measure the performance of code using both the sampling and counting facilities provided by hwpmc(4).
(documentation) Describe the design of hwpmc(4) in /usr/share/doc/papers/hwpmc/.
Completed Stuff
In CVS
(requested by alc@) pmcstat(8) should set up system PMCs on all CPUs by default. In turn this implies reworking pmcstat(8)'s syntax to support a cleaner way of tying PMCs to CPUs. Added to -current on 22 Apr 2007.
Allow multiple invocations of -t to pmcstat(8); this would allow tracing of multiple processes in one session. Added to -current on 27 Apr 2007.
Allow pmcstat(8)'s -t option to take a process name (as killall(1) does). If this translates to multiple processes (e.g., if -t sh is given), then attach the PMC to all the selected processes. Added to -current on 27 Apr 2007.
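The name-to-process selection described for -t can be sketched as follows. This is an illustration only: the function name and the process-table format are hypothetical, and exact matching on the command name is an assumption, not a statement of how pmcstat(8) actually matches.

```python
def select_pids(process_table, target):
    """Return the PIDs to attach to for a -t style argument.

    process_table is a list of (pid, command_name) pairs. A numeric target
    selects that PID if it exists; any other target selects every process
    whose command name equals it (exact match is an assumption).
    """
    if target.isdigit():
        pid = int(target)
        return [pid] if any(p == pid for p, _comm in process_table) else []
    return sorted(pid for pid, comm in process_table if comm == target)


# A made-up process table with two shells running.
table = [(1, "init"), (100, "sh"), (200, "sh"), (300, "mysqld")]
shells = select_pids(table, "sh")   # both sh processes are selected
```

Returning a list in every case lets the caller treat a single PID, a name with one match, and a name with many matches uniformly when attaching PMCs.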
Awaiting Merge from Perforce
- Callchain capture.