Profiling on Modern CPUs
Profiling isn't just "count instructions, look for hotspots" anymore. Between the superscalar-ness, the multi-core-ness, the shared-cache-ness, the shared-bus-ness, the hierarchical-memory-ness and a lot of other -nesses, things get a little... special.
This is (for now) a reading list and a set of notes on what I've come across whilst exploring what's going on with modern Intel hardware.
There are a few things to try and make easier to do in PMC:
- A system overview (e.g. Linux perf's 'perf stat') that gives simple summaries of things like general instruction counts and efficiencies, cache thrashing, resource stalls, etc.
- A much easier way to establish where certain classes of bottlenecks are when profiling (again - bus / cache bottlenecks, cache thrashing, resource stalls)
- A much nicer way of summarising / analysing counters - right now the output is per-CPU, and on machines with many CPUs this gets very unwieldy very quickly
- A machine-readable, summarised version of the counter output (with timestamping) so it can be fed into external processing scripts
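
To make the last two items a bit more concrete, here is a rough sketch (in Python, not anything PMC does today) of collapsing per-CPU counter values into one timestamped, machine-wide record that an external script could consume. Everything here is made up for illustration: the counter names ("instructions", "cycles", "llc_misses"), the sample values, the function names and the JSON layout are all assumptions, not a real PMC output format.

```python
import json
import time


def summarise(per_cpu, timestamp=None):
    """Collapse {cpu_id: {counter_name: value}} into one machine-wide record."""
    totals = {}
    for counters in per_cpu.values():
        for name, value in counters.items():
            totals[name] = totals.get(name, 0) + value

    record = {
        # Wall-clock timestamp so successive records can be ordered later.
        "timestamp": time.time() if timestamp is None else timestamp,
        "cpus": len(per_cpu),
        "totals": totals,
    }

    # Derived metric: instructions per cycle, only when both counters exist.
    cycles = totals.get("cycles", 0)
    if cycles and "instructions" in totals:
        record["ipc"] = totals["instructions"] / cycles
    return record


def to_json_line(record):
    """Serialise one record as a single JSON line for external scripts."""
    return json.dumps(record, sort_keys=True)


# Hypothetical sample: two CPUs, three invented counter names.
sample = {
    0: {"instructions": 1000, "cycles": 2000, "llc_misses": 10},
    1: {"instructions": 3000, "cycles": 2000, "llc_misses": 30},
}

print(to_json_line(summarise(sample, timestamp=0.0)))
```

One JSON object per line keeps the output easy to append to a log and easy to parse one sample at a time; the per-CPU detail is deliberately thrown away here, which is fine for an overview but not for chasing a bottleneck that only shows up on one core.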