Benchmark Advice

Introduction

A FreeBSD release is shipped configured by default to work everywhere, from small single-CPU 486/arm/mips/... compatible systems to powerful multi-core 64-bit servers. As such, some defaults may not be optimal for a specific benchmark or workload. See this tuning guide for more information on optimizations, and don't forget to benchmark each stage of the tuning process.

Highly recommended points in this text are marked with the {*} icon. Consider not publishing results if those points haven't been followed. In any case, following all of the advice given below to the letter will not harm benchmark results.

Basics

Remember: when benchmarking two things, make sure that everything possible is the same (constants) and that the only difference between the two things is what is being benchmarked (variables). For example, when comparing the performance of GCC-compiled binaries, use the same hardware, the same OS install, and the same source code, and change only the compiler version used to compile the benchmark binaries. That way the only variable is the version of GCC, everything else is constant, and the benchmark actually tests the performance of GCC.

Likewise, to benchmark the performance of two OSes, eliminate as many variables as possible: use the same hardware, the same disks and filesystem layout, the same compiler and compiler version, and the same benchmark sources, build options and configuration, so that the OS itself is the only thing that differs.

That provides a starting point.

Then, modify one of the constants above, and re-run the benchmarks.

Then, modify one more of the constants above, and re-run the benchmarks.

And so forth. Each time, vary only one thing, so that only the impact of that *ONE* thing is measured.

Comparing "random binary compile with GCC X on FreeBSD Y on filesystem Z on hardware config A" against "random binary built with GCC Q on Linux R on fileystem S on hardware config B" doesn't show anything. Was the performance difference due to hardware? Filesystem? OS? GCC version? Something else?

DON'Ts

Be aware of exactly what is being benchmarked. For example, the base FreeBSD system currently includes two compilers: GCC 4.2.1 (FreeBSD 8+) and Clang/LLVM 3.0 (FreeBSD 9+). Comparing FreeBSD with GCC 4.2.1 against, for example, Ubuntu with GCC 4.7 is unlikely to say anything meaningful about FreeBSD vs Ubuntu; it only says something about the respective default compilers, and in high-performance computing one normally chooses the most suitable compiler for the task anyway. Newer versions of GCC are available in ports and LLVM/Clang is available for most other systems; use the same compiler on both systems for compute-bound benchmarks when comparing the influence of the system/kernel.

DO's

Include detailed information

When publishing benchmarks, always include at least the following information, so that other people are able to fully understand the environment the benchmark was run in and, ideally, replicate the results: the hardware configuration (CPU, RAM, disks, controllers), the exact OS version and any tunables changed from their defaults, the filesystem in use, the compiler version and build flags, and the benchmark version together with the exact commands used to run it.
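
A minimal sketch of collecting some of this information on FreeBSD (the list of sysctl variables is only a starting point, not an exhaustive one):

    uname -a                               # exact OS version and kernel
    sysctl hw.model hw.ncpu hw.physmem     # CPU model, number of CPUs, RAM
    mount                                  # mounted filesystems and their types
    cc --version                           # compiler actually used for the build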

Obtain/produce reliable results

Run each benchmark several times and use a statistical tool such as ministat(1) (part of the FreeBSD base system) to decide whether an observed difference is actually significant or just measurement noise.
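
A minimal sketch of such a comparison, assuming two hypothetical result files with one measurement per line:

    # run1.txt and run2.txt each contain one number per line (e.g. seconds per run)
    ministat run1.txt run2.txt
    # ministat reports whether there is a difference at the chosen confidence
    # level (95% by default)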

Benchmark explanations and pitfalls

Generic information

Choosing the right scheduler

As of this writing (9.0 RC3) the ULE scheduler has some issues when more compute-bound threads compete for CPU time than there are CPUs available. This is under investigation. If the benchmark generates this kind of load, investigate whether the BSD scheduler (SCHED_4BSD) is better suited.

XXX to be confirmed: Single CPU systems may benefit from the BSD scheduler too.
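
The scheduler is selected when the kernel is built. A minimal sketch of switching from ULE to the BSD scheduler, assuming a custom kernel configuration file called MYKERNEL (a placeholder name):

    # in the kernel configuration file, replace
    options SCHED_ULE
    # with
    options SCHED_4BSD

    # then rebuild and install the kernel
    cd /usr/src
    make buildkernel KERNCONF=MYKERNEL
    make installkernel KERNCONF=MYKERNEL

    # after a reboot, the active scheduler can be verified with
    sysctl kern.sched.name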

Parallel read/write tests

In filesystem / disk I/O tests where writes and reads are interleaved or run in parallel, be aware that FreeBSD prioritizes writes over reads. (XXX: explain why?)
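
A crude sketch of such an interleaved test (paths and sizes are placeholders; dedicated tools such as blogbench, discussed below, exercise this pattern more realistically):

    # run a large write and a large read in parallel on the filesystem under test
    dd if=/dev/zero of=/mnt/test/writefile bs=1m count=4096 &
    dd if=/mnt/test/existingfile of=/dev/null bs=1m &
    wait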

Huge write throughput difference when comparing to another OS

FreeBSD has a low limit on dirty buffers (the amount of written data kept in RAM instead of being flushed to disk), because under realistic load already-cached data is much more likely to be reused, and thus more valuable, than freshly written data; caching dirty data aggressively would significantly reduce throughput and responsiveness under high load. A huge difference in write throughput therefore usually only means the system is mostly idle and the interesting use-case is not being benchmarked. FreeBSD may accept dirty buffers somewhere in the tens of megabytes, whereas another OS may accept 100 times more. This can give the impression that the other OS has better write throughput, whereas in reality FreeBSD has better real-world behavior. While there are surely cases where 100 times more dirty buffers do not hurt or are even desirable, FreeBSD prefers to optimize for the mixed use-case instead of the write-only use-case.

An interesting benchmark in this case is to generate a load which causes the other system to exceed the amount of allowed dirty buffers so that the system starts to flush the data to disk.

XXX: how to tune the amount of dirty buffers, vfs.hidirtybuffers?
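
How these limits should best be tuned is still an open question above; as a hedged sketch, the current values can at least be inspected (and, with care, changed) via sysctl:

    # inspect the dirty-buffer limits (values are numbers of buffers, not bytes)
    sysctl vfs.hidirtybuffers vfs.lodirtybuffers vfs.dirtybufthresh
    # example of raising the high watermark; 2000 is only a placeholder value
    sysctl vfs.hidirtybuffers=2000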

Tests which involve a lot of calls to get the current time

FreeBSD supports high precision timecounters. This does not matter much if one just wants to know the current time to the minute, but as FreeBSD is supposed to be suitable for a lot of tasks by default, high precision timekeeping is the default. Applications which make a lot of calls to get the current time can be impacted by this (e.g. because on Linux a faster but less precise way of obtaining the time is used, and the application in question was developed mainly on Linux).

Applications which are known to be impacted:

XXX: (Link to) explanation how to fix the applications would be good
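
The timecounter in use can influence such benchmarks. A minimal sketch for inspecting and changing it (which hardware timecounters are available depends on the machine):

    # list the available timecounters and show the one currently in use
    sysctl kern.timecounter.choice
    sysctl kern.timecounter.hardware
    # switch to another timecounter, but only to one that appears in the choice list
    sysctl kern.timecounter.hardware=TSC-low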

HTTP benchmarks

FreeBSD does not enable all HTTP optimizations by default; make sure the accf_http kernel module (the HTTP accept filter) is loaded. The module only helps for plain HTTP serving; HTTPS does not benefit.

This can be either done at the command line via kldload accf_http, or by adding accf_http_load="YES" to /boot/loader.conf and rebooting the system.

The HTTP server also needs to support and enable the HTTP accept filter. Apache 2.2, for example, does: its start script can auto-load the accf_http module if Apache is not run in a jail; add apache22_http_accept_enable="YES" to /etc/rc.conf. XXX: add a list of other HTTP servers which support this?
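
A quick way to check that the filter is in place (the AcceptFilter directive shown is Apache's, not FreeBSD's; consult the documentation of the HTTP server version actually in use):

    # verify that the accept filter module is loaded
    kldstat | grep accf_http
    # in the Apache configuration, the corresponding directive looks like this;
    # "httpready" buffers complete HTTP requests in the kernel before waking Apache
    AcceptFilter http httpready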

Benchmarking ZFS

ZFS performance relies heavily on ample resources (disk performance, memory). Using ZFS on one or two disks will not give improved performance (compared to e.g. UFS), but it does provide improved data safety, such as detecting when data has been damaged by radiation or by data-corrupting disk errors.

Give the system sufficient memory. For read performance, add one or more read-optimized SSDs as L2ARC cache (the number of SSDs depends upon the size of the working set); for synchronous write performance (databases, NFS, ...), add two write-optimized SSDs, mirrored for data safety in case one SSD fails, as the ZIL.
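
A minimal sketch of adding such devices to an existing pool (the pool name tank and the da* device names are placeholders):

    # add a read-optimized SSD as L2ARC cache
    zpool add tank cache da2
    # add two mirrored write-optimized SSDs as a separate ZIL (log) device
    zpool add tank log mirror da3 da4
    # verify the resulting layout
    zpool status tank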

Benchmark specific information

Blogbench

From the blogbench website: "Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server."

So blogbench is a test which exercises the filesystem.

Take both the read and the write performance into account. Reads and writes are done in parallel, so publishing only one of these numbers does not make sense (malicious people may think otherwise).
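
A hedged sketch of running it (blogbench is available as the benchmarks/blogbench port; the target directory is a placeholder and must live on the filesystem under test, and the exact options should be checked against the documentation of the installed version):

    mkdir /mnt/test/blogbench
    blogbench -d /mnt/test/blogbench
    # report both the final read score and the final write score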

LAME

When using LAME to benchmark FreeBSD against another OS, the test is most probably not comparing the systems but the compilers (as with every userland compute-bound application).
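
If LAME is used anyway, the comparison is only meaningful when both systems encode the identical input with a LAME binary built by the same compiler version and flags; a minimal sketch (the input file is a placeholder):

    # encode the same input on both systems and compare the elapsed times
    time lame testfile.wav /dev/null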

High precision benchmarking

Most of the text on which this page is based was originally posted here:

http://lists.freebsd.org/pipermail/freebsd-current/2004-January/019600.html

Note that this advice is mostly concerned with high-precision measurement of CPU-intensive tasks and may introduce needless complications for simpler benchmarks. For reference, PHK has used the procedures outlined below for fine-tuning his work on code that keeps track of machine time, thus the references to quartz crystals and temperature drift.

Also note that an extended version of these hints can be found in the FreeBSD Developers Handbook.

Additional tips

PHK> A number of people have started to benchmark things seriously now, and run into the usual problem of noisy data preventing any conclusions. Rather than repeat myself many times, I decided to send this email.

I experimented with micro-benchmarking some years back; here are some bullet points with a lot of the stuff I found out. You will not be able to use them all every single time, but the more you use, the better your ability to test small differences will be. (The full list of hints is preserved in the mailing-list post linked above and in the FreeBSD Developers Handbook.)

Enjoy, and please share any other tricks you might develop!


CategoryHowTo
