Preface

A FreeBSD release ships configured by default to work everywhere, from small single-CPU 486/ARM/MIPS-class systems to powerful multi-core 64-bit servers. As such, some defaults may not be optimal for a specific benchmark or workload. A tuning guide is available in case you want to get the most out of your system. Do not forget to benchmark whether the tuning actually makes a difference for your workload.

Highly recommended points in this text are marked with the {*} icon. Please consider not publishing your results if you have not followed at least those points. In any case, following all of the advice given below to the letter will not harm your benchmark results.

Remember: when benchmarking two things, you need to make sure that everything which can be kept the same is the same (the constants), and that the only difference between the two things under comparison is what you want to benchmark (the variable). For example, if you want to compare the performance of binaries compiled by two versions of GCC, you would use the same hardware, the same OS install, and the same source code, and change only the compiler version used to compile the "benchmark" binaries. That way the only variable is "version of GCC", everything else is constant, and thus the benchmark is actually testing the performance of GCC.

Likewise, if you want to benchmark the performance of two OSes, you need to eliminate as many variables as possible: use identical hardware, the same filesystem (where both systems support one), the same benchmark source code, and the same compiler version to build it, so that the OS itself is the only variable.

That gives you the starting point.

Then, you modify one of the constants above, and re-run the benchmarks.

Then you modify one more of the constants above, and re-run the benchmarks.

And so forth. Each time, you vary only one thing, so that you can measure the impact of that *ONE* thing.

Comparing "random binary compile with GCC X on FreeBSD Y on filesystem Z on hardware config A" against "random binary built with GCC Q on Linux R on fileystem S on hardware config B" doesn't show anything. Was the performance difference due to hardware? Filesystem? OS? GCC version? Something else?

Do not do this!

Please also be aware of what you are benchmarking. For example, the base FreeBSD system includes two compilers at present: GCC 4.2.1 (FreeBSD 8+) and Clang/LLVM 3.0 (FreeBSD 9+). If you compare FreeBSD / GCC 4.2.1 against, for example, Ubuntu / GCC 4.7, the results are unlikely to tell you anything meaningful about FreeBSD vs. Ubuntu. They will only tell you something about computing with the default compiler, and if you are into high-performance computing, you normally choose the most suitable compiler for your task anyway. Newer versions of GCC are available in ports, and LLVM/Clang is available for most other systems - make sure that you are using the same compiler on both systems for compute-bound benchmarks if you want to compare the influence of the system/kernel.

Information to include

If you publish benchmark results, you should always include the following information so that other people are able to fully understand the environment the benchmark was run in: the hardware configuration, the OS name and version (including any tuning applied), the filesystem and its settings, the compiler name, version and flags, and the benchmark name, version and parameters.

Make sure you get good numbers

Benchmark explanations and pitfalls

Generic information

Choosing the right scheduler

As of this writing (9.0-RC3), the ULE scheduler has some issues when more compute-bound threads compete for CPU time than there are CPUs available. This is under investigation. If your benchmark behaves like this, you should investigate whether the BSD scheduler (SCHED_4BSD) is better suited for it.

XXX to be confirmed: Single CPU systems may benefit from the BSD scheduler too.
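
The scheduler is chosen when the kernel is built. A minimal sketch of switching to the 4BSD scheduler, assuming a custom kernel configuration named MYKERNEL (the name and path are only examples):

{{{
# In the kernel config file (e.g. /usr/src/sys/amd64/conf/MYKERNEL),
# replace the ULE scheduler with the 4BSD scheduler:
#options         SCHED_ULE       # removed / commented out
options         SCHED_4BSD      # traditional BSD scheduler

# Rebuild, install and boot the new kernel:
# cd /usr/src
# make buildkernel KERNCONF=MYKERNEL
# make installkernel KERNCONF=MYKERNEL
# shutdown -r now
}}}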

Parallel read/write tests

If you do a filesystem / disk I/O test where writes and reads are interleaved or run in parallel, you need to be aware that FreeBSD prioritizes writes over reads. (XXX: explain why?)
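
A minimal sketch of such a test, assuming a scratch filesystem mounted at /mnt/test and a previously created file readfile (all names are only examples):

{{{
# Run a large sequential write and a sequential read in parallel;
# on FreeBSD the write stream tends to be favored over the read stream.
dd if=/dev/zero of=/mnt/test/writefile bs=1m count=4096 &
dd if=/mnt/test/readfile of=/dev/null bs=1m &
wait
}}}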

Huge write throughput difference when comparing to another OS

FreeBSD has a low limit on dirty buffers (= the amount of written data kept in RAM instead of being flushed to disk), since under realistic load already-cached data is much more likely to be reused, and thus more valuable, than freshly written data; aggressively caching dirty data would significantly reduce throughput and responsiveness under high load (= a huge difference in throughput only means your system is mostly idle and you are not benchmarking the interesting use-case). It can be that FreeBSD accepts dirty buffers somewhere in the tens of megabytes, whereas another OS accepts 100 times more. This can create the impression that the other OS has better write throughput, whereas in reality FreeBSD has better real-world behavior. While there are surely cases where 100 times more dirty buffers don't hurt, or are even something you want to have, FreeBSD prefers to optimize for the mixed use-case instead of the write-only use-case.

An interesting benchmark in this case is to generate a load which causes the other system to exceed its amount of allowed dirty buffers, so that it has to start flushing data to disk.

XXX: how to tune the amount of dirty buffers, vfs.hidirtybuffers?
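
Until that question is answered, the relevant limits and the current usage can at least be inspected via sysctl:

{{{
# Show the upper/lower dirty-buffer thresholds and the current count:
sysctl vfs.hidirtybuffers vfs.lodirtybuffers vfs.numdirtybuffers
}}}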

Tests which involve a lot of calls to get the current time

FreeBSD has high-precision timecounters. This does not really matter if you just want to know the current time to the minute, but as FreeBSD is supposed to be suitable for a lot of tasks by default, you get high-precision timekeeping by default. Applications which make a lot of calls to get the current time are impacted by this (e.g. because Linux offers a faster but less precise way of obtaining the time, and the application in question was developed mainly on Linux).

Applications which are known to be impacted:

XXX: (Link to) explanation how to fix the applications would be good
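
What can already be shown is how to inspect and change the timecounter the kernel uses; switching to a cheaper source such as the TSC is only a sketch of what one might compare, not a general recommendation:

{{{
# List the available timecounters and show the one currently in use:
sysctl kern.timecounter.choice
sysctl kern.timecounter.hardware

# Example: switch to the TSC (fast to read, but possibly unreliable on
# SMP systems or with CPU frequency scaling; use a name that appears in
# kern.timecounter.choice):
sysctl kern.timecounter.hardware=TSC
}}}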

HTTP benchmarks

FreeBSD does not enable all HTTP optimizations by default. If you want to get the most performance out of FreeBSD, make sure the accf_http kernel module is loaded. The module only helps with plain HTTP serving; HTTPS does not benefit from it.

You can either load it on the command line via kldload accf_http, or add accf_http_load="YES" to /boot/loader.conf and reboot the system. The HTTP server also needs to support the HTTP accept filter. Apache 2.2, for example, does (and its start script can auto-load the accf_http module, as long as it is not running in a jail, if you add apache22_http_accept_enable="YES" to /etc/rc.conf). XXX: add a list of other HTTP servers which support this?
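
Summarized as concrete configuration lines, based on the commands above:

{{{
# Load the accept filter right now:
kldload accf_http

# Or load it at every boot: add this line to /boot/loader.conf
accf_http_load="YES"

# Let the Apache 2.2 rc script use (and auto-load) the filter:
# add this line to /etc/rc.conf
apache22_http_accept_enable="YES"
}}}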

Benchmarking ZFS

If you want to benchmark ZFS, be aware that it will only shine if you are willing to spend money. Using ZFS on one or two disks will not give improved performance (compared to e.g. UFS), but it will give improved safety for your data (you know when your data is damaged by e.g. radiation or data-mangling harddisk errors). To make it shine you need to add at least a lot of RAM, or one read-optimized SSD as an L2ARC cache device for read performance (the number of SSDs depends upon the size of the working set), or two mirrored (for data safety in case one SSD gets damaged) write-optimized SSDs for the ZIL for synchronous (DBs/NFS/...) write performance.
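
As a sketch, adding such devices to an existing pool would look like this (the pool name tank and the device names are assumptions):

{{{
# Add a read-optimized SSD as an L2ARC cache device:
zpool add tank cache ada2

# Add two mirrored write-optimized SSDs as ZIL (log) devices:
zpool add tank log mirror ada3 ada4
}}}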

Benchmark specific information

Blogbench

From their website: "Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server."

So blogbench is a test which exercises the filesystem (unless you are Wordpress.com or something similar, you should not think of it as a benchmark for blogs).

You have to take both the read and the write performance into account. Reads and writes are done in parallel, so presenting only one of the numbers (in a publication, to your boss, or to whomever) does not make sense (malicious people may think otherwise).
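
A typical invocation, assuming the benchmarks/blogbench port is installed and /mnt/test is the filesystem under test (the directory is only an example), looks like this; it prints a final score for reads and one for writes, and both belong in any published result:

{{{
# Run blogbench against a dedicated, empty test directory:
mkdir -p /mnt/test/blogbench
blogbench -d /mnt/test/blogbench
}}}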

LAME

If you compare FreeBSD against another OS using LAME as one of the benchmarks, you are most probably not comparing the systems but the compilers (as with every userland compute-bound application).
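
If you want the compiler to be a constant rather than a variable, build LAME the same way on both systems; a sketch, assuming clang and the usual autoconf build are available on both sides:

{{{
# Build LAME with the same compiler and the same flags on both systems:
CC=clang CFLAGS="-O2" ./configure
make
}}}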

High precision benchmarking

Most of the text on which this page is based was originally posted here:

http://lists.freebsd.org/pipermail/freebsd-current/2004-January/019600.html

Note that this advice is mostly concerned with high-precision measurement of CPU-intensive tasks and may introduce needless complications for simpler benchmarks. For reference, PHK has used the procedures outlined below for fine-tuning his work on code that keeps track of machine time, thus the references to quartz crystals and temperature drift.

Also note that an extended version of these hints can be found in the FreeBSD Developers Handbook.


PHK> A number of people have started to benchmark things seriously now, and run into the usual problem of noisy data preventing any conclusions. Rather than repeat myself many times, I decided to send this email.

I experimented with micro-benchmarking some years back; here are some bullet points with a lot of the stuff I found out. You will not be able to use all of them every single time, but the more you use, the better your ability to test small differences will be.

Enjoy, and please share any other tricks you might develop!
