See also: Solaris: ZFS Evil Tuning Guide

ZFS Tuning Guide

(Work in Progress)

To use ZFS, at least 1GB of memory is recommended (for all architectures) but more is helpful as ZFS needs *lots* of memory. Depending on your workload, it may be possible to use ZFS on systems with less memory, but it requires careful tuning to avoid panics from memory exhaustion in the kernel.

A 64-bit system is preferred due to its larger address space and better performance on 64-bit variables, which are used extensively by ZFS. 32-bit systems are supported, though only with sufficient tuning.

History of FreeBSD releases with ZFS is as follows:

i386

On i386 systems you will need to recompile your kernel with an increased KVA_PAGES option to enlarge the kernel address space before vm.kmem_size can be increased beyond 512M. Add the following line to your kernel configuration file to increase the space available for vm.kmem_size to at least 1 GB:

options KVA_PAGES=512

By default the kernel receives 1GB of the 4GB of address space available on the i386 architecture, and this is used for all of the kernel address space needs, not just the kmem map. By increasing KVA_PAGES you can allocate a larger proportion of the 4GB address space to the kernel (2 GB in the above example), allowing more room to increase vm.kmem_size. The trade-off is that user applications have less address space available, and some programs (e.g. those that rely on mapping data at a fixed address that is now in the kernel address space, or which require close to the full 3GB of address space themselves) may no longer run.
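
The arithmetic behind these figures can be sketched in a few lines of sh. This is only an illustration: it assumes a non-PAE i386 kernel, where each KVA_PAGES unit maps 4 MB of address space, and the function names are made up for this example.

```shell
#!/bin/sh
# Assumption: non-PAE i386, where each KVA_PAGES unit maps 4 MB.
MB_PER_KVA_PAGE=4
TOTAL_ADDRESS_SPACE_MB=4096

# kernel_kva_mb KVA_PAGES: kernel address space in MB (hypothetical helper).
kernel_kva_mb() {
    echo $(( $1 * MB_PER_KVA_PAGE ))
}

# user_space_mb KVA_PAGES: address space left for user programs, in MB
# (hypothetical helper).
user_space_mb() {
    echo $(( TOTAL_ADDRESS_SPACE_MB - $1 * MB_PER_KVA_PAGE ))
}
```

Under that assumption, kernel_kva_mb 256 gives 1024 and user_space_mb 256 gives 3072 (the default 1 GB / 3 GB split), while KVA_PAGES=512 gives 2048 for each.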

For *really* memory-constrained systems it is also recommended to strip as many unused drivers and options from the kernel as possible (which will free a couple of MB of memory). A stable configuration with vm.kmem_size="1536M" has been reported using an unmodified 7.0-RELEASE kernel, relatively sparse drivers as required for the hardware and options KVA_PAGES=512.

Some workloads need a greatly reduced ARC size and VDEV cache size. ZFS manages the ARC through a multi-threaded process. If it requires more memory for the ARC, ZFS will allocate it. The ARC can, and usually does, exceed arc_max (vfs.zfs.arc_max), while another thread within ZFS periodically frees memory allocated to the ARC once arc_max has been exceeded. Therefore even with a small arc_max it is possible for the ARC to exceed kmem_size_max and panic the system. On memory-constrained systems it is safer to use an arbitrarily low arc_max. For example, it is possible to set vm.kmem_size and vm.kmem_size_max to 512M and vfs.zfs.arc_max to 160M, and to reduce vfs.zfs.vdev.cache.size to 5 Megs, half its default size of 10 Megs (anecdotally, this achieves even better stability).
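
Written out as a configuration fragment, the example values from the paragraph above might look like this. It is a sketch that assumes all four tunables are set in /boot/loader.conf; treat the values as a starting point for experimentation, not as a recommendation.

```
# /boot/loader.conf (sketch; values are the ones quoted above)
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="160M"
# Half of the 10 Meg default for the VDEV cache.
vfs.zfs.vdev.cache.size="5M"
```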

There is one example (CySchubert) of ZFS running nicely on a laptop with 768 Megs of physical RAM with the following settings:

Kernel memory should be monitored while tuning to ensure a comfortable amount of free kernel address space. The following script will summarize kernel memory utilization and assist in tuning arc_max and VDEV cache size.
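
A minimal sketch of such a summary, not the original script, is given here. It assumes that `vmstat -m` prints one header line followed by one line per malloc type, with the MemUse column written as a number followed by K. The function name is made up for this example. It only totals kernel malloc usage, so kernel text and loaded modules are not counted.

```shell
#!/bin/sh
# sum_malloc_kb: read `vmstat -m`-style text on stdin and print the total
# of the MemUse column in kilobytes. Type names may contain spaces, so the
# first field of the form <digits>K on each line is taken as MemUse
# (an assumption about the output layout).
sum_malloc_kb() {
    awk '
        NR > 1 {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+K$/) {
                    # Drop the trailing K and add the number to the total.
                    total += substr($i, 1, length($i) - 1)
                    break
                }
            }
        }
        END { print total + 0 }
    '
}
```

On a FreeBSD system it could be run as `vmstat -m | sum_malloc_kb` and the result compared with `sysctl -n vm.kmem_size` (which is reported in bytes) to judge how much headroom is left.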

Note: Perhaps there is a more precise way to calculate or measure how large a vm.kmem_size setting can be used with a particular kernel, but the authors of this wiki do not know it. Experimentation does work. :) However, if you set vm.kmem_size too high in loader.conf, the kernel will panic on boot. You can fix this by dropping to the boot loader prompt and typing set vm.kmem_size="512M" (or a similar smaller number known to work).

The vm.kmem_size_max setting is not used directly during system operation (i.e. it is not a limit which kmem can "grow" into) but for the initial autoconfiguration of various system settings, the most important of which for this discussion is the ARC size. If kmem_size and arc_max are tuned manually, kmem_size_max will be ignored.

The issue of kernel memory exhaustion is a complex one, involving the interaction between disk speeds, application loads and the special caching ZFS does. Faster drives will write the cached data faster but will also fill the caches up faster. Generally, larger and faster drives will need more memory for ZFS.

amd64

FreeBSD 7.2+ has an improved kernel memory allocation strategy, and no tuning may be necessary on systems with more than 2 GB of RAM.

On systems using FreeBSD 7.0 and 7.1, kernel memory usage (vm.kmem_size) should be increased to around 1 GB and ARC size reduced:
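
The exact settings are not shown in this copy of the guide. As a rough sketch only, a /boot/loader.conf fragment along these lines would match the description: the 1 GB figure comes from the text above, while setting vm.kmem_size_max to the same value and the arc_max number itself are assumptions made for illustration.

```
# /boot/loader.conf (sketch for FreeBSD 7.0/7.1 on amd64)
# "Around 1 GB", as described above.
vm.kmem_size="1024M"
# Assumption: kept equal to vm.kmem_size.
vm.kmem_size_max="1024M"
# Placeholder value, not taken from this guide; tune for your workload.
vfs.zfs.arc_max="100M"
```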

This might help if the machine is also loaded with other tasks, such as network activity (a file server), etc. Tuning KVA_PAGES is not required on amd64.

To increase performance, you may raise kern.maxvnodes (in /etc/sysctl.conf) substantially if you have the RAM for it (e.g. 400000 for a 2GB system). Keep an eye on vfs.numvnodes during production to see where it stabilizes. amd64 uses direct mapping for vnodes, so you don't have to worry about address space for vnodes on this architecture (as opposed to i386).
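
One way to keep an eye on vfs.numvnodes is to express it as a percentage of kern.maxvnodes. The helper below is a small sketch; its name is made up for this example, and it takes the two numbers as arguments so that it does not depend on any particular system.

```shell
#!/bin/sh
# vnode_usage_pct NUMVNODES MAXVNODES: print the vnodes in use as a whole
# percentage of the configured limit (hypothetical helper).
vnode_usage_pct() {
    num=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "maxvnodes must be positive" >&2
        return 1
    fi
    echo $(( num * 100 / max ))
}
```

On a running system it could be called as `vnode_usage_pct "$(sysctl -n vfs.numvnodes)" "$(sysctl -n kern.maxvnodes)"`. A value that stays close to 100 suggests the limit is being reached and might be raised if RAM allows.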

Application Issues

ZFS is a copy-on-write filesystem. As such, metadata from the top of the hierarchy is copied in order to maintain consistency in case of sudden failure, e.g. loss of power during a write operation. This obviates the need for an fsck-like check of ZFS filesystems at boot. The downside is that applications which update large files in place, e.g. databases, will likely perform poorly on ZFS due to excessive I/O from copy-on-write. Additionally, database applications such as Oracle, which maintain a large in-memory cache (called the SGA in Oracle), will perform poorly due to double caching of data in the ARC and in the application's own cache. Reducing the ARC to a minimum can improve the performance of applications which maintain their own cache.

ZFSTuningGuide (last edited 2009-11-09 23:03:28 by CySchubert)