Project Description

(from Google Summer of Code 2008 project listing)

Prior to UFS2, the performance of name lookups in directories with many entries on FreeBSD could be quite slow. FFS does not maintain an on-disk index of directory entries, so each name lookup required a linear search of the directory entries. With UFS2 a new technique called dirhash was developed to generate a hash map in memory of directory entries the first time a name lookup is run on a large directory, so that subsequent name lookups will be much faster. Dirhash has had a substantial positive impact on filesystem performance.

However, the amount of memory which can be used by dirhashes is fixed by the kernel (although it is tunable at runtime via a sysctl). I propose to implement dynamic memory allocation for dirhashes, allowing more memory to be allocated for dirhash when needed if it is available, and reclaimed if memory becomes scarce. The goal of this is to improve performance for filesystems with very large directories on systems with memory to spare, without the user having to manually tune the amount of memory available for dirhash.

Furthermore, if time permits I would like to investigate implementing on-disk indexing for UFS2. If directory indices are generated and stored on disk, there would no longer be a need to generate a completely new dirhash each time a directory is first accessed after system startup, or after a dirhash for a directory not recently accessed has been disposed of to conserve memory.

Plan

Status

I blog about this project here. The code I am working on can be found in the FreeBSD Perforce depot here.

(2008-6-19) I am working on implementing a function to be registered with EVENTHANDLER_REGISTER() for vm_lowmem events that will free dirhashes. I am trying to determine how to decide which hashes to free, and will need to find how many should be freed each time this event occurs. Determining the latter might be easiest through trial and error, watching how often vm_lowmem events pop up when the system is low on memory.

(2008-6-27) Here is a patch with a first attempt at freeing dirhashes when a low memory signal is sent: dirhash_lowmem_event_2008-6-28.patch. In this patch just the one oldest dirhash that can be freed will be.

(2008-7-12) I now have an updated patch: dirhash_lowmem_event_2008-7-12.patch. Now the dirhash data structure has a new field for the last used time stamp. When a vm_lowmem event occurs, every dirhash that has not been used for five seconds (this will probably need to be tuned) is discarded. If all dirhashes have been used more recently than this, then the first one on the dirhash list TAILQ will be deleted.

(2008-8-14) Updated patches again. Now there are two: one against HEAD dirhash_lowmem_head_2008-8-14.patch and one that should work both on 7.0 and 7-STABLE dirhash_lowmem_7-stable_2008-8-14.patch. These fix a bug in the previous code where hashes older than 5 seconds would never be deleted and instead only one hash at the head of the TAILQ would be removed with each vm_lowmem event. Also, now the deletion code is more aggressive: all hashes older than 5 seconds (tunable via a sysctl, vfs.ufs.dirhash_reclaimage) will always be deleted when a vm_lowmem event occurs, and if that doesn't free up at least 10% of the memory currently being used by dirhash, more hashes will be destroyed from the head of the TAILQ until we do free up at least 10% of the memory initially used. Also, below I have posted some results from some benchmarking.

Benchmarks

Here is a comparison of the speed running a number of different tasks that are fairly filesystem intensive with and without my dirhash low memory event handling code compiled into the kernel. The system used for testing is a 1.83GHz Core Duo Macbook with 2GB of memory and a 5400RPM 2.5" SATA hard drive. It's running i386 8-CURRENT, and WITNESS and INVARIANTS related options in the kernel are disabled. The system was running multiuser with only a few extra processes like wpa_supplicant, sshd, and powerd running. The vfs.ufs.dirhash_maxmem sysctl was set to 100MB at all times for both kernels, and vfs.ufs.dirhash_reclaimage was 5 seconds when using the vm_lowmem handling kernel. Each test was run with a random amount of wait time from 1 to 5 minutes between each test, and the whole set of tests was repeated for a total of four runs for the set.

mean run time - no vm_lowmem handler

std. deviation - no vm_lowmem handler

mean run time - with vm_lowmem handler

std. deviation - with vm_lowmem handler

postmark

2378s

16s

2327s

77s

portsnap extract

312s

55s

315s

57s

tar extract

505s

1,8s

498s

5,5s

svn checkout

1084s

131s

1023s

16,7s

rm -rf ports

76s

3,7s

78s

1,5s

make buildworld

4373s

2,3s

4382s

23s

rm -rf src

115s

6,5s

115s

9,1s

rm -rf svnroot

20s

5,5s

18,7s

4,3s

postmark was run with 1 000 000 files, 500 subdirectories, and 10 000 transactions. The portsnap extract test just involved extracting an already fetched ports tree into a clean directory with portsnap extract. tar extraction was of a bzipped tarball containing a copy of the FreeBSD svn repository, svnmirror-base-r179637.tbz2, fetched from ftp://ftp.freebd.org/pub/FreeBSD/development/subversion. The svn checkout test did a subversion checkout from this newly extracted local svnroot of revision 178000 from HEAD, and make buildworld was run in the checked out src directory. The different rm tests just timed doing rm -rf on the extracted ports and svnroot directories, and the checked out src directory.

The difference in run times was not significantly different with the two versions of code. During the postmark and portsnap tests no vm_lowmem events were seen, but a few of them did get fired and delete dirhashes in the later tests that used more memory. This did not seem to slow things down much though. Another test run was bonnie++ -r 2048 -n 1000 -s 0 -u root, and that also had little difference in the number of various operations per second for the different kernels.

Next I will be testing different values for vfs.ufs.dirhash_reclaimage to find a more optimal value there.

DirhashDynamicMemory (last edited 2008-08-15 10:13:21 by NickBarkas)