Multipage allocations

Benchmark used

iozone -r 32 -s 8192m -l 16 -u 16 -i 0

Information about the benchmark

        Record Size 32 KB
        File size set to 33554432 KB
        Command line used: iozone -r 32 -s 32768m -l 16 -u 16 -i 0
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Min process = 16 
        Max process = 16 
        Throughput test with 16 processes
        Each process writes a 33554432 Kbyte file in 32 Kbyte records

Information about the system:

CPU: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz (2933.40-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 12 CPUs
hw.physmem: 25739268096
vfs.zfs.arc_min: 2984637440
vfs.zfs.arc_max: 23877099520

Allocation distribution

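The following DTrace script captures the size argument (arg0, in bytes) passed to uma_large_malloc() and builds a histogram keyed by that size in 4 KB pages. For example, a 128 KB request is counted in bucket 32.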

DTrace script

fbt:kernel:uma_large_malloc:entry
{
        /* arg0 is the requested size in bytes; key by whole 4 KB pages */
        @counts[arg0 / 4096] = count();
}
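Because `arg0 / 4096` is integer division, each key is the request size in whole 4 KB pages, rounded down: any size that is not a page multiple lands in the bucket below the number of pages it actually spans (a 16383-byte request is counted in bucket 3 although it touches four pages). The C sketch below mirrors the bucketing and contrasts it with the round-up page count an allocator would have to reserve. It assumes 4 KB pages as on amd64, and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Page size the DTrace script divides by (assumed 4 KB, as on amd64). */
#define BUCKET_PAGE_SIZE 4096UL

/* Hypothetical helper mirroring the D expression `arg0 / 4096`: whole pages, rounded down. */
static unsigned long
size_to_bucket(unsigned long size)
{
	return size / BUCKET_PAGE_SIZE;
}

/* Hypothetical helper: pages needed to back `size` bytes, rounded up. */
static unsigned long
size_to_pages(unsigned long size)
{
	/* Guard against overflow in the round-up addition. */
	assert(size <= ULONG_MAX - (BUCKET_PAGE_SIZE - 1));
	return (size + BUCKET_PAGE_SIZE - 1) / BUCKET_PAGE_SIZE;
}
```

For page-multiple sizes such as 131072 bytes both helpers agree (32 pages); they differ only for sizes with a partial last page.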

Results

            pages            calls
               32          5115028
                4           155907
                2            10386
                3             4282
                5             1455
                8             1247
                6             1033
                7              721
               12              620
                9              481
               10              464
               11              394
               16              269
               13              263
               14              228
               15              189
               20              143
               17              143
               18              114
               26              104
               19               83
               21               81
               22               52
               24               48
               23               32
               27               25
               28               23
               31               22
               30               16
               29               15
               25               11
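As a quick check of how concentrated the distribution is, the C sketch below tallies the histogram (values copied from the table) and the share of calls in buckets 2, 4, 8, 16 and 32, the page counts later used for the UMA zones. By this count about 96.6% of the 5,293,879 recorded calls are in bucket 32 and about 99.8% are in one of those five buckets. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* One histogram row: request size in whole 4 KB pages and number of calls. */
struct bucket {
	unsigned int pages;
	unsigned long calls;
};

/* Copied from the results table above, in the same order. */
static const struct bucket hist[] = {
	{32, 5115028}, {4, 155907}, {2, 10386}, {3, 4282}, {5, 1455},
	{8, 1247},     {6, 1033},   {7, 721},   {12, 620}, {9, 481},
	{10, 464},     {11, 394},   {16, 269},  {13, 263}, {14, 228},
	{15, 189},     {20, 143},   {17, 143},  {18, 114}, {26, 104},
	{19, 83},      {21, 81},    {22, 52},   {24, 48},  {23, 32},
	{27, 25},      {28, 23},    {31, 22},   {30, 16},  {29, 15},
	{25, 11},
};
#define NBUCKETS (sizeof(hist) / sizeof(hist[0]))

/* Total number of uma_large_malloc() calls recorded by the script. */
static unsigned long
total_calls(void)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < NBUCKETS; i++)
		sum += hist[i].calls;
	return sum;
}

/* Calls whose bucket is a power of two between 2 and 32 pages. */
static unsigned long
pow2_calls(void)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < NBUCKETS; i++) {
		unsigned int p = hist[i].pages;

		if (p >= 2 && p <= 32 && (p & (p - 1)) == 0)
			sum += hist[i].calls;
	}
	return sum;
}

/* Fraction of all recorded calls represented by `part`. */
static double
share(unsigned long part)
{
	unsigned long total = total_calls();

	assert(total > 0);
	return (double)part / (double)total;
}
```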

Profiling with hwpmc(4)

%SAMP IMAGE      FUNCTION             CALLERS
 57.6 kernel     __mtx_lock_sleep     _vm_map_lock
 14.4 kernel     pmap_enter           kmem_back
  2.3 kernel     cpu_search_highest   cpu_search_highest
  1.4 kernel     _sx_xlock
  1.2 kernel     _sx_xunlock
  0.8 libc.so.7  bsearch
  0.7 kernel     vm_page_splay
  0.6 kernel     _mtx_lock_spin_cooki pmclog_reserve
  0.6 zfs.ko     lzjb_compress        zio_compress_data

Cost of allocations

Average cost per allocation and per free, in TSC cycles, measured by wrapping the allocation and free code paths between two calls to rdtsc():

debug.calls_free: 4394689
debug.cycles_free: 2397454367445
debug.calls_alloc: 4396755
debug.cycles_alloc: 2853281095838

Avg per alloc: 648951 cycles
Avg per free: 545534 cycles
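The averages are the cycle counters divided by the call counters, truncated to whole cycles. The C sketch below reproduces them and converts cycles to microseconds under an assumption not stated on this page: that the TSC ticks at the nominal 2933.40 MHz reported for the X5670 (invariant TSC). Under that assumption an allocation costs roughly 221 µs and a free roughly 186 µs.

```c
#include <assert.h>
#include <stdint.h>

/* Average cycles per call, truncated like the figures above. */
static uint64_t
avg_cycles(uint64_t cycles, uint64_t calls)
{
	assert(calls > 0);
	return cycles / calls;
}

/*
 * Convert TSC cycles to microseconds.
 * ASSUMPTION: the TSC runs at the nominal 2933.40 MHz from dmesg.
 */
static double
cycles_to_us(uint64_t cycles)
{
	const double tsc_mhz = 2933.40;

	return (double)cycles / tsc_mhz;
}
```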

UMA

Route multipage allocations through UMA instead of the VM layer by creating UMA zones for 2, 4, 8, 16 and 32 pages (in sys/kern/kern_malloc.c).
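The kern_malloc.c change itself is not reproduced here. As a hypothetical illustration of the idea only, a request can be served from the smallest of the 2, 4, 8, 16 and 32 page zones that holds it, with anything larger falling back to the existing VM path. The names `mp_zone_pages` and `mp_zone_index` are made up for this sketch and are not FreeBSD APIs; 4 KB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define MP_PAGE_SIZE 4096UL  /* assumption: 4 KB pages, as on amd64 */

/* Hypothetical zone item sizes, in pages, matching the zones listed above. */
static const unsigned long mp_zone_pages[] = { 2, 4, 8, 16, 32 };
#define MP_NZONES (sizeof(mp_zone_pages) / sizeof(mp_zone_pages[0]))

/*
 * Hypothetical: index of the smallest zone whose items can hold `size`
 * bytes, or -1 if the request exceeds the largest zone and would still
 * have to go through the VM layer.
 */
static int
mp_zone_index(unsigned long size)
{
	unsigned long pages;

	assert(size > 0);
	pages = (size + MP_PAGE_SIZE - 1) / MP_PAGE_SIZE;
	for (size_t i = 0; i < MP_NZONES; i++)
		if (pages <= mp_zone_pages[i])
			return (int)i;
	return -1;
}
```

Rounding up to a power of two can waste up to about half of an item (a 17-page request occupies a 32-page item). With the histogram above, where nearly all requests land in bucket 32, that waste is likely small, assuming those requests are exactly 128 KB.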

debug.calls_free: 5010701
debug.cycles_free: 2976686362
debug.calls_alloc: 5012661
debug.cycles_alloc: 18496362281

Avg per alloc: 3689 cycles
Avg per free: 594 cycles
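Comparing the per-call averages of the two measurements: allocations through the UMA zones are about 176 times cheaper than through the VM layer, and frees about 918 times cheaper. A minimal C sketch of that ratio (the function name is illustrative):

```c
#include <assert.h>

/* Ratio between per-call cost before (VM layer) and after (UMA zones). */
static double
speedup(unsigned long before_cycles, unsigned long after_cycles)
{
	assert(after_cycles > 0);
	return (double)before_cycles / (double)after_cycles;
}
```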

ZFSvm (last edited 2013-03-19 20:18:17 by DavideItaliano)