Physical memory compaction
Introduction
Tracking active compactions
Since several subsystems could benefit from memory compaction at various granularities, the compaction subsystem tracks running memory compaction jobs to ensure that ongoing compactions never overlap. It exposes an interface through which a consumer registers a "compaction job": the physical memory range it wishes to compact, a compaction function, and an optional search function. The compaction function is responsible for relocating 0-order pages within the registered physical memory range; the search function can optionally be used to pass multiple segments within that range to the compaction algorithm.
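The registration interface might be sketched in userspace C along the following lines. All names here (compact_job_register, compact_fn_t, and so on) are illustrative assumptions, not the actual vm_compact API; the point is the key property that a registration is rejected when its range overlaps an active job:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch -- these are not the actual vm_compact names. */
typedef int (*compact_fn_t)(unsigned long start, unsigned long end);
typedef int (*search_fn_t)(unsigned long start, unsigned long end,
    unsigned long *seg_start, unsigned long *seg_end);

struct compact_job {
	unsigned long start;	/* physical range to compact */
	unsigned long end;
	compact_fn_t compact;	/* relocates 0-order pages in range */
	search_fn_t search;	/* optional: yields sub-segments */
	bool active;
};

#define MAX_JOBS 8
static struct compact_job jobs[MAX_JOBS];

/* Register a job; fail if the range overlaps an active registration. */
int
compact_job_register(unsigned long start, unsigned long end,
    compact_fn_t compact, search_fn_t search)
{
	int i, slot = -1;

	for (i = 0; i < MAX_JOBS; i++) {
		if (!jobs[i].active) {
			if (slot < 0)
				slot = i;
			continue;
		}
		/* Ranges [start, end) overlap iff each starts before the
		 * other ends. */
		if (start < jobs[i].end && jobs[i].start < end)
			return (-1);
	}
	if (slot < 0)
		return (-1);
	jobs[slot] = (struct compact_job){ start, end, compact, search, true };
	return (slot);
}
```

With this shape, a second consumer attempting to register a range that intersects an ongoing compaction simply gets an error and can retry later.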
The vm_compact subsystem does not provide implementations of compaction algorithms, search functions or data structures for compaction - all of this is left to the consumer.
System-wide memory compaction
The vm_phys subsystem was extended to support system-wide, per-domain physical memory compaction. It uses a compaction function based on the 'two-finger' mark-compact algorithm [1, p. 33] to rearrange 0-order pages inside a given physical memory region. Since system-wide compaction is an expensive operation, the vm_phys subsystem uses a special data structure to quickly identify heavily fragmented regions of a memory domain, increasing the overall efficiency of the compaction.
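The two-finger pass itself is straightforward to sketch in userspace. The toy function below is not the vm_phys implementation; it operates on an array of page slots (nonzero meaning in use, 0 meaning free), with one finger advancing from the bottom looking for holes and the other retreating from the top looking for live pages:

```c
#include <stddef.h>

/*
 * Userspace sketch of a 'two-finger' compaction pass; not vm_phys code.
 * Returns the number of pages relocated.
 */
size_t
two_finger_compact(int pages[], size_t n)
{
	size_t free = 0, used = n, moved = 0;

	for (;;) {
		while (free < n && pages[free] != 0)	/* next hole */
			free++;
		while (used > 0 && pages[used - 1] == 0) /* last live page */
			used--;
		if (free + 1 >= used)	/* fingers met: region is compact */
			break;
		pages[free] = pages[used - 1];	/* relocate page downward */
		pages[used - 1] = 0;
		moved++;
	}
	return (moved);
}

/* Self-check: {1,0,2,0,3} compacts to {1,3,2,0,0} with one move. */
static int
two_finger_demo(void)
{
	int p[] = { 1, 0, 2, 0, 3 };
	size_t moved = two_finger_compact(p, 5);

	return (moved == 1 && p[0] == 1 && p[1] == 3 && p[2] == 2 &&
	    p[3] == 0 && p[4] == 0);
}
```

Once the fingers meet, every live page sits below every hole, so the free space at the top of the region becomes contiguous and higher-order buddies can be reformed from it.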
Quantifying fragmentation
The vm_phys subsystem implements a well-known metric for tracking external fragmentation - the 'Free Memory Fragmentation Index (FMFI)' [2]. The FMFI metric measures the degree of physical memory fragmentation for a given allocation order using metadata from the buddy allocator freelists. Its value ranges from negative values up to 1000: a negative value implies that there is ample memory to serve an allocation request of the given order, while a value between 0 (no fragmentation) and 1000 (highly fragmented) indicates the degree of physical memory fragmentation.
The value of the FMFI metric for each memory domain can be retrieved using the vm.phys_frag_idx sysctl.
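As a rough illustration, such an index can be computed from per-order free-block counts alone. The sketch below follows the scaled 0-to-1000 formulation described in the text, modelled on the fragmentation index of Gorman and Whitcroft [2]; it is not the vm_phys implementation, and the fixed -1000 result for "request already satisfiable" is an assumption of this sketch:

```c
#include <stdint.h>

/*
 * Illustrative Free Memory Fragmentation Index for allocation 'order',
 * given freecnt[i] = number of free blocks of order i on the buddy
 * freelists. Negative: the request can be satisfied as-is. 0..1000:
 * degree of fragmentation. Not the vm_phys implementation.
 */
int
fmfi(const uint64_t freecnt[], int maxorder, int order)
{
	uint64_t free_pages = 0, free_blocks = 0, suitable = 0;
	uint64_t requested = 1ULL << order;
	int i;

	for (i = 0; i < maxorder; i++) {
		free_pages += freecnt[i] << i;	/* pages per order-i block */
		free_blocks += freecnt[i];
		if (i >= order)
			suitable += freecnt[i];
	}
	if (free_blocks == 0)
		return (0);	/* no free memory at all */
	if (suitable > 0)
		return (-1000);	/* a large-enough free block exists */
	return ((int)(1000 - (1000 + free_pages * 1000 / requested) /
	    free_blocks));
}
```

For example, with eight free 0-order pages and nothing larger, an order-2 request cannot be satisfied even though enough total memory is free, and the index comes out high; as free memory splinters into ever more 0-order blocks, the index approaches 1000.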
The 'compaction search index'
As previously mentioned, system-wide compaction relies on a special data structure known as the compaction search index. The core idea behind this structure is to divide the physical memory space into power-of-two-sized chunks and track various metrics for each of these chunks. The search index currently tracks the amount of free memory and the number of free 0-order pages for each chunk; this information is updated each time a page is added to or removed from the buddy allocator freelists. Since the physical memory space may contain holes, the search index also tracks a list of valid memory regions for each chunk to prevent the compaction algorithm from operating on invalid physical memory regions.
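A minimal sketch of such per-chunk bookkeeping might look as follows, assuming 1024-page chunks; the names and the chunk size are illustrative, not those used by vm_phys:

```c
#include <stdint.h>

#define CHUNK_SHIFT	10	/* 2^10 pages per chunk (assumed) */
#define NCHUNKS		16

/* Per-chunk statistics; a real index would also keep a list of the
 * valid (hole-free) memory regions covered by each chunk. */
struct chunk_stats {
	uint64_t free_pages;	/* total free pages in this chunk */
	uint64_t free_order0;	/* free 0-order pages in this chunk */
};

static struct chunk_stats chunks[NCHUNKS];

/* Hook called when a block of 'order' at page frame 'pfn' is added to
 * (added != 0) or removed from (added == 0) a buddy freelist. */
void
chunk_update(uint64_t pfn, int order, int added)
{
	struct chunk_stats *c = &chunks[(pfn >> CHUNK_SHIFT) % NCHUNKS];
	int64_t delta = added ? 1 : -1;

	c->free_pages += delta * (1LL << order);
	if (order == 0)
		c->free_order0 += delta;
}

/* Self-check: freeing an order-3 block and a 0-order page in chunk 0,
 * and a 0-order page in chunk 1, yields the expected counters. */
static int
chunk_demo(void)
{
	chunk_update(5, 3, 1);
	chunk_update(5, 0, 1);
	chunk_update(1 << CHUNK_SHIFT, 0, 1);
	return (chunks[0].free_pages == 9 && chunks[0].free_order0 == 1 &&
	    chunks[1].free_pages == 1);
}
```

Because the counters are maintained incrementally on every freelist operation, finding the most fragmented chunks later is a cheap scan rather than a walk of all physical memory.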
The search index is used by the vm_phys_compact_search function to identify memory regions suitable for compaction.
Proactive background compaction
All of the previously described components come together in the form of the 'compaction daemon'. This daemon is started during boot and spawns a kthread for each memory domain present in the system.
The vm_phys_compact_thread function registers the compaction job with the vm_compact subsystem and periodically performs compaction on its assigned domain. The compaction daemon also relies on the FMFI metric to reduce its CPU time and to track the impact of each compaction run.
The main goal of the compaction daemon is to reduce the value of the FMFI metric for a given order, in this case VM_LEVEL_0_ORDER. Compaction is not started if the FMFI value falls below a certain threshold, which is exposed as a sysctl (vm.phys_compact_thresh). Furthermore, if the compaction daemon is unable to relocate any pages or reduce the fragmentation after several runs, it sleeps for a longer period before trying again.
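The daemon's pacing could be sketched as a pure decision function. The constants and the three-strikes back-off below are assumptions made for illustration, not the actual vm_phys values:

```c
/* Illustrative back-off policy for a compaction daemon (assumed values). */
#define SLEEP_SHORT	1	/* seconds between productive runs */
#define SLEEP_LONG	60	/* back-off when compaction is not helping */
#define MAX_IDLE_RUNS	3	/* no-progress runs tolerated before back-off */

int
compact_next_sleep(int frag_idx, int thresh, int pages_moved, int *idle_runs)
{
	if (frag_idx < thresh)
		return (SLEEP_LONG);	/* below threshold: nothing to do */
	if (pages_moved == 0 && ++(*idle_runs) >= MAX_IDLE_RUNS) {
		*idle_runs = 0;
		return (SLEEP_LONG);	/* repeated no-progress: back off */
	}
	if (pages_moved > 0)
		*idle_runs = 0;
	return (SLEEP_SHORT);
}

/* Self-check: low fragmentation and repeated no-progress runs both
 * cause the long sleep; a productive run keeps the short cadence. */
static int
sleep_demo(void)
{
	int idle = 0;

	if (compact_next_sleep(100, 500, 0, &idle) != SLEEP_LONG)
		return (0);
	if (compact_next_sleep(900, 500, 50, &idle) != SLEEP_SHORT)
		return (0);
	compact_next_sleep(900, 500, 0, &idle);
	compact_next_sleep(900, 500, 0, &idle);
	return (compact_next_sleep(900, 500, 0, &idle) == SLEEP_LONG);
}
```

Keeping the policy in a side-effect-free function like this makes the daemon's behavior easy to reason about: the kthread body reduces to "measure FMFI, maybe compact, sleep for whatever the policy says".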
References
[1] Jones, R., Hosking, A., & Moss, E. (2016). The garbage collection handbook: the art of automatic memory management.
[2] Gorman, M., & Whitcroft, A. (2006, July). The what, the why and the where to of anti-fragmentation. In Ottawa Linux Symposium (Vol. 1, pp. 369-384).