Tasks / Roadmap

Kernel

This is an initial batch of ideas and a work in progress. As we get further down the road we will probably add more items to this list (including more APIs which need NUMA awareness).

| Description | Status | Owner | Commit / Branch / Patch |
|---|---|---|---|
| Parsing SRAT on x86 and adding domains to vm_phys | done | attilio / jeff / jhb | stable/10 |
| bus_get_domain() and dev.foo.N.%domain | committed | adrian / jhb | r272799 / r274976 |
| CPU_WHICH_DOMAIN and cpuset -gd | needs testing | jeff / jhb | r276829 |
| Teach topology code about NUMA domains (for x86 hierarchy is package -> domain -> core -> thread) | not started | | |
| bus_get_cpus() to query arbitrary CPU sets (including "local" CPUs and "best intr" CPUs) | needs testing | jhb | https://github.com/bsdjhb/freebsd/compare/bsdjhb:master...numa_bus_get_cpus |
| Assign interrupts to a local CPU in intr_cpus by default on x86 | not started | | |
| Design a NUMA allocation policy data type | in progress | jeff | projects/numa |
| Remove the "cache" page queue (makes subsequent vm_phys changes simpler) | done | alc / kib / markj | cache queue removal |
| Update the vm_phys layer to accept NUMA allocation policy | in progress | jeff | projects/numa |
| Update KVA allocation to be domain aware (since superpages get in the way of doing a straight plumb of domain from contigmalloc, kmem_*, etc through to vm_phys) | not started | | |
| Update contigmalloc, kmem_* to accept NUMA allocation policy | in progress | jeff | projects/numa |
| Update busdma tags to have a domain identifier and optionally a policy, inheriting from the bus default (eg acpi-pci) | not started | | |
| Update static bus_dma allocations to allocate busdma memory local using the busdma tag domain identifier/policy | not started | | |
| Add NUMA awareness to UMA | in progress | jeff | projects/numa |
| Per-domain page daemon improvements | not started | | |
| Per-domain free list locking | not started | | |
| Migrate PCPU allocations to be domain-local | not started | | |
| Migrate vm_page_t, etc kernel structures to be domain local | not started | | |
| (optionally) migrate vm_page_t and other memory/VM management structures to be in a single 1G superpage if possible, rather than at the top of physmem which is typically not backed by a single 1G superpage | not started | | |

KVA allocation (to enable malloc/contigmalloc)

One of the big steps required to get NUMA-aware malloc/contigmalloc/UMA is a domain-aware KVA allocator. Unfortunately, domain-aware physical page allocation isn't enough: the superpage reservation framework gets in the way, because the upper layers that allocate KVA (and then back it with physical pages) don't know that the underlying page allocation may be a 2MB superpage. Some experiments were done to plumb a domain id (or -1 for default) from contigmalloc/kmem_malloc through the vm_reserv layer to vm_phys page allocation, and it didn't quite work.

For example:

* allocate 4k page for domain 0 - allocates KVA block A, backs it with 4k page PG(A), physical superpage S(A), fills it in with physical page PHYSPG(A)
* allocate 4k page for domain 1 - allocates KVA block A+1, backs it with PG(A)+1, which is in the same superpage S(A), so it goes into PHYSPG(A) and thus on domain 0.
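
The two steps above can be reproduced in a toy model. The sketch below is not kernel code: `ToyKernel`, its bump allocator and its `reservations` dictionary are hypothetical stand-ins for the KVA arena and the vm_reserv layer, and it assumes that whichever request first touches a 2MB-aligned KVA range decides the domain of the whole physical superpage.

```python
PAGE_SIZE = 4 * 1024               # 4k base page
SUPERPAGE_SIZE = 2 * 1024 * 1024   # 2MB superpage


class ToyKernel:
    """Hypothetical model: sequential KVA backed by superpage reservations."""

    def __init__(self):
        self.next_kva = 0        # bump allocator standing in for the KVA arena
        self.reservations = {}   # superpage index -> domain of the physical superpage

    def alloc_page(self, requested_domain):
        """Allocate one 4k page of KVA; return (kva, domain actually used)."""
        kva = self.next_kva
        self.next_kva += PAGE_SIZE
        sp_index = kva // SUPERPAGE_SIZE
        # The first page in a superpage-sized range creates the reservation
        # in the requested domain. Later pages in the same range reuse that
        # reservation, whatever domain they asked for.
        actual = self.reservations.setdefault(sp_index, requested_domain)
        return kva, actual


kernel = ToyKernel()
kva_a, dom_a = kernel.alloc_page(requested_domain=0)
kva_b, dom_b = kernel.alloc_page(requested_domain=1)
print(f"KVA {kva_a:#x}: wanted domain 0, got domain {dom_a}")
print(f"KVA {kva_b:#x}: wanted domain 1, got domain {dom_b}")
```

In this model the second request asks for domain 1 but lands on domain 0, because its KVA falls inside the superpage range that the first request already reserved.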

So, there are some solutions:

The last one was the suggestion from a number of people on IRC.

The challenge here is how we defrag the domain KVA vmem allocations back to the main KVA pool. Anything not going through the domain specific KVA pool can get starved of allocations.
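
The starvation concern can be illustrated with a small simulation. This is a sketch under assumptions, not the vmem implementation: `MainPool`, `DomainPool` and the rule that a domain pool imports whole 2MB chunks and can only hand back completely empty ones are all hypothetical.

```python
CHUNK_PAGES = 512   # 4k pages per 2MB chunk


class MainPool:
    """Hypothetical main KVA pool that hands out whole chunks."""

    def __init__(self, chunks):
        self.free_chunks = chunks

    def take_chunk(self):
        if self.free_chunks == 0:
            raise MemoryError("main KVA pool exhausted")
        self.free_chunks -= 1

    def return_chunk(self):
        self.free_chunks += 1


class DomainPool:
    """Hypothetical per-domain pool that imports chunks from the main pool."""

    def __init__(self, main):
        self.main = main
        self.used = {}     # chunk id -> pages currently in use
        self.next_id = 0

    def alloc_page(self):
        # Reuse a partially filled chunk before importing a new one.
        for cid, used in self.used.items():
            if used < CHUNK_PAGES:
                self.used[cid] += 1
                return cid
        self.main.take_chunk()
        cid = self.next_id
        self.next_id += 1
        self.used[cid] = 1
        return cid

    def free_page(self, cid):
        self.used[cid] -= 1

    def release_empty_chunks(self):
        """Return only fully free chunks to the main pool."""
        empty = [cid for cid, used in self.used.items() if used == 0]
        for cid in empty:
            del self.used[cid]
            self.main.return_chunk()
        return len(empty)


main = MainPool(chunks=4)
dom0 = DomainPool(main)
pages = [dom0.alloc_page() for _ in range(4 * CHUNK_PAGES)]

# Free everything except one long-lived page per chunk.
kept = set()
for cid in pages:
    if cid in kept:
        dom0.free_page(cid)
    else:
        kept.add(cid)

released = dom0.release_empty_chunks()
idle_pages = sum(CHUNK_PAGES - used for used in dom0.used.values())
print(f"chunks returned: {released}, main pool free chunks: {main.free_chunks}")
print(f"pages idle inside the domain pool: {idle_pages}")
```

One pinned page per chunk is enough to keep every chunk in the domain pool, so the main pool stays empty even though almost all of the KVA is idle. Any real design would need some way to migrate or compact those allocations.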

Userland

One open question here is what range of policies do we want to support? Linux supports a process-wide allocation policy that can be overridden for specific mappings. Do we want to support process-wide policies? Do we want per-thread policies as well? Per-object policies? Also, for the range of policies supported, what is the precedence ordering?
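
To make the precedence question concrete, here is a sketch of one possible ordering, assuming the most specific scope wins: mapping, then thread, then process, then a system default. The `Policy` type, the policy names and `effective_policy()` are hypothetical and only illustrate the shape of the decision; nothing here reflects a chosen design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    kind: str                     # illustrative names: "first-touch", "round-robin", "fixed-domain"
    domain: Optional[int] = None  # only meaningful for "fixed-domain"


SYSTEM_DEFAULT = Policy("first-touch")


def effective_policy(mapping=None, thread=None, process=None):
    """Pick the policy for an allocation: the most specific scope that is set wins."""
    for candidate in (mapping, thread, process):
        if candidate is not None:
            return candidate
    return SYSTEM_DEFAULT


process_policy = Policy("round-robin")
mapping_policy = Policy("fixed-domain", domain=1)

print(effective_policy())
print(effective_policy(process=process_policy))
print(effective_policy(mapping=mapping_policy, process=process_policy))
```

Other orderings are equally plausible (for example, letting a per-thread policy override a mapping policy), which is exactly the open question.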

| Description | Status | Owner | Commit / Branch / Patch |
|---|---|---|---|
| Prototype process-wide policies | in progress | jeff | projects/numa |
| Implement mapping policies (vm_map) | not started | | |
| libnuma-like API? | not started | adrian | |
| numactl-like functionality to adjust policy for new and/or existing processes | completed | adrian | |
| A monitoring tool akin to numa-top | not started | | |

Reviews

| Link | Description |
|---|---|
| https://reviews.freebsd.org/D1897 | migrate taskqueue_start_threads_pinned() -> taskqueue_start_threads_cpuset() |
| https://reviews.freebsd.org/D1674 | skip gratuitous inactive queueing |
| https://reviews.freebsd.org/D1672 | per-cpu page cache |

NUMA (last edited 2017-03-29T06:45:16+0000 by KubilayKocak)