Network Stack KPI/KBI
This wiki page attempts to identify kernel data structures that, if changed after a .0 release on a branch, may prevent network device drivers, firewall modules, userland monitoring tools, and other consumers from working properly.
Data structure status key
The following status options exist for each structure:
Immutable - This data structure must not be changed
The data structure is very sensitive to change, as it may be allocated statically in BSS or on the stack by modules, or directly accessed by userland programs that require it to remain the same length. Changing the size of the structure could lead to (possibly extensive) kernel memory corruption, broken userland monitoring tools, etc.
Padded - Spare fields may be changed as long as the size remains the same
The same as Immutable, except that spare fields have been added to the structure so that new functionality can be added despite the structure layout being fixed. Care should be taken when using spare fields, especially with respect to the lengths of types like long or void *, which may vary by architecture. Likewise, the initial values of a structure may vary based on how it is allocated, and old code may not be aware of new initialization requirements: if allocated in BSS, fields may simply be zeroed, but if allocated on the stack they may contain arbitrary values.
Extendable - New fields may be added at the end of the data structure
These data structures are visible in APIs used by third-party code, but the data structures are always allocated and managed by the base kernel; existing fields can't be changed, but new fields may be added, with caution.
Private - May be changed with (relative) impunity
These data structures are used only inside of the kernel or specific modules, and the layout may be changed without (excessive) concern.
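The difference between Padded and Extendable can be sketched as a pair of layout checks. This is a minimal illustration, not kernel code: the `demo_*` structures, their fields, and the two helper functions are hypothetical, and fixed-width types are used so that the sizes do not vary by architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "Padded" structure as shipped in a .0 release. */
struct demo_padded_v0 {
	uint32_t dp_flags;
	uint32_t dp_count;
	uint32_t dp_spare[4];	/* reserved for future use */
};

/* Later revision: one spare slot is consumed; total size is unchanged. */
struct demo_padded_v1 {
	uint32_t dp_flags;
	uint32_t dp_count;
	uint32_t dp_mtu;	/* new field, takes over dp_spare[0] */
	uint32_t dp_spare[3];
};

/* Hypothetical "Extendable" structure and a revision that appends a field. */
struct demo_ext_v0 {
	uint32_t de_flags;
	uint32_t de_count;
};

struct demo_ext_v1 {
	uint32_t de_flags;
	uint32_t de_count;
	uint32_t de_hits;	/* appended at the end */
};

/* Padded: the build fails if the revision changes the structure size. */
_Static_assert(sizeof(struct demo_padded_v1) == sizeof(struct demo_padded_v0),
    "padded structure changed size");

/* Returns 1 if every field known to old code kept its offset. */
int
padded_offsets_preserved(void)
{
	return (offsetof(struct demo_padded_v1, dp_flags) ==
	    offsetof(struct demo_padded_v0, dp_flags) &&
	    offsetof(struct demo_padded_v1, dp_count) ==
	    offsetof(struct demo_padded_v0, dp_count) &&
	    offsetof(struct demo_padded_v1, dp_mtu) ==
	    offsetof(struct demo_padded_v0, dp_spare));
}

/* Returns 1 if the extended revision only grew at the tail. */
int
ext_offsets_preserved(void)
{
	return (offsetof(struct demo_ext_v1, de_flags) ==
	    offsetof(struct demo_ext_v0, de_flags) &&
	    offsetof(struct demo_ext_v1, de_count) ==
	    offsetof(struct demo_ext_v0, de_count) &&
	    offsetof(struct demo_ext_v1, de_hits) >=
	    sizeof(struct demo_ext_v0));
}
```

In the padded case the size is pinned, so a module that allocated the old structure statically or on the stack still provides enough room. In the extendable case the size grows, which is only safe because allocation is assumed to happen in one place in the base kernel.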
Key to usage
The following uses are of particular interest from a KPI/KBI perspective:
Abbreviation | Description |
MD | Module dereference: kernel modules encode the offset and type of structure fields in their implementation. |
MA | Module allocation: kernel modules encode the size of the structure in their implementation, possibly in BSS, on the stack, or as a direct argument to malloc(9) or uma(9). |
SE | Sysctl export: the data structure is exposed to userspace via the sysctl(2) system call, and is therefore part of the ABI. |
SC | System call argument: the data structure is an argument to a system call. |
KD | KVM dereference: userspace monitoring or post-mortem analysis tools encode the offset and type of structure fields in their implementation. |
KS | KVM size dependency: userspace monitoring or post-mortem analysis tools encode the length of the structure in their implementation (e.g., because the struct is allocated in an array in the kernel). |
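A size dependency such as KS can be sketched with a stale reader that walks an array using the element size it was compiled with. This is an illustration under stated assumptions: the `demo_rec_*` layouts and both reader functions are hypothetical, and the "kernel array" is an ordinary local array standing in for memory that a real tool would read through KVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout the monitoring tool was compiled against. */
struct demo_rec_old {
	uint32_t dr_id;
	uint32_t dr_count;
};

/* Hypothetical layout used by a newer kernel: one field appended. */
struct demo_rec_new {
	uint32_t dr_id;
	uint32_t dr_count;
	uint32_t dr_extra;
};

#define	DEMO_NRECS	3

/* Stand-in for a kernel array of records, built with the new layout. */
static void
demo_fill(struct demo_rec_new recs[DEMO_NRECS])
{
	for (uint32_t i = 0; i < DEMO_NRECS; i++) {
		recs[i].dr_id = 100 + i;
		recs[i].dr_count = 10 * i;
		recs[i].dr_extra = 0xdeadbeef;
	}
}

/*
 * Read dr_id of element idx the way a stale tool would: the element
 * stride is the old sizeof, fixed when the tool was built.
 */
uint32_t
demo_read_id_stale(uint32_t idx)
{
	struct demo_rec_new recs[DEMO_NRECS];
	struct demo_rec_old out;
	const unsigned char *base = (const unsigned char *)recs;

	if (idx >= DEMO_NRECS)
		return (0);
	demo_fill(recs);
	memcpy(&out, base + (size_t)idx * sizeof(struct demo_rec_old),
	    sizeof(out));
	return (out.dr_id);
}

/* Read dr_id of element idx with the layout the array really uses. */
uint32_t
demo_read_id_current(uint32_t idx)
{
	struct demo_rec_new recs[DEMO_NRECS];

	if (idx >= DEMO_NRECS)
		return (0);
	demo_fill(recs);
	return (recs[idx].dr_id);
}
```

Element 0 still reads correctly with the stale stride, which is why this kind of breakage can go unnoticed in light testing; every later element is read from the wrong offset.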
sys
Type | Status | MD | MA | SE | SC | KD | KS | Comments |
struct mbuf | Immutable | Yes | ?? | No | No | No | No | Extreme caution required. |
struct sockopt | Immutable | Yes | Yes | No | No | No | No | Allocated on the stack in kernel modules. |
struct sockaddr | Immutable | Yes | Yes | Yes | Yes | ?? | ?? | Allocated statically, embedded in kernel structures, and part of the system call ABI. |
struct unpcb | Immutable | Yes | No | Yes | No | Yes | ?? | Allocated only by uipc_usrreq.c, but accessed by portalfs and, mysteriously, embedded in xunpcb; would otherwise be Extendable. |
struct xsocket | Immutable | No | No | Yes | No | No | No | Exposed explicitly via sysctl to userspace. |
struct xunpcb | Immutable | No | No | No | Yes | No | No | Allocated only by uipc_usrreq.c, but exposed explicitly via sysctl to userspace. |
net
Type | Status | Comments |
struct bpf_d | Extendable | Allocated only in bpf.c, but fields accessed by MAC policy modules. |
struct ifaddr | Immutable | Embedded in other data structures in protocols, such as in_ifaddr, etc. |
struct ifnet | Padded | Allocated only in if.c, but used in all device drivers, and monitored from userspace via KVM. May now be safe to change this to Extendable. |
struct netisr_handler | Padded | Allocated statically by protocol modules in BSS, but padded for future growth. |
struct route | Immutable | Allocated statically in BSS, inside of data structures, and on the stack by protocols, device drivers, etc. |
struct rtentry | Immutable | Generally only allocated by route.c, but used throughout the kernel. Would be Extendable if it weren't also allocated statically in one place by pf.c. |
netinet
Type | Status | Comments |
struct in_addr | Immutable | Embedded in countless user and kernel data structures and ABIs. |
struct in_ifaddr | Extendable | Allocated only by in.c, but used in many places in net, netinet, netipsec. |
struct inpcb | Padded | Allocated from a number of zones, depending on protocol type; exposed explicitly to userspace through sysctl via xinpcb and xtcpcb. Without the sysctl exposure it might be Extendable, but all protocols would need to be reviewed to make sure that is safe. Accessed from modules including firewalls, IPSEC, and TOE. |
struct inpcbinfo | Extendable | Accessed by user monitoring tools via KVM. |
struct ipq | Extendable | Allocated only by ip_input.c, but some fields read by MAC policy modules. |
struct sackhint | Padded | Used only inside of TCP, but embedded in struct tcpcb, so it is exposed to userspace and to other kernel modules with tcpcb visibility. |
struct sockaddr_in | Immutable | Allocated statically and part of the system call ABI; embedded in other data structures. |
struct syncache | Extendable | Currently allocated/consumed only in TCP, but user monitoring tools will access it via KVM in the future. |
struct syncache_head | Private | Currently allocated and accessed only by TCP, but may be accessed via KVM by user monitoring tools in the future. |
struct tcp_hostcache | Extendable | Currently allocated/consumed only in TCP, but user monitoring tools will access it via KVM in the future. |
struct tcpcb | Immutable | Currently allocated only by TCP, but accessed in a number of places including NFS, and exported to userspace directly via xtcpcb in sysctl. Would probably be Extendable if it weren't exported directly. |
struct tcpopt | Padded | Allocated and used only within TCP, except that TOE modules also allocate it (on the stack) and pass it into syncache entry initialization. |
struct tcptw | Private | Currently allocated and accessed only by TCP, but may be accessed via KVM by user monitoring tools in the future. |
struct tcp_syncache | Extendable | User monitoring tools will access it via KVM in the future. |
struct udpcb | Private | Currently allocated and accessed only by UDP/UDP6, but may be accessed via KVM by user monitoring tools in the future. |
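Several entries in these tables are constrained because the structure is embedded in another one (for example, sackhint inside tcpcb). The effect of embedding can be sketched as follows; the `demo_inner_*` and `demo_outer_*` structures and the two helper functions are hypothetical and do not mirror the real layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical embedded structure, before and after gaining a field. */
struct demo_inner_v0 {
	uint32_t di_a;
	uint32_t di_b;
};

struct demo_inner_v1 {
	uint32_t di_a;
	uint32_t di_b;
	uint32_t di_c;		/* appended to the inner structure */
};

/* Hypothetical outer structure that embeds the inner one by value. */
struct demo_outer_v0 {
	uint32_t		do_state;
	struct demo_inner_v0	do_inner;
	uint32_t		do_flags;	/* follows the embedded struct */
};

struct demo_outer_v1 {
	uint32_t		do_state;
	struct demo_inner_v1	do_inner;
	uint32_t		do_flags;
};

/* Bytes by which do_flags moved when the inner structure grew. */
long
demo_flags_shift(void)
{
	return ((long)offsetof(struct demo_outer_v1, do_flags) -
	    (long)offsetof(struct demo_outer_v0, do_flags));
}

/* Bytes by which the outer structure grew. */
long
demo_outer_growth(void)
{
	return ((long)sizeof(struct demo_outer_v1) -
	    (long)sizeof(struct demo_outer_v0));
}
```

Appending to the inner structure moves every outer field that follows it and changes the outer size, so an embedded structure inherits the constraints of whatever embeds it, even when its own allocation is private.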