Network Stack KPI/KBI

This wiki page attempts to identify kernel data structures which, if changed after a .0 release on a branch, may prevent network device drivers, firewall modules, userland monitoring tools, and similar consumers from working properly.

Data structure status key

The following status options exist for each structure:

Immutable - This data structure must not be changed

The data structure is very sensitive to change, as it may be allocated statically in BSS or on the stack by modules, or directly accessed by userland programs that require it to remain the same length. Changing the size of the structure could lead to (possibly extensive) kernel memory corruption, broken userland monitoring tools, etc.
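As a rough illustration of why size matters, the sketch below compares two versions of a made-up structure. The names demo_route_v1, demo_route_v2 and demo_overrun_bytes are hypothetical and do not correspond to real kernel definitions. It computes how many bytes a kernel built with the larger layout would touch beyond an object that an old module reserved in BSS or on its stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout at the .0 release; modules were compiled against it. */
struct demo_route_v1 {
    uint32_t dr_flags;
    uint32_t dr_mtu;
};

/* Hypothetical layout after an incompatible change on the same branch. */
struct demo_route_v2 {
    uint32_t dr_flags;
    uint32_t dr_mtu;
    uint64_t dr_expire;     /* new field grows the structure */
};

/*
 * Bytes a kernel built with v2 would touch beyond an object that an
 * old module reserved using sizeof(struct demo_route_v1).
 */
static size_t
demo_overrun_bytes(void)
{
    return (sizeof(struct demo_route_v2) - sizeof(struct demo_route_v1));
}

/* A guard of this kind turns a silent size change into a build failure. */
static_assert(sizeof(struct demo_route_v1) == 8,
    "demo_route_v1 must keep its size for the life of the branch");
```

With fixed-width members as above, demo_overrun_bytes() returns 8: the space of the new field is exactly what an old module never allocated.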

Padded - Spare fields may be changed as long as the size remains the same

The same as immutable except that spare fields have been added to the structure in order to allow new functionality to be added despite the structure layout being fixed. Care should be taken when using spare fields, especially relating to the lengths of types like long or void *, which may vary by architecture. Likewise, the initialized values of structures may vary based on how they are allocated, so old code may not be aware of new initialization requirements: if allocated in BSS, fields may simply be zeroed, but if on the stack they may contain arbitrary values.
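A minimal sketch of consuming a spare field follows. The structure, field names and the DEMO_POLICY_DEFAULT constant are hypothetical, invented for illustration. It assumes one convention for the initialization problem: zero means "not set by the caller", which holds for BSS objects and for stack objects only if the caller zeroes them first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_POLICY_DEFAULT 1u      /* hypothetical legacy behaviour */

/* Hypothetical padded structure as shipped in the .0 release. */
struct demo_handler_v1 {
    uint32_t dh_proto;
    uint32_t dh_flags;
    uint32_t dh_ispare[2];  /* fixed-width spares: same size everywhere */
    void    *dh_pspare[2];  /* pointer spares: width varies by architecture */
};

/* The same structure after one integer spare has been put to use. */
struct demo_handler_v2 {
    uint32_t dh_proto;
    uint32_t dh_flags;
    uint32_t dh_policy;     /* was dh_ispare[0]: same type, same offset */
    uint32_t dh_ispare[1];
    void    *dh_pspare[2];
};

/* The size and the position of the remaining fields must not move. */
static_assert(sizeof(struct demo_handler_v2) ==
    sizeof(struct demo_handler_v1), "size changed");
static_assert(offsetof(struct demo_handler_v2, dh_pspare) ==
    offsetof(struct demo_handler_v1, dh_pspare), "layout changed");

/* Old callers never set dh_policy, so 0 is read as "use the default". */
static uint32_t
demo_effective_policy(const struct demo_handler_v2 *dh)
{
    return (dh->dh_policy == 0 ? DEMO_POLICY_DEFAULT : dh->dh_policy);
}

/* Stack objects start with arbitrary contents; zero them before use. */
static void
demo_handler_init(struct demo_handler_v2 *dh, uint32_t proto)
{
    memset(dh, 0, sizeof(*dh));
    dh->dh_proto = proto;
}
```

The zero-means-default convention only protects old code that zeroes its stack objects; an old module that fills in just the fields it knows about would leave dh_policy with whatever happened to be on the stack.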

Extendable - New fields may be added at the end of the data structure

These data structures are visible in APIs used by third-party code, but the data structures are always allocated and managed by the base kernel; existing fields can't be changed, but new fields may be added, with caution.
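The following sketch shows why appending is the safe direction, using a made-up structure (demo_pcb_v1, demo_pcb_v2 and demo_module_read_flags are hypothetical names). The reader function stands in for a module built against the old header: it knows only the old offset of the field, and that offset is unchanged when a field is appended to an object the kernel itself allocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout that third-party modules were compiled against. */
struct demo_pcb_v1 {
    uint16_t dp_lport;
    uint16_t dp_fport;
    uint32_t dp_flags;
};

/* Hypothetical extended layout: one field appended at the end. */
struct demo_pcb_v2 {
    uint16_t dp_lport;
    uint16_t dp_fport;
    uint32_t dp_flags;
    uint64_t dp_flowid;     /* new; invisible to old modules */
};

/* Existing fields must stay exactly where old modules expect them. */
static_assert(offsetof(struct demo_pcb_v2, dp_flags) ==
    offsetof(struct demo_pcb_v1, dp_flags), "existing field moved");

/*
 * Stands in for a module built against the v1 header: it reads
 * dp_flags at the offset and width it was compiled with.
 */
static uint32_t
demo_module_read_flags(const void *pcb)
{
    uint32_t flags;

    memcpy(&flags,
        (const unsigned char *)pcb + offsetof(struct demo_pcb_v1, dp_flags),
        sizeof(flags));
    return (flags);
}
```

This works only because the object is allocated by the code that knows the new size; if a module allocated the structure itself, the case would fall back to the Immutable or Padded rules.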

Private - May be changed with (relative) impunity

These data structures are used only inside of the kernel or specific modules, and the layout may be changed without (excessive) concern.

Key to usage

The following uses are of particular interest from a KPI/KBI perspective:

| Abbreviation | Description |
|---|---|
| MD | Module dereference: kernel modules encode the offset and type of structure fields in their implementation. |
| MA | Module allocation: kernel modules encode the size of the structure in their implementation, possibly in BSS, on the stack, or as a direct argument to malloc(9) or uma(9). |
| SE | Sysctl export: the data structure is exposed to userspace via the sysctl(2) system call, and is therefore part of the ABI. |
| SC | System call argument: the data structure is an argument to a system call. |
| KD | KVM dereference: userspace monitoring or post-mortem analysis tools encode the offset and type of structure fields in their implementation. |
| KS | KVM size dependency: userspace monitoring or post-mortem analysis tools encode the length of the structure in their implementation (e.g., because the struct is allocated in an array in the kernel). |
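The size dependency (KS) is perhaps the least obvious of these, so here is a small sketch of it. All names are hypothetical. The function stands in for a monitoring tool compiled against an old header that walks a kernel array using the old element size as its stride, while the kernel has since grown the element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical element layout the monitoring tool was compiled against. */
struct demo_stat_v1 {
    uint64_t ds_packets;
    uint64_t ds_bytes;
};

/* Hypothetical element layout in the running kernel: one field larger. */
struct demo_stat_v2 {
    uint64_t ds_packets;
    uint64_t ds_bytes;
    uint64_t ds_drops;
};

static_assert(sizeof(struct demo_stat_v1) == 16, "unexpected v1 size");
static_assert(sizeof(struct demo_stat_v2) == 24, "unexpected v2 size");

/*
 * Stands in for the old tool: it locates element `index` using the
 * stride it was compiled with, not the stride the kernel now uses.
 */
static uint64_t
demo_tool_read_packets(const void *kernel_array, size_t index)
{
    const unsigned char *base = kernel_array;
    uint64_t packets;

    memcpy(&packets,
        base + index * sizeof(struct demo_stat_v1) +
        offsetof(struct demo_stat_v1, ds_packets),
        sizeof(packets));
    return (packets);
}
```

Given a four-element array of struct demo_stat_v2 whose ds_packets values are 100, 101, 102 and 103, the tool reads element 0 correctly, but for index 3 it computes byte offset 48, which is where the kernel keeps element 2, and so reports 102. No field was moved or retyped; the change in length alone breaks the tool.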

sys

| Type | Status | MD | MA | SE | SC | KD | KS | Comments |
|---|---|---|---|---|---|---|---|---|
| struct mbuf | Immutable | Yes | ?? | No | No | No | No | Extreme caution required. |
| struct sockopt | Immutable | Yes | Yes | No | No | No | No | Allocated on the stack in kernel modules. |
| struct sockaddr | Immutable | Yes | Yes | Yes | Yes | ?? | ?? | Allocated statically, embedded in kernel structures, and part of the system call ABI. |
| struct unpcb | Immutable | Yes | No | Yes | No | Yes | ?? | Allocated only by uipc_usrreq.c, but accessed by portalfs and, mysteriously, embedded in xunpcb; otherwise it would be Extendable. |
| struct xsocket | Immutable | No | No | Yes | No | No | No | Exposed explicitly via sysctl to userspace. |
| struct xunpcb | Immutable | No | No | No | Yes | No | No | Allocated only by uipc_usrreq.c, but exposed explicitly via sysctl to userspace. |

net

| Type | Status | Comments |
|---|---|---|
| struct bpf_d | Extendable | Allocated only in bpf.c, but fields accessed by MAC policy modules. |
| struct ifaddr | Immutable | Embedded in other data structures in protocols, such as in_ifaddr, etc. |
| struct ifnet | Padded | Allocated only in if.c, but used in all device drivers, and monitored from userspace via KVM. May now be safe to change this to Extendable. |
| struct netisr_handler | Padded | Allocated statically by protocol modules in BSS, but padded for future growth. |
| struct route | Immutable | Allocated statically in BSS, inside of data structures, and on the stack by protocols, device drivers, etc. |
| struct rtentry | Immutable | Generally only allocated by route.c, but used throughout the kernel. Would be Extendable if it weren't also allocated statically in one place by pf.c. |

netinet

| Type | Status | Comments |
|---|---|---|
| struct in_addr | Immutable | Embedded in countless user and kernel data structures and ABIs. |
| struct in_ifaddr | Extendable | Allocated only by in.c, but used in many places in net, netinet, netipsec. |
| struct inpcb | Padded | Allocated from a number of zones, depending on protocol type; exposed explicitly to userspace via sysctl as xinpcb and xtcpcb. Without the sysctl exposure it might be Extendable, but all protocols would need to be reviewed to make sure that is safe. Accessed from modules including firewalls, IPSEC, and TOE. |
| struct inpcbinfo | Extendable | Accessed by user monitoring tools via KVM. |
| struct ipq | Extendable | Allocated only by ip_input.c, but some fields read by MAC policy modules. |
| struct sackhint | Padded | Used only inside of TCP, but embedded in struct tcpcb, so it is exposed to userspace and to other kernel modules with tcpcb visibility. |
| struct sockaddr_in | Immutable | Allocated statically and part of the system call ABI; embedded in other data structures. |
| struct syncache | Extendable | Currently allocated/consumed only in TCP, but user monitoring tools will access it via KVM in the future. |
| struct syncache_head | Private | Currently allocated and accessed only by TCP, but may be accessed via KVM by user monitoring tools in the future. |
| struct tcp_hostcache | Extendable | Currently allocated/consumed only in TCP, but user monitoring tools will access it via KVM in the future. |
| struct tcpcb | Immutable | Currently allocated only by TCP, but accessed in a number of places including NFS, and exported to userspace directly via xtcpcb in sysctl. Would probably be Extendable if it weren't exported directly. |
| struct tcpopt | Padded | Allocated and used only within TCP, except that TOE modules also allocate it (on the stack) and pass it into syncache entry initialization. |
| struct tcptw | Private | Currently allocated and accessed only by TCP, but may be accessed via KVM by user monitoring tools in the future. |
| struct tcp_syncache | Extendable | User monitoring tools will access it via KVM in the future. |
| struct udpdb | Private | Currently allocated and accessed only by UDP/UDP6, but may be accessed via KVM by user monitoring tools in the future. |

NetKPIKBI (last edited 2009-08-01T11:04:45+0000 by RobertWatson)