FreeBSD Developer Summit: Networking Working Group

We would like to cover the following topics. This is not an exhaustive list and if you feel there is something missing that you want to talk about, contact the session chair and your topic will be included here.

Please send talks, brainstorm topics, or other proposals to the session leaders, LawrenceStewart and GlebSmirnoff.

Agenda

Results


Stack/driver interface - Andre Oppermann

To come:

Version shims or will all vendors be expected to switch?

Will open projects branch for this.

Technical monitor is Ed Maste from the FreeBSD Foundation (FF).

Can the change avoid being one large commit? It could be #ifdef conditionalized, with the most important drivers converted sequentially.

One aim is to separate L2 and L3 but that's outside scope here

Changing the way drivers announce capabilities to stack -- idea is to have an area of struct declared by driver like in setsockopt() -- list of items that the stack will walk.

It's like newbus. So why not newbus? "Newbus is not just buses, it's also not new (phk)" (Has been reinvented twice, not well since). To be investigated.

There can be additional failure possibilities if functions exported as pointers in the list have to allocate memory; otherwise it will be the same as now.
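The driver-declared capability list discussed above could look roughly like the following. This is only a sketch to make the idea concrete: every name in it (drv_cap, drv_cap_id, drv_cap_find, the DRV_CAP_* identifiers) is hypothetical and not taken from any FreeBSD header or from the actual proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability identifiers, invented for this sketch. */
enum drv_cap_id {
    DRV_CAP_END = 0,        /* list terminator */
    DRV_CAP_TSO_MAXLEN,     /* value: largest TSO burst in bytes */
    DRV_CAP_TX_QUEUES,      /* value: number of hardware TX queues */
    DRV_CAP_TRANSMIT,       /* fn: driver transmit entry point */
    DRV_CAP_RSS_KEYLEN      /* value: RSS key length (unused below) */
};

typedef int (*drv_fn_t)(void *softc, void *pkt);

/* One entry of the list; the driver declares it, the stack only reads it. */
struct drv_cap {
    enum drv_cap_id id;
    uint64_t value;
    drv_fn_t fn;
};

/* Stack side: walk the list until the terminator, skipping other ids. */
static const struct drv_cap *
drv_cap_find(const struct drv_cap *list, enum drv_cap_id id)
{
    assert(list != NULL);
    for (; list->id != DRV_CAP_END; list++) {
        if (list->id == id)
            return list;
    }
    return NULL;    /* capability not announced by this driver */
}

/* Driver side: a trivial transmit routine and the announced list. */
static int
example_transmit(void *softc, void *pkt)
{
    (void)softc;
    (void)pkt;
    return 0;
}

static const struct drv_cap example_caps[] = {
    { DRV_CAP_TSO_MAXLEN, 65535, NULL },
    { DRV_CAP_TX_QUEUES, 8, NULL },
    { DRV_CAP_TRANSMIT, 0, example_transmit },
    { DRV_CAP_END, 0, NULL }
};
```

A stack that does not know an identifier simply never asks for it, and a driver that does not announce one yields NULL, so old and new sides can coexist without a version shim in this sketch.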

Vendor capabilities, route to integrate advanced features of hardware.

TCP offload? Is ifnet the right level of abstraction for modern network hardware?

Just in time compilation to improve performance on cache-deep processes? Not a problem now, but it will be tomorrow. There will be cards that offload the entire network stack.

Automagic export of C code at compile time a la

IFQ interface: pain point

Get rid of IFQ and associated softqueue -- most HW has hardware support.

The stack will call down completely lockless (no ifnet lock or driver lock). The driver can implement the optimal approach.

Pretty much everything uses if_transmit now, except altq. QoS? No explicit API for timing. Intermediary queue discipline? The stack's if_transmit function pointer is replaced by altq, which will call down into the driver's if_transmit. Multiple layers of queue discipline? The interface only supports one layer.
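The interposition described above (a queue discipline taking over the transmit function pointer and calling down into the driver) can be sketched as follows. The structure and function names (ifnet_sketch, qdisc_attach, qdisc_transmit, the byte budget) are assumptions made up for illustration; they are not the real struct ifnet or the ALTQ API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pkt {
    int len;                    /* packet length in bytes */
};

struct ifnet_sketch;
typedef int (*transmit_fn)(struct ifnet_sketch *, struct pkt *);

/* Hypothetical, heavily reduced stand-in for an interface structure. */
struct ifnet_sketch {
    transmit_fn if_transmit;    /* what the stack calls */
    transmit_fn next_transmit;  /* saved driver entry point, if interposed */
    int budget;                 /* bytes the discipline still allows */
    int sent;                   /* packets handed to the driver */
    int dropped;                /* packets refused by the discipline */
};

/* Driver transmit routine: here it only counts packets. */
static int
example_drv_transmit(struct ifnet_sketch *ifp, struct pkt *p)
{
    (void)p;
    ifp->sent++;
    return 0;
}

/* Queue discipline: enforce a byte budget, then call down into the driver. */
static int
qdisc_transmit(struct ifnet_sketch *ifp, struct pkt *p)
{
    if (p->len > ifp->budget) {
        ifp->dropped++;
        return ENOBUFS;
    }
    ifp->budget -= p->len;
    return ifp->next_transmit(ifp, p);
}

/* Replace the stack-visible pointer; only one layer is supported. */
static void
qdisc_attach(struct ifnet_sketch *ifp, int budget)
{
    assert(ifp->next_transmit == NULL);
    ifp->next_transmit = ifp->if_transmit;
    ifp->if_transmit = qdisc_transmit;
    ifp->budget = budget;
}
```

The single saved pointer is what limits this sketch to one layer; stacking disciplines would need a chain of saved pointers or a separate per-layer context.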

Work is to prepare proposal, documentation and run performance tests so we can make informed decision

Can we pass in other structures than -- eg writing disk blocks directly into network packet? This is another construction site. Who is looking at busdma?

Once documentation etc. looked at, can update drivers.

Sometime in the next 4 months.

Common API for common functions for drivers to use, but not the stack. Drivers can benefit automatically from improvements.

Vendor feedback: can we move buffering up into the stack? Control queue lengths and multilayer queue disciplines. Adaptive behaviour depending on how many packets are waiting.

Copy-pasting in drivers is not a good thing. Copied code should be centralised into the stack and made available for drivers to use. Infrastructure almost in place to remove that (expected 10.1).

Start thinking further into the future. A driver consists of about 20% boilerplate. Write a condensed driver spec file and process it into .[ch] files at compile time.

Interface dangling pointers -- Gleb

Implement some lightweight ref counting. Need an efficient way to fetch current value gathered from all cpus.
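One way to read the ref counting idea above is a counter split into per-CPU slots, where the current value is the sum over all slots. The sketch below is a single-threaded userspace model under that assumption; the names (pcpu_ref, NCPU_SKETCH and the helper functions) are hypothetical, and a real kernel version would need per-CPU storage and care about preemption and ordering.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU_SKETCH 4   /* assumed CPU count for this model */

/* One signed delta per CPU; only that CPU is expected to touch its slot. */
struct pcpu_ref {
    long delta[NCPU_SKETCH];
};

/* Take a reference on the CPU the caller is running on. */
static void
pcpu_ref_acquire(struct pcpu_ref *r, unsigned cpu)
{
    assert(cpu < NCPU_SKETCH);
    r->delta[cpu]++;
}

/* Drop a reference; this may happen on a different CPU than the acquire,
 * so an individual slot can legitimately go negative. */
static void
pcpu_ref_release(struct pcpu_ref *r, unsigned cpu)
{
    assert(cpu < NCPU_SKETCH);
    r->delta[cpu]--;
}

/* Slow path: gather the current value from all CPUs. */
static long
pcpu_ref_fetch(const struct pcpu_ref *r)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < NCPU_SKETCH; i++)
        sum += r->delta[i];
    return sum;
}
```

Acquire and release stay cheap because they touch only local state; the cost moves to the fetch, which is why the notes ask for an efficient way to gather the value from all CPUs.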

Routing performance

Mutex contention kills performance on routing lookup. Problems when egress interface disappears. Cached rtentry leads to crashes

Look at killing the need for rtentry locking. Ref counting as proposed by Gleb.

Testing / Development

Developers should communicate with companies able to test with different workloads.

Orange (Olivier) and Yandex (Alexander) have interest in routing/forwarding performance. Orange have Ixia test hardware.

Netflix (Lawrence, Adrian, Scott) have access to TCP-heavy production workload.

Netflix looking to host focused mini devsummit events in Los Gatos, CA on a semi-regular basis.

Mbuf refcounting

- cache contention from tightly packed slab

- cpu stall due to external memory read
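The cache contention point can be illustrated by comparing tightly packed counters with counters padded to a cache line each. This is a sketch assuming a 64-byte cache line; the type and helper names are invented for the example and do not reflect how mbuf reference counts are actually laid out.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SKETCH 64    /* assumed cache line size in bytes */

/* Padded variant: each counter occupies a cache line of its own. */
struct padded_ref {
    _Alignas(CACHE_LINE_SKETCH) uint32_t count;
};

/* Packed variant: 16 four-byte counters fit in one 64-byte line. */
static _Alignas(CACHE_LINE_SKETCH) uint32_t packed_refs[16];
static struct padded_ref padded_refs[16];

/* True if both addresses fall into the same assumed cache line. */
static int
same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHE_LINE_SKETCH) ==
        ((uintptr_t)b / CACHE_LINE_SKETCH);
}

/* Take a reference; CPUs updating neighbouring packed counters would
 * bounce the shared line between their caches. */
static void
ref_hold(uint32_t *count)
{
    assert(*count < UINT32_MAX);
    (*count)++;
}
```

Padding trades memory for isolation: the padded array here is 16 times larger than the packed one, which is the tension behind the slab layout discussion.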

multi-(process|thread) accept

"Thundering herd" problem.

- SO_REUSEPORT repurposed as in Linux/DragonFly BSD

- Go our own way with e.g. SO_SHAREDQUEUE

Routing subsystem

- Multiple cache misses per radix lookup

- Radix is too generic. Could have a super compact v4-only trie alongside

- Luigi/Marko's netmap + DXR work needs to be looked into

- Alexander needs to solve the API for RIB->FIB callbacks to deal with MPLS, which can be used to support per-protocol specific lookup structures
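To make the "compact v4 only trie" idea concrete, here is a minimal sketch of a binary trie doing longest-prefix match on IPv4 addresses in host byte order. It is an illustration only, not the DXR structure and not a proposal from the session; all names and the fixed node pool size are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define TRIE_MAX_NODES 1024     /* assumed pool size for the sketch */
#define TRIE_NO_HOP (-1)

/* Small nodes indexed by 16-bit offsets instead of pointers.
 * Index 0 is the root, so a child index of 0 means "no child". */
struct trie_node {
    uint16_t child[2];
    int nexthop;
};

struct v4_trie {
    struct trie_node node[TRIE_MAX_NODES];
    uint16_t used;
};

static void
trie_init(struct v4_trie *t)
{
    t->node[0].child[0] = t->node[0].child[1] = 0;
    t->node[0].nexthop = TRIE_NO_HOP;
    t->used = 1;
}

/* Insert prefix/len with the given next hop. Returns 0, or -1 if full. */
static int
trie_insert(struct v4_trie *t, uint32_t prefix, unsigned len, int nexthop)
{
    uint16_t cur = 0;
    unsigned i;

    assert(len <= 32);
    for (i = 0; i < len; i++) {
        unsigned bit = (prefix >> (31 - i)) & 1;

        if (t->node[cur].child[bit] == 0) {
            if (t->used == TRIE_MAX_NODES)
                return -1;
            t->node[t->used].child[0] = 0;
            t->node[t->used].child[1] = 0;
            t->node[t->used].nexthop = TRIE_NO_HOP;
            t->node[cur].child[bit] = t->used++;
        }
        cur = t->node[cur].child[bit];
    }
    t->node[cur].nexthop = nexthop;
    return 0;
}

/* Walk the address bits, remembering the last next hop seen on the path. */
static int
trie_lookup(const struct v4_trie *t, uint32_t addr)
{
    uint16_t cur = 0;
    int best = t->node[0].nexthop;
    unsigned i;

    for (i = 0; i < 32; i++) {
        uint16_t next = t->node[cur].child[(addr >> (31 - i)) & 1];

        if (next == 0)
            break;
        cur = next;
        if (t->node[cur].nexthop != TRIE_NO_HOP)
            best = t->node[cur].nexthop;
    }
    return best;
}
```

A plain binary trie like this still touches up to 32 nodes per lookup, so it does not by itself fix the cache-miss problem; it only shows the kind of small, index-based node layout that more compressed schemes build on.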

Ownership

OFED - get NetApp, Isilon, iX, RDMA NIC vendors to talk

Miscellaneous

Andre is carving ipsec out into a pfil based kmod

Andre is working on the final bits of the TCP-AO

TSO doesn't work with NAT because the copy out buffer is too small and truncates

201309DevSummit/Networking (last edited 2013-09-28 08:00:02 by LawrenceStewart)