FreeBSD Developer Summit: Transport Working Group

June 7, 2017 (Wednesday), 13:00-16:00

DMS 1140

Overview

We will discuss ongoing work on, and ideas for improving, the transport protocols in the FreeBSD kernel.

There is a group that meets regularly to discuss transport work in the kernel. Notes of the group's work can be found in TransportProtocols. The face-to-face time will allow us to whiteboard and discuss complex topics at greater length. It will also allow us to include participants who are not able to attend the regular meetings.

If you would like to participate, contact the working group chairs listed below and CC devsummit@. You will then be added to this page. Please include a list of the topics you want to talk about or the areas you are interested in. This helps us plan the session and bring together people with common interests.

It may be possible to include people who cannot attend in person via video conference or chat tools. Notes taken during the session will be published afterwards so that the whole community can see what we discussed.

Goals

In general, there are two areas we would like to cover:

  1. Discussion of ongoing work that is complex, requires coordination, or involves architectural decisions that would benefit from face-to-face discussion among a larger group.
  2. Exchange of ideas for upcoming work to gauge community interest, solicit feedback, look for conflicts/overlap, and generally keep everyone informed.

In particular, we may (or may not) cover the following suggested topics. This is not an exhaustive list; if you feel something is missing that you want to talk about, contact one of the session chairs and we will add your topic here. Note that the topic numbers do not indicate order or importance of any kind; they serve as references for the "Topics of Interest" column in the attendee table below.

The final agenda will be guided by the interests the attendees express, so we may not discuss any of the topics listed below if there appears to be little or no interest in them among the attendees. Therefore, if you feel strongly that we should discuss a topic, please tell the chairs.

Topics

#    Topic Description
1    RACK, BBR (RandallStewart)
2    Alternate stacks: How do we maintain them? What is the support expectation? How do we minimize code duplication? Etc.
3    Testing alternate TCP stacks: testing multiple module versions (JonathanLooney)
4    Packet Pacing (Netflix learnings)
5    iflib status??
6    Network stack as a module (SteveKiernan)
7    Callout reworking (HansPetterSelasky notes one possible option: hps_head branch)
8    Listen sockets (GlebSmirnoff)
9    RSS update (see also this)?
10   More efficient mbuf design for sendfile and friends (DrewGallatin)
11   Drivers providing backpressure information to higher layers
12   Drivers providing packet timestamps to higher layers
13   TCP troubleshooting
14   LLE callout problem (D4605)

Suggested Agenda

  1. Updates on What People Are Doing
    1. (#1) RACK, BBR (RandallStewart)

    2. (#3) Testing alternate TCP stacks: testing multiple module versions (JonathanLooney)

    3. (#8) Listen sockets (GlebSmirnoff)

    4. (#10) More efficient mbuf design for sendfile and friends (DrewGallatin)

  2. Interesting Information
    1. (#4) Packet Pacing (Netflix learnings) (RandallStewart)

  3. Items to Discuss
    1. (#2) Alternate stacks: How do we maintain them? What is the support expectation? How do we minimize code duplication? Etc.
    2. (#13) TCP troubleshooting
    3. (#11) Drivers providing backpressure information to higher layers
    4. (#12) Drivers providing packet timestamps to higher layers
    5. (#9) RSS work?
    6. (#7) Callout reworking
    7. (#14) LLE callout problem (D4605) (HansPetterSelasky)

Note: General presentations about work you have done that does not require further discussion should be submitted to the FreeBSD Developer Summit track at BSDCan (see the general developer summit page).

Attending

In order to attend, you need to register for the developer summit, register for this session by email, and be confirmed by the working group organizers. Follow the guidelines described on the main page or in the email you received. If you have questions or are in doubt, ask the session chairs.

Please do NOT add yourself here. Your name will appear automatically once you have received the confirmation email. You do, however, need to put your name on the general developer summit attendees list.

#    Name               Username / Affiliation      Topics of Interest  Notes
1    JonathanLooney     jtl@                                            Session co-chair
2    DrewGallatin       gallatin@                   10, 11, 12
3    RandallStewart     rrs@
4    SepherosaZiehau    sephe@
5    MichaelTuexen      tuexen@
6    BjoernZeeb         bz@
7    EricvanGyzen       vangyzen@
8    SteveWills         swills@
9    NavdeepParhar      np@                         2, 4, 10
10   Steve Wahl         Dell EMC
11   Jason Eggleston    LimeLight Networks
12   LawrenceStewart    lstewart@
13   RyanStone          rstone@
14   Ming Qiao          Juniper Networks
15   HansPetterSelasky  hselasky@                   14
16   MikeKarels         karels@
17   MikeSilbersack     silby@                      2, 6
18   Siva Mahadevan     FreeBSD Foundation/emaste@
19   Charlie Yang       FreeBSD Foundation/emaste@
20   DavidSomayajulu    davidcs@
21   Kevin Bowling      LimeLight Networks

Results

We kick off with a discussion of BBR and RACK, in which rrs@ explains what they are and why we have them. A presentation on BBR can be found at the link. kbowling brings up small objects and says that LimeLight has a patch for issues there. rrs@ says he tried that but did not find it helpful. The issue seems to come down to datacenter workloads. rrs@ says that it is all about pacing and that the congestion window is not as much of an issue.

jtl@ brings up the idea of moving the base stack into its own module so that, with module versioning, we can upgrade stacks without a reboot. This is something we hope to get to soon.

The listen sockets discussion does not get far because glebius@ is not present, but there are rumors that the current rework is ready to go into the tree.

gallatin@ explains a recent rework of the mbuf code to handle large file sends via sendfile. The idea is to have a single mbuf reference much more data than is currently possible, and to free all of those pages when that one mbuf is freed. The code is not quite ready to commit but is getting close to review.
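A minimal userland C sketch of the idea described above, assuming nothing about the actual kernel code: one descriptor references many pages and releases all of them when that single descriptor is freed. The names `struct page_mbuf`, `page_mbuf_alloc()`, `page_mbuf_free()`, `SKETCH_PAGE_SIZE`, and `SKETCH_MAX_PAGES` are hypothetical, and `malloc()` stands in for whatever page allocator the kernel would use. This is not the real `struct mbuf` layout.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical constants for this sketch; not kernel values. */
#define SKETCH_PAGE_SIZE 4096
#define SKETCH_MAX_PAGES 64

/*
 * Hypothetical descriptor: one "mbuf" that references many pages
 * instead of a single cluster. Not the real struct mbuf layout.
 */
struct page_mbuf {
	int	 npages;			/* number of pages attached */
	void	*pages[SKETCH_MAX_PAGES];	/* malloc() stands in for the VM */
	size_t	 len;				/* bytes of data across all pages */
};

/* Free every attached page, then the descriptor; returns pages released. */
static int
page_mbuf_free(struct page_mbuf *m)
{
	int released = 0;

	assert(m != NULL);
	for (int i = 0; i < m->npages; i++) {
		free(m->pages[i]);
		released++;
	}
	free(m);
	return (released);
}

/* Allocate one descriptor carrying npages pages of data. */
static struct page_mbuf *
page_mbuf_alloc(int npages)
{
	struct page_mbuf *m;

	if (npages <= 0 || npages > SKETCH_MAX_PAGES)
		return (NULL);
	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return (NULL);
	for (int i = 0; i < npages; i++) {
		m->pages[i] = malloc(SKETCH_PAGE_SIZE);
		if (m->pages[i] == NULL) {
			/* Undo the partial allocation. */
			(void)page_mbuf_free(m);
			return (NULL);
		}
		m->npages++;
	}
	m->len = (size_t)npages * SKETCH_PAGE_SIZE;
	return (m);
}
```

With `page_mbuf_alloc(16)`, one descriptor covers 64 KiB (16 * 4096 bytes), and a single `page_mbuf_free()` call releases all 16 pages, rather than walking a chain of 16 separate buffers.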

rstone@ has changes for low-RTT networks that require a new option but are otherwise ready to go in. The work will also be written up as an IETF draft together with others who are active in the IETF.

The Netflix folks have been comparing hardware and software pacing and have found that, for their workload, they do not see a benefit in CPU utilization but do see improved QoE (Quality of Experience) on the client side.

The room discusses the direction in which we should take the TCP stack code. There is general agreement, with qualifications from rrs@, about trying to re-unify the TCP code. A discussion then ensues on whether the unified stack would coalesce around a RACK-based stack, with changes to handle non-RACK connections. We would like RACK to be the default in HEAD but not in STABLE, to get more testing. Michael asks how easily the pacer can be pulled out of the RACK/BBR code and used generally; rrs@ thinks that is already done. rrs@ says there are three steps to getting this in:

  1. Get the black box in.
  2. Get the pacer in; no one uses it until RACK/BBR is in.
  3. Place RACK and BBR into the tree.

rrs@ and jtl@ give a demo of the black box.

jtl@ will create a Phabricator review for the black box code.

We then move on to a discussion of driver backpressure. hps@ has a mechanism in the ConnectX drivers to do this; its tag mechanism passes from the TCP layer down through the driver. hps@, np@, lstewart@, and gallatin@ will take this on.

gnn@ will look at packet timestamps.



DevSummit/201706/Transport (last edited 2018-03-18 18:12:35 by MarkLinimon)