17 November 2016 Hangout Notes
- The pacing work is still in progress. Drew has been giving folks feedback on this. It has not crashed in the last 24 hours. When it works, it works very well, but it does increase the amount of CPU utilized. The remaining work is mainly around stability. The code does not really tell the stack whether or not the connection was successfully offloaded. The code is being tested both with RACK and without RACK.
- The 1MSS change has been in HEAD for two weeks. The only reported issue (by dim@) seems to be related to a change made by glebius@ last year. This is being actively worked on by hiren@, jtl@ and glebius@. Once the issue is understood, the code can be MFC'd to 11. The change is currently deployed on the FreeBSD cluster.
- gnn@ asks about seeing multiple test failures. tuexen@ says that this was due to a bug in packetdrill cleanup, which is now fixed in the GitHub repo. A lot of tests were also failing within bhyve because of the way timers are handled in bhyve vs. VMware and real hardware.
- jegg@ says that they are seeing odd timestamp echo reply (ECR) issues, where the ECR is 0 but the timestamp is otherwise normal. This exposes a number of issues in the stack. For example, we send a RST, as we should, but we don't close the socket. If the RST gets dropped, then the sender and receiver continue to communicate. Not going to CLOSED after sending a RST is a bug, which hiren@ will fix. In order to test this with packetdrill, jegg@ will send a trace and a proposed patch to tuexen@.
- rstone@ says that they are seeing an issue where a long tail of packets is dropped, requiring the RTO to recover, which takes a long time. This is a result of Linux (CentOS) dropping the packets after its own version of tcp_input(). tuexen@ will try to write up a packetdrill script.