Test setup:
- The test consisted of running the netperf TCP stream test through the wrap board, to see how much traffic it can sustain/forward/NAT with different software setups.
- The network was set up so that all requests and replies passed through the wrap board, while netserver and netperf were both running on the desktop.
- Socket sizes: -s = -S = 57344.
- Desktop: Pentium D 920 (2x2.8GHz), 2GB RAM - FreeBSD-HEAD/amd64.
- Wrap: AMD Geode (486 clone, 233MHz), 128MB RAM, 10/100 sis NICs (DP83816 PCI), details here.
- World and kernel for the wrap board were compiled with -O1 (except in the gcc optimization tests).
- In the 7.x tests, ipfw was compiled as a module and it was not loaded during the simple forwarding tests.
- In the 4.x tests, I used 2 different kernels: one with IPFW & DIVERT compiled in (used only in the nat tests), and another without IPFW compiled in (used for all the other tests) - this way I avoided ipfw overhead in the forwarding tests.
- A tarball containing test results, gnuplot files and plots is available here.
- Below the images are my comments, feel free to add yours.
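For anyone who wants to repeat the runs, the netperf invocation described in the setup can be sketched roughly as follows. This is only an illustration: the server address `192.0.2.10` is a placeholder, the helper name is made up, and the test duration is left optional because the page does not state the one that was used. Only the TCP stream test type and the two socket sizes come from the setup above.

```python
import shlex

def netperf_tcp_stream_cmd(server, socket_bytes=57344, duration_s=None):
    """Build a netperf TCP_STREAM command with equal local/remote socket sizes."""
    # Global options: target host and test type.
    cmd = ["netperf", "-H", server, "-t", "TCP_STREAM"]
    if duration_s is not None:
        # Test length in seconds (not stated in the original setup).
        cmd += ["-l", str(duration_s)]
    # Options after "--" are test-specific: -s local, -S remote socket size.
    cmd += ["--", "-s", str(socket_bytes), "-S", str(socket_bytes)]
    return cmd

# Placeholder address; netserver must already be listening on that host.
print(shlex.join(netperf_tcp_stream_cmd("192.0.2.10")))
```

The command is returned as a list so it can be handed to `subprocess.run` unchanged, or printed and pasted into a shell.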
Author: Paolo Pisati <piso@FreeBSD.org>
Legend:
- 4.x: FreeBSD 4.11/i386
- 7.x: FreeBSD HEAD/i386 (30/12/2006)
- fwd: net.inet.ip.forwarding=1
- ffwd: net.inet.ip.fastforwarding=1
- poll-1x: device polling activated for sis0
- poll-2x: device polling activated for sis0 and sis1
- nod: net.isr.direct=0 (7.x default: net.isr.direct=1)
- natd: ipfw divert socket + natd
- ipfwnat: ipfw nat
- O0: -O0 (no optimization)
- O1: -O1
- O2: -O2 -fno-strict-aliasing
- Os: -Os (implies -fno-strict-aliasing)
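When reading the plots it can help to expand a label back into the settings it stands for. The lookup below is a small sketch that only encodes what the legend states; the function and table names are made up for illustration, and it deliberately does not say how polling is switched on, since the legend does not specify that.

```python
# Sysctl values implied by each legend token (taken from the legend only).
SYSCTLS = {
    "fwd": {"net.inet.ip.forwarding": 1},
    "ffwd": {"net.inet.ip.fastforwarding": 1},
    "nod": {"net.isr.direct": 0},
}

# Interfaces with device polling activated for each polling token.
POLLING = {
    "poll-1x": ["sis0"],
    "poll-2x": ["sis0", "sis1"],
}

def settings_for(tokens):
    """Return (sysctls, polled_nics) for a list of legend tokens."""
    sysctls, nics = {}, []
    for tok in tokens:
        if tok in SYSCTLS:
            sysctls.update(SYSCTLS[tok])
        elif tok in POLLING:
            nics = list(POLLING[tok])
        else:
            raise ValueError(f"unknown legend token: {tok}")
    return sysctls, nics

sysctls, nics = settings_for(["ffwd", "poll-2x", "nod"])
for name, value in sysctls.items():
    print(f"sysctl {name}={value}")
print("polling on:", ", ".join(nics))
```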
- In the 4.x-ffwd case, it looks like the 100Mbit NICs are the bottleneck.
- I don't know what causes the drop in 4.x-fwd-poll2x.
- In all the tests without polling enabled, the wrap board was unresponsive (livelock?).
- Something is definitely wrong with polling in 7.x (maybe since 5.x?).
- Device polling in 4.x gives a remarkable improvement.
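To judge whether the 100Mbit NICs really are the limit in the 4.x-ffwd case, it helps to know the ceiling a TCP stream can reach on such a link. The back-of-the-envelope calculation below is a sketch under stated assumptions that are not in the original notes: a standard 1500-byte MTU, no VLAN tags, back-to-back full-size frames, and either no TCP options or the 12-byte timestamp option.

```python
# Per-frame byte counts for standard Ethernet (assumed: 1500-byte MTU, no VLAN tag).
PREAMBLE_SFD = 8      # preamble plus start-of-frame delimiter
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
FCS = 4               # frame check sequence
INTERFRAME_GAP = 12   # mandatory idle time between frames, in byte times
MTU = 1500
IP_HEADER = 20        # IPv4 header without options
TCP_HEADER = 20       # TCP header without options

def tcp_goodput_mbit(link_mbit=100.0, tcp_option_bytes=0):
    """Upper bound on TCP payload throughput for back-to-back full-size frames."""
    payload = MTU - IP_HEADER - TCP_HEADER - tcp_option_bytes
    on_wire = PREAMBLE_SFD + ETH_HEADER + MTU + FCS + INTERFRAME_GAP
    return link_mbit * payload / on_wire

print(f"no TCP options:  {tcp_goodput_mbit():.1f} Mbit/s")
print(f"with timestamps: {tcp_goodput_mbit(tcp_option_bytes=12):.1f} Mbit/s")
```

Under these assumptions the ceiling is a little below 95 Mbit/s, so a curve that flattens out in that region is consistent with the link, rather than the board, being the limit.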
- All the 4.x plots are overlaid on top of each other.
- In 4.x, turning on polling with fast forwarding enabled didn't give any performance boost (contrary to what happened in the previous experiment): perhaps with Gbit NICs we would see an improvement?
- Merge of the 2 previous experiments: really messy, but useful for directly comparing different combinations.
- As per rwatson's request, I did this test (more from him? :).
- In 4.x the firewall (ipfw) and fast forwarding are not compatible, while in 7.x the fast forward path traverses the pfil hooks, so I turned fast forwarding on in that case.
- Turning on polling didn't improve the 4.x cases (while it did improve simple forwarding - see above).
- In any case, 7.x ipfw nat looks faster than 4.x natd.
- Being the curious person I am, I wanted to see how much impact different gcc optimization levels can have on packet forwarding, and I was pleasantly surprised to see how the situation improved when raising the optimization level - definitely worth a try imo, at least the -O1 level.
Conclusion:
- 4.x (at least on slow/UP/this hw) is almost always the fastest solution.
- Turn on fast forwarding whenever possible.
- Device polling looks broken in HEAD (what about 5.x? 6.x?).
- 7.x ipfw nat is faster than 4.x natd.
- Raising the compiler optimization level can significantly affect performance.