NetFPGA SUME reference NIC device driver
Student: Denis Salopek <denis DOT sale AT gmail DOT com>
Mentors: Marko Zec <zec AT fer DOT hr>, Bjoern A. Zeeb <bz AT freebsd DOT org>
Project description
This project enables the NetFPGA SUME 4x10 Gbps FPGA board to work as a NIC on FreeBSD. The driver is based on the existing Linux driver for the 'Reference NIC' design, which uses the RIFFA DMA engine and comes from the private NetFPGA-SUME-live repository.
The SUME hardware design also offers communication with internal registers belonging to different modules of the design over a second channel. This is used as a way to obtain packet statistics and link status from nf_10g_interface modules connected to the board's SFP+ modules.
Original GSoC proposal: https://summerofcode.withgoogle.com/projects/#4932584418574336
Preparing the NetFPGA SUME
To flash the board with the 'Reference NIC' project, one must first obtain the reference_nic bitstream. The build instructions are available on the NetFPGA SUME Wiki page, but it is also possible to download the pre-built bitstream from the University of Cambridge.
NOTE: On my setup, the pre-built bitstream did not work correctly: at higher incoming rates, newly arriving packets would overwrite those already in some internal FIFO, so packets came out of the board scrambled until the board was physically reset. If you want to skip building the bitstream and don't want to use the one from Cambridge, I'll soon provide the one I built myself.
For the next step I used Linux, as the appropriate JTAG programming tools are not available for FreeBSD. It is also possible to flash the board using Vivado's xmd tool, as explained in the Reference NIC wiki page.
Install Digilent Adept Tools (Runtime and Utilities) from Digilent, connect your machine to the NetFPGA via USB and flash the board with:
# dsumecfg -d NetSUME write -verbose -s 2 -f reference_nic.bit    # flash to flash section 2
# dsumecfg -d NetSUME setbootsec -s 2                             # load flash section 2 on board boot-up
# dsumecfg -d NetSUME reconfig -s 2                               # reconfigure the board from section 2
Instructions to load the driver
After downloading the code, run:
# make
# kldload ./if_sume.ko
The driver should load and create 4 interfaces (named sume0-sume3).
The code
FreeBSD SUME repo: https://github.com/denisSal/freebsd-sume
The FreeBSD driver has some advantages over the Linux version:
- more balanced TCP throughput (the cause is not confirmed, but it is probably TX queuing)
- link state detection / reporting
- access to hardware counters via sysctl
- a watchdog function that resets the HW if it gets stuck in a TX state
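One way to quantify "more balanced" is the ratio of the slower direction to the faster one. The following is a minimal sketch, not part of the driver: the function name `balance` is hypothetical, and the inputs are the iperf figures from the Benchmarks section for Linux and FreeBSD 12.1-p7, converted to Mb/s.

```python
def balance(tx_mbps, rx_mbps):
    """Ratio of the slower direction to the faster one (1.0 = perfectly balanced)."""
    slower, faster = sorted((tx_mbps, rx_mbps))
    return slower / faster

if __name__ == "__main__":
    # (TX only, RX only) iperf results in Mb/s, taken from the benchmark table.
    results = {
        "Linux": (249, 2520),
        "FreeBSD 12.1-p7": (2200, 610),
    }
    for system, (tx, rx) in results.items():
        print(f"{system}: balance = {balance(tx, rx):.2f}")
```

With these inputs the FreeBSD ratio comes out higher than the Linux one, although both are far from 1.0, and the slower direction differs between the two systems.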
Benchmarks
The RIFFA DMA engine is slow: it takes multiple interrupts and moves only one packet per DMA transaction, which yields low performance for the NetFPGA SUME reference NIC. The benchmarks use:
- the netmap tool pkt-gen (sending / receiving UDP packets of 60 B or 1500 B)
- netperf (sending / receiving TCP with default options)
- iperf (sending / receiving TCP; the -l option was used to change the buffer size, because the results were inconsistent otherwise)
All tests ran on an Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz under Ubuntu 18.04.4 with kernel 4.15.0-112-generic, FreeBSD 11.3-STABLE and FreeBSD 12.1-RELEASE-p7. For the 'TX only' and 'RX only' benchmarks, one NetFPGA SUME port was connected to an Intel X520 NIC (ix) in a different computer (FreeBSD 12.1). The 'loopback' benchmark connects two ports of the same NetFPGA SUME, using one for TX and the other for RX.
System | Test | TX only | RX only | loopback
Linux | iperf | 249 Mb/s (-l 25) | 2.52 Gb/s | -
Linux | netperf | 77.34 Mb/s | 2.51 Gb/s | -
Linux | pkt-gen, 60 B | 314 Kpps (150 Mb/s) | 242 Kpps (116 Mb/s) | 200 Kpps
Linux | pkt-gen, 1500 B | 215 Kpps (2.6 Gb/s) | 234 Kpps (2.8 Gb/s) | 180 Kpps
FreeBSD 11.3 | iperf | 2.15 Gb/s | 606 Mb/s (-l 162) | -
FreeBSD 11.3 | netperf | 2.15 Gb/s | 397 Mb/s | -
FreeBSD 11.3 | pkt-gen, 60 B | 303 Kpps (145 Mb/s) | 246 Kpps (118 Mb/s) | 152 Kpps
FreeBSD 11.3 | pkt-gen, 1500 B | 207 Kpps (2.5 Gb/s) | 235 Kpps (2.9 Gb/s) | 138 Kpps
FreeBSD 11.3 (rx_queue branch) | iperf | 2.12 Gb/s | 448 Mb/s (-l 120) | -
FreeBSD 11.3 (rx_queue branch) | netperf | 2.12 Gb/s | - | -
FreeBSD 11.3 (rx_queue branch) | pkt-gen, 60 B | 301 Kpps (144 Mb/s) | 238 Kpps (114 Mb/s) | 151 Kpps
FreeBSD 11.3 (rx_queue branch) | pkt-gen, 1500 B | 208 Kpps (2.5 Gb/s) | 237 Kpps (2.9 Gb/s) | 125 Kpps
FreeBSD 12.1-p7 | iperf | 2.2 Gb/s | 610 Mb/s (-l 163) | -
FreeBSD 12.1-p7 | netperf | 2.2 Gb/s | 201 Mb/s | -
FreeBSD 12.1-p7 | pkt-gen, 60 B | 279 Kpps (134 Mb/s) | 220 Kpps (111 Mb/s) | 164 Kpps
FreeBSD 12.1-p7 | pkt-gen, 1500 B | 260 Kpps (3.1 Gb/s) | 215 Kpps (2.7 Gb/s) | 133 Kpps
FreeBSD 12.1-p7 (rx_queue branch) | iperf | 2.22 Gb/s | 607 Mb/s | -
FreeBSD 12.1-p7 (rx_queue branch) | netperf | 2.22 Gb/s | 521 Mb/s | -
FreeBSD 12.1-p7 (rx_queue branch) | pkt-gen, 60 B | 278 Kpps (133 Mb/s) | 238 Kpps (114 Mb/s) | 125 Kpps
FreeBSD 12.1-p7 (rx_queue branch) | pkt-gen, 1500 B | 261 Kpps (3.1 Gb/s) | 238 Kpps (2.9 Gb/s) | 150 Kpps
NOTE: The 'Reference NIC' design from the private NetFPGA-SUME-live repository works as a 4-port NIC that communicates with the host through a RIFFA-based DMA engine. The RX/TX transactions between the board and the host are done over one DMA channel and there is no packet batching, so packets are transferred on a one-to-one basis, with the possibility of taking 2-3 interrupts for one transaction. Due to this, the design is unable to achieve line rate.
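To put the pkt-gen numbers in perspective, here is a minimal sketch that converts a packet rate into throughput and computes the theoretical packet rate of a 10 Gb/s Ethernet link. The helper names are hypothetical. The per-frame overhead (4-byte FCS, 8 bytes of preamble and start-of-frame delimiter, 12-byte minimum inter-frame gap) is standard Ethernet framing rather than something measured here, and the sketch assumes that the pkt-gen packet size does not include the FCS.

```python
# Standard Ethernet per-frame overhead on the wire, in bytes.
FCS_BYTES = 4            # frame check sequence (CRC)
PREAMBLE_SFD_BYTES = 8   # preamble + start-of-frame delimiter
IFG_BYTES = 12           # minimum inter-frame gap

def throughput_bps(pps, pkt_bytes):
    """Throughput in bit/s, counting only the packet bytes themselves."""
    return pps * pkt_bytes * 8

def line_rate_pps(link_bps, pkt_bytes):
    """Maximum packets per second the link can carry (pkt_bytes excludes the FCS)."""
    wire_bytes = pkt_bytes + FCS_BYTES + PREAMBLE_SFD_BYTES + IFG_BYTES
    return link_bps / (wire_bytes * 8)

if __name__ == "__main__":
    link_bps = 10e9  # one 10 Gb/s SFP+ port
    # Linux 'TX only' pkt-gen results from the benchmark table: (pps, packet size in bytes).
    for pps, size in [(314_000, 60), (215_000, 1500)]:
        share = pps / line_rate_pps(link_bps, size)
        print(f"{size} B: {throughput_bps(pps, size) / 1e6:.1f} Mb/s, "
              f"{share:.1%} of the line-rate packet rate")
```

Under these assumptions, 314 Kpps of 60 B packets is roughly 2% of what a 10 Gb/s port can carry, and 215 Kpps of 1500 B packets is roughly a quarter.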
Bugs
I have noticed strange behaviour in the TX path of the reference NIC design. When sending "small" packets using pkt-gen, TX works as it should. But when sending packets above some size limit (above 1072 bytes on my computer) right after loading the driver, TX breaks and I cannot send any more packets: SUME doesn't send the TX_DONE interrupt and refuses to send packets until I reload the driver or reset the board. This would leave the driver stuck in a non-IDLE TX state, so the driver implements a watchdog-like function that checks whether the non-IDLE state has lasted more than 3 seconds and, if so, resets the board automatically.
The same thing happens on Linux, too (with a different size limit). If I "warm up" the modules by sending some slightly smaller packets beforehand, the TX doesn't get stuck even if I send larger packets. Without the watchdog, this would happen:
- load the module and bring up one SUME interface
- run pkt-gen to generate 1073 B (or larger) packets from SUME
- some packets are sent normally, then after less than a second, SUME gets stuck in a TX state and doesn't send the 'TX done' interrupt
- stop pkt-gen, reload the module and bring up the same interface
- run pkt-gen to generate 1072 B packets from SUME
- packets are sent normally
- stop pkt-gen, run pkt-gen to generate 1073 B (or larger) packets from SUME
- packets are sent normally
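The watchdog idea described above can be modelled in a few lines. This is a user-space sketch of the logic only, not the driver's code: the class, method and state names are hypothetical, and only the 3-second limit and the "reset when stuck in a non-IDLE TX state" behaviour come from the description. The clock is injectable so the behaviour can be checked without waiting.

```python
import time

TX_IDLE = "IDLE"
TX_BUSY = "BUSY"            # hypothetical stand-in for any non-IDLE TX state
WATCHDOG_TIMEOUT_S = 3.0    # limit taken from the description above

class TxWatchdog:
    """Tracks how long TX has been non-IDLE and triggers a reset when it is stuck."""

    def __init__(self, reset_hw, clock=time.monotonic):
        self._reset_hw = reset_hw   # callback that resets the board
        self._clock = clock
        self.state = TX_IDLE
        self._busy_since = None

    def tx_start(self):
        # Remember when TX left IDLE; do not restart the timer if already busy.
        if self.state == TX_IDLE:
            self.state = TX_BUSY
            self._busy_since = self._clock()

    def tx_done(self):
        # Called when the 'TX done' interrupt arrives.
        self.state = TX_IDLE
        self._busy_since = None

    def tick(self):
        """Periodic check; returns True if a reset was triggered."""
        if self.state == TX_IDLE:
            return False
        if self._clock() - self._busy_since > WATCHDOG_TIMEOUT_S:
            self._reset_hw()
            self.tx_done()          # after the reset, TX starts from IDLE again
            return True
        return False

if __name__ == "__main__":
    now = [0.0]                     # fake clock, so the demo does not sleep
    wd = TxWatchdog(reset_hw=lambda: print("resetting board"), clock=lambda: now[0])
    wd.tx_start()                   # a packet is handed to the board, no 'TX done' follows
    now[0] = 3.5
    print("reset triggered:", wd.tick())
```

In a real driver the periodic check would run from a kernel timer; the fake clock here only stands in for that.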