NetFPGA SUME reference NIC device driver

Project description

This project enables the NetFPGA SUME 4x10 Gbps FPGA board to work as a NIC on FreeBSD. The driver is based on the existing Linux driver for the 'Reference NIC' design, which uses the RIFFA DMA engine and comes from the private NetFPGA-SUME-live repository.

The SUME hardware design also exposes the internal registers of its modules over a second channel. This channel is used to obtain packet statistics and link status from the nf_10g_interface modules connected to the board's SFP+ modules.

Original GSoC proposal: https://summerofcode.withgoogle.com/projects/#4932584418574336

Preparing the NetFPGA SUME

To flash the board with the 'Reference NIC' project, one must first obtain the reference_nic bitstream. The build instructions are available on the NetFPGA SUME Wiki page, but it is also possible to download the pre-built bitstream from the University of Cambridge.

NOTE: on my setup, the pre-built bitstream did not work correctly. At higher incoming rates, newly arriving packets would overwrite older ones in some internal FIFO, so packets came out of the board scrambled until the board was physically reset. If you want to skip building the bitstream and don't want to use the one from Cambridge, I'll soon provide the one I built myself.

For the next step I used Linux, as the appropriate JTAG programming tools are not available for FreeBSD. It is also possible to flash the board using Vivado's xmd tool, as explained on the Reference NIC wiki page.

Install Digilent Adept Tools (Runtime and Utilities) from Digilent, connect your machine to the NetFPGA via USB, and flash the board with:

 # dsumecfg -d NetSUME write -verbose -s 2 -f reference_nic.bit # flash to flash section 2
 # dsumecfg -d NetSUME setbootsec -s 2 # load flash section 2 on board boot-up
 # dsumecfg -d NetSUME reconfig -s 2 # reconfigure the board from section 2

Instructions to load the driver

After downloading the code, run:

 # make
 # kldload ./if_sume.ko

The driver should load and create 4 interfaces (named sume0-sume3).
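A quick way to confirm this is to compare the system's interface list against the four expected names. The sketch below assumes the interfaces are named exactly sume0 to sume3, as stated above. The helper name missing_sume_ifaces is made up for illustration; it takes the interface names as arguments, which on FreeBSD can come from `ifconfig -l`.

```sh
# Hypothetical helper (not part of the driver): print every expected sume
# interface that is absent from the interface names passed as arguments.
missing_sume_ifaces() {
    for want in sume0 sume1 sume2 sume3; do
        found=no
        for have in "$@"; do
            if [ "$have" = "$want" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "$want"
        fi
    done
}

# On the target machine, after kldload:
#   missing_sume_ifaces $(ifconfig -l)
```

No output means all four interfaces are present.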

The code

FreeBSD SUME repo: https://github.com/denisSal/freebsd-sume

The FreeBSD driver has some advantages over the Linux version:

Benchmarks

The slow RIFFA DMA engine, which takes multiple interrupts per transfer and carries one packet per DMA transaction, yields low performance for the NetFPGA SUME reference NIC.

Benchmarks were run with the netmap tool pkt-gen (sending / receiving UDP packets of 60 B or 1500 B), netperf (sending / receiving TCP with default options) and iperf (sending / receiving TCP, using the -l option to change the buffer size, as results were inconsistent otherwise). The host had an Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz and ran Ubuntu 18.04.4 with kernel 4.15.0-112-generic, FreeBSD 11.3-STABLE and FreeBSD 12.1-RELEASE-p7.

For the 'TX only' and 'RX only' benchmarks, one NetFPGA SUME port was connected to an Intel X520 NIC (ix) in a different computer running FreeBSD 12.1. The 'loopback' benchmark connects two ports of the same NetFPGA SUME, using one for TX and the other for RX.

Test              TX only              RX only              loopback

LINUX
iperf             249 Mb/s (-l 25)     2.52 Gb/s            -
netperf           77.34 Mb/s           2.51 Gb/s            -
pkt-gen - 60 B    314 Kpps (150 Mb/s)  242 Kpps (116 Mb/s)  200 Kpps
pkt-gen - 1500 B  215 Kpps (2.6 Gb/s)  234 Kpps (2.8 Gb/s)  180 Kpps

FREEBSD 11.3
iperf             2.15 Gb/s            606 Mb/s (-l 162)    -
netperf           2.15 Gb/s            397 Mb/s             -
pkt-gen - 60 B    303 Kpps (145 Mb/s)  246 Kpps (118 Mb/s)  152 Kpps
pkt-gen - 1500 B  207 Kpps (2.5 Gb/s)  235 Kpps (2.9 Gb/s)  138 Kpps

FREEBSD 11.3, rx_queue branch
iperf             2.12 Gb/s            448 Mb/s (-l 120)    -
netperf           2.12 Gb/s            -                    -
pkt-gen - 60 B    301 Kpps (144 Mb/s)  238 Kpps (114 Mb/s)  151 Kpps
pkt-gen - 1500 B  208 Kpps (2.5 Gb/s)  237 Kpps (2.9 Gb/s)  125 Kpps

FREEBSD 12.1-p7
iperf             2.2 Gb/s             610 Mb/s (-l 163)    -
netperf           2.2 Gb/s             201 Mb/s             -
pkt-gen - 60 B    279 Kpps (134 Mb/s)  220 Kpps (111 Mb/s)  164 Kpps
pkt-gen - 1500 B  260 Kpps (3.1 Gb/s)  215 Kpps (2.7 Gb/s)  133 Kpps

FREEBSD 12.1-p7, rx_queue branch
iperf             2.22 Gb/s            607 Mb/s             -
netperf           2.22 Gb/s            521 Mb/s             -
pkt-gen - 60 B    278 Kpps (133 Mb/s)  238 Kpps (114 Mb/s)  125 Kpps
pkt-gen - 1500 B  261 Kpps (3.1 Gb/s)  238 Kpps (2.9 Gb/s)  150 Kpps

NOTE: The 'Reference NIC' design from the private NetFPGA-SUME-live repository works as a 4-port NIC capable of communicating with the host device by using a RIFFA-based DMA engine. The RX/TX transactions between the board and the host are done over one DMA channel and there is no packet batching, so packets are transferred on a one-to-one basis, with the possibility of taking 2-3 interrupts for one transaction. Due to this, the design is unable to achieve line rate.
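To put the gap to line rate in numbers, the pkt-gen figures can be checked with plain shell arithmetic. This is only a sketch: the function names are made up, and it assumes that the pkt-gen packet size excludes the 4-byte CRC and that each frame costs a further 20 bytes on the wire (preamble, start delimiter and inter-frame gap), as on standard Ethernet.

```sh
# Hypothetical helpers for checking the benchmark arithmetic.

# kpps_to_mbps KPPS BYTES: throughput in Mb/s, rounded down.
kpps_to_mbps() {
    echo $(( $1 * $2 * 8 / 1000 ))
}

# line_rate_kpps BYTES: theoretical 10 Gb/s packet rate in Kpps, rounded
# down. Assumes 24 bytes of per-frame overhead (CRC, preamble, gap).
line_rate_kpps() {
    wire_bytes=$(( $1 + 24 ))
    # 10 Gb/s is 10000000 kb/s; dividing by bits per frame gives Kpps.
    echo $(( 10000000 / (wire_bytes * 8) ))
}

kpps_to_mbps 314 60      # prints 150, the Linux TX-only figure for 60 B
line_rate_kpps 60        # prints 14880
line_rate_kpps 1500      # prints 820
```

Under these assumptions, the measured 314 Kpps for 60 B packets is roughly 2% of the 10 Gb/s line rate, while the 1500 B results reach about a quarter to a third of it.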

Bugs

I have noticed strange behaviour in the TX path of the reference NIC design. When sending "small" packets using pkt-gen, TX works as it should. But when sending packets above some size limit (above 1072 bytes on my computer) right after loading the driver, TX breaks and I cannot send any more packets: SUME doesn't send the TX_DONE interrupt and refuses to send packets until I reload the driver or reset the board. The driver would then stay stuck in a non-IDLE TX state, so it implements a watchdog-like function that checks whether the non-IDLE state has lasted more than 3 seconds and, if so, resets the board automatically.
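The watchdog decision described above can be modelled in a few lines. This is an illustrative sketch only, not the driver's code (the driver itself is written in C): the function name, the state label BUSY and the use of whole seconds are assumptions, and only the IDLE state and the 3-second limit come from the description.

```sh
# Hypothetical model of the TX watchdog check; names are not the driver's.
TX_TIMEOUT_SECS=3

# tx_watchdog_action STATE BUSY_SINCE NOW
#   STATE       "IDLE" or any non-idle label (for example BUSY)
#   BUSY_SINCE  time in seconds at which TX left the IDLE state
#   NOW         current time in seconds
# Prints "ok" when idle, "wait" while within the timeout, and "reset"
# once the non-idle state has lasted more than TX_TIMEOUT_SECS.
tx_watchdog_action() {
    if [ "$1" = "IDLE" ]; then
        echo ok
    elif [ $(( $3 - $2 )) -gt "$TX_TIMEOUT_SECS" ]; then
        echo reset
    else
        echo wait
    fi
}
```

For example, `tx_watchdog_action BUSY 10 14` reports that a reset is due, because TX has been busy for 4 seconds.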

The same thing happens on Linux, too (with a different size limit). If I "warm up" the modules by sending some slightly smaller packets beforehand, TX doesn't get stuck even if I send larger packets afterwards. Without the watchdog, this would happen:



SummerOfCode2020Projects/NetFPGA_SUME_Driver (last edited 2021-04-26T05:27:50+0000 by JethroNederhof)