Compare FreeBSD and Linux TCP congestion control algorithms over an emulated 1Gbps x 40ms WAN


Emulab environment

The testbed is built from Emulab.net pc3000 nodes with 1Gbps links (see Emulab hardware) and is configured in a dumbbell topology.

The dumbbell consists of two nodes (s1, s2) acting as TCP traffic senders running iperf3, two nodes (rt1, rt2) acting as routers connected by a single bottleneck link, and two nodes (r1, r2) serving as TCP traffic receivers.

The sender nodes (s1, s2) are used to test TCP congestion control algorithms on different operating systems (FreeBSD or Ubuntu Linux). The receiver nodes (r1, r2) run Ubuntu Linux 22.04.

The router nodes (rt1, rt2) run Ubuntu Linux 18.04 with shallow buffers: the NIC TX/RX rings are set to 128 descriptors and the L3 queue to 128 packets, giving a total routing buffer of roughly 256 packets.

A Dummynet box introduces a 40ms round-trip delay (RTT) on the bottleneck link. All senders transmit data traffic to their corresponding receivers (e.g., s1 => r1 and s2 => r2) at the same time.
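
For reference, a minimal dummynet sketch that would add such a delay (illustrative only, not the actual Emulab delay-node configuration; em1 is an assumed interface name, and the 40ms RTT is split as 20ms per direction):

kldload dummynet                                        # load the dummynet module
ipfw pipe 1 config delay 20ms                           # forward direction: 20ms
ipfw pipe 2 config delay 20ms                           # reverse direction: 20ms
ipfw add 100 pipe 1 ip from any to any out xmit em1     # em1: assumed bottleneck interface
ipfw add 200 pipe 2 ip from any to any in recv em1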

The bottleneck link has 1Gbps of bandwidth, and the TCP traffic from the two senders experiences congestion at rt1's output port.

testbed: attachment:DummbellTopology.png

test config

root@s1:~ # ping -c 5 r1
PING r1-link5 (10.1.4.3): 56 data bytes
64 bytes from 10.1.4.3: icmp_seq=0 ttl=62 time=40.313 ms
64 bytes from 10.1.4.3: icmp_seq=1 ttl=62 time=40.186 ms
64 bytes from 10.1.4.3: icmp_seq=2 ttl=62 time=40.233 ms
64 bytes from 10.1.4.3: icmp_seq=3 ttl=62 time=40.150 ms
64 bytes from 10.1.4.3: icmp_seq=4 ttl=62 time=40.250 ms

--- r1-link5 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 40.150/40.227/40.313/0.056 ms
root@s1:~ #

root@s2:~ # ping -c 5 r2
PING r2-link6 (10.1.1.3): 56 data bytes
64 bytes from 10.1.1.3: icmp_seq=0 ttl=62 time=40.228 ms
64 bytes from 10.1.1.3: icmp_seq=1 ttl=62 time=40.249 ms
64 bytes from 10.1.1.3: icmp_seq=2 ttl=62 time=40.094 ms
64 bytes from 10.1.1.3: icmp_seq=3 ttl=62 time=40.571 ms
64 bytes from 10.1.1.3: icmp_seq=4 ttl=62 time=40.188 ms

--- r2-link6 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 40.094/40.266/40.571/0.161 ms
root@s2:~ #

root@s1:~ # cat /etc/sysctl.conf
...
net.inet.tcp.hostcache.enable=0
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
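
The same values can also be applied at runtime on the FreeBSD senders without a reboot, for example:

sysctl net.inet.tcp.hostcache.enable=0      # don't reuse cached metrics from previous connections
sysctl kern.ipc.maxsockbuf=16777216         # 16MB maximum socket buffer
sysctl net.inet.tcp.sendbuf_max=16777216    # 16MB autotuning send buffer limit
sysctl net.inet.tcp.recvbuf_max=16777216    # 16MB autotuning receive buffer limit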

root@r1:~# cat /etc/sysctl.conf 
...
# allow testing with 256MB buffers
net.core.rmem_max = 268435456 
net.core.wmem_max = 268435456 
# increase Linux autotuning TCP buffer limit to 256MB
net.ipv4.tcp_rmem = 4096 131072 268435456
net.ipv4.tcp_wmem = 4096 16384 268435456
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
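
On the Linux receivers the file can be reloaded and the limits confirmed without a reboot, for example:

sysctl -p                                    # reload /etc/sysctl.conf
sysctl net.core.rmem_max net.ipv4.tcp_rmem   # confirm the new buffer limits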

root@rt1:~# /sbin/ethtool -g eth5
Ring parameters for eth5:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             128
RX Mini:        0
RX Jumbo:       0
TX:             128

root@rt1:~# 
root@rt1:~# ifconfig eth5
eth5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.1.3.2  netmask 255.255.255.0  broadcast 10.1.3.255
        inet6 fe80::204:23ff:feb7:17c9  prefixlen 64  scopeid 0x20<link>
        ether 00:04:23:b7:17:c9  txqueuelen 128  (Ethernet)
        RX packets 186  bytes 18000 (18.0 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 196  bytes 19354 (19.3 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@rt1:~#
root@rt1:~# /sbin/tc qdisc show dev eth5
qdisc pfifo 8005: root refcnt 2 limit 128p
root@rt1:~#
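
For reference, a sketch of commands that would produce the shallow buffers shown above (not necessarily the exact commands used on the testbed; eth5 is the bottleneck-facing interface from the output above):

/sbin/ethtool -G eth5 rx 128 tx 128                    # shrink the NIC RX/TX rings to 128 descriptors
/sbin/tc qdisc replace dev eth5 root pfifo limit 128   # cap the L3 queue at 128 packets
/sbin/ip link set dev eth5 txqueuelen 128              # keep the interface txqueuelen consistent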

root@s1:~ # kldstat
Id Refs Address                Size Name
 1   14 0xffffffff80200000  23a57e8 kernel
 2    1 0xffffffff83200000   3cb1f8 zfs.ko
 3    1 0xffffffff83020000    31e48 tcp_rack.ko
 4    1 0xffffffff83052000     e0f0 tcphpts.ko
root@s1:~ #

root@s1:~ # sysctl net.inet.tcp.functions_default=rack
net.inet.tcp.functions_default: freebsd -> rack
root@s1:~ #
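
For reference, the available TCP stacks can be listed, and the default reverted between runs, for example:

sysctl net.inet.tcp.functions_available           # list the loaded TCP stacks (freebsd, rack, ...)
sysctl net.inet.tcp.functions_default=freebsd     # revert to the base stack when not testing RACK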

senders' kernel info

FreeBSD 15.0-CURRENT #0 c6767dc1f236: Thu Jan 30 05:48:17 MST 2025

routers' kernel info

Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-117-generic x86_64)

receivers' kernel info

Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-122-generic x86_64)

iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -l 1M -t 200 -i 1 -f m -VC ${name}
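
For reference, a filled-in sketch of this command for an s1 => r1 CUBIC run. 10.1.4.3 is r1's address from the ping output above; the client port below is arbitrary, ${src} stays a placeholder for s1's own address (not shown on this page), and -C selects the congestion control algorithm for the connection:

# on r1 (receiver), start the server:
iperf3 -s
# on s1 (sender), run 200 seconds of CUBIC traffic:
iperf3 -B ${src} --cport 5001 -c 10.1.4.3 -l 1M -t 200 -i 1 -f m -V -C cubic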

test result

TCP congestion control algo    | average bitrate over the 200s test (1Gbps bottleneck) | compared to Linux peer
-------------------------------|-------------------------------------------------------|-----------------------
Linux CUBIC                    | 846 Mbits/sec (2nd iter)                              | base
FreeBSD default stack CUBIC    | 470 Mbits/sec (2nd iter)                              | -44.4%
FreeBSD RACK stack CUBIC       | 505 Mbits/sec (2nd iter)                              | -40.3%
Linux newreno                  | 695 Mbits/sec (1st iter)                              | base
FreeBSD default stack newreno  | 301 Mbits/sec (1st iter)                              | -56.7%
FreeBSD RACK stack newreno     | 444 Mbits/sec (3rd iter)                              | -36.1%

Linux CUBIC - TCP throughput: attachment:dumbbell_linux_cubic_throughput_chart.png TCP congestion window: attachment:dumbbell_linux_cubic_cwnd_chart.png

FreeBSD default stack CUBIC - TCP throughput: attachment:dumbbell_fbsd_default_cubic_throughput_chart.png TCP congestion window: attachment:dumbbell_fbsd_default_cubic_cwnd_chart.png

FreeBSD RACK stack CUBIC - TCP throughput: attachment:dumbbell_fbsd_rack_cubic_throughput_chart.png TCP congestion window: attachment:dumbbell_fbsd_rack_cubic_cwnd_chart.png

Linux newreno - TCP throughput: attachment:dumbbell_linux_newreno_throughput_chart.png TCP congestion window: attachment:dumbbell_linux_newreno_cwnd_chart.png

FreeBSD default stack newreno - TCP throughput: attachment:dumbbell_fbsd_default_newreno_throughput_chart.png TCP congestion window: attachment:dumbbell_fbsd_default_newreno_cwnd_chart.png

FreeBSD RACK stack newreno - TCP throughput: attachment:dumbbell_fbsd_rack_newreno_throughput_chart.png TCP congestion window: attachment:dumbbell_fbsd_rack_newreno_cwnd_chart.png

VM environment

test config

Virtual machines (VMs) are hosted by bhyve on two separate physical boxes (Beelink SER5 AMD Mini PCs) running FreeBSD 14.1-RELEASE. The two boxes are connected through a 1Gbps hub.
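
For reference, a minimal sketch of the usual bhyve bridged-networking setup on such a host (illustrative only; the interface names are assumptions and the actual hosts may be configured differently, e.g. via vm-bhyve):

ifconfig tap0 create                      # tap device backing the guest's virtio-net interface
ifconfig bridge0 create                   # bridge joining the guest to the LAN
ifconfig bridge0 addm igb0 addm tap0 up   # igb0 is an illustrative name for the host's 1Gbps NIC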

In each test, one data sender and one data receiver are used, both virtual machines. The FreeBSD VM n1fbsd and then the Linux VM n1linuxvm are used in turn to send TCP data traffic over the same physical link to the Linux receiver VM n2linuxvm. A 40ms delay is added at the Linux receiver.

There are occasional TCP packet drops, which makes it possible to evaluate congestion control performance. The minimum bandwidth-delay product (BDP) is 1000Mbps x 40ms = 5 MBytes.
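
A quick check of that BDP figure (a sketch using POSIX sh arithmetic):

# 1000 Mbit/s x 0.040 s / 8 bits-per-byte = 5,000,000 bytes = 5 MBytes
echo $(( 1000 * 1000 * 1000 / 8 * 40 / 1000 ))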

root@n2linuxvm:~ # tc qdisc add dev enp0s5 root netem delay 40ms
root@n2linuxvm:~ # tc qdisc show dev enp0s5
qdisc netem 8001: root refcnt 2 limit 1000 delay 40ms
root@n2linuxvm:~ #

root@n1fbsd:~ # ping -c 5 -S 192.168.50.37 192.168.50.89
PING 192.168.50.89 (192.168.50.89) from 192.168.50.37: 56 data bytes
64 bytes from 192.168.50.89: icmp_seq=0 ttl=64 time=44.003 ms
64 bytes from 192.168.50.89: icmp_seq=1 ttl=64 time=44.837 ms
64 bytes from 192.168.50.89: icmp_seq=2 ttl=64 time=43.978 ms
64 bytes from 192.168.50.89: icmp_seq=3 ttl=64 time=43.513 ms
64 bytes from 192.168.50.89: icmp_seq=4 ttl=64 time=43.631 ms

--- 192.168.50.89 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 43.513/43.993/44.837/0.463 ms
root@n1fbsd:~ #

root@n1linuxvm:~ # ping -c 5 -I 192.168.50.154 192.168.50.89
PING 192.168.50.89 (192.168.50.89) from 192.168.50.154 : 56(84) bytes of data.
64 bytes from 192.168.50.89: icmp_seq=1 ttl=64 time=43.9 ms
64 bytes from 192.168.50.89: icmp_seq=2 ttl=64 time=44.0 ms
64 bytes from 192.168.50.89: icmp_seq=3 ttl=64 time=43.7 ms
64 bytes from 192.168.50.89: icmp_seq=4 ttl=64 time=44.0 ms
64 bytes from 192.168.50.89: icmp_seq=5 ttl=64 time=43.7 ms

--- 192.168.50.89 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4031ms
rtt min/avg/max/mdev = 43.706/43.862/44.015/0.130 ms
root@n1linuxvm:~ # 

root@n1fbsd:~ # cat /etc/sysctl.conf
...
# Don't cache ssthresh from previous connection
net.inet.tcp.hostcache.enable=0
# Increase FreeBSD maximum socket buffer size up to 128MB
kern.ipc.maxsockbuf=134217728
# Increase FreeBSD Max size of automatic send/receive buffer up to 128MB
net.inet.tcp.sendbuf_max=134217728
net.inet.tcp.recvbuf_max=134217728
root@n1fbsd:~ #

root@n2linuxvm:~ # cat /etc/sysctl.conf
...
net.core.rmem_max = 134217728 
net.core.wmem_max = 134217728 
# Increase Linux autotuning TCP buffer max up to 128MB buffers
net.ipv4.tcp_rmem = 4096 131072 134217728
net.ipv4.tcp_wmem = 4096 16384 134217728
# Don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
root@n2linuxvm:~ #

root@n1fbsd:~ # kldstat
Id Refs Address                Size Name
 1    5 0xffffffff80200000  1f75ca0 kernel
 2    1 0xffffffff82810000    368d8 tcp_rack.ko
 3    1 0xffffffff82847000     f0f0 tcphpts.ko
root@n1fbsd:~ # sysctl net.inet.tcp.functions_default
net.inet.tcp.functions_default: rack
root@n1fbsd:~ #

sender info for FreeBSD

FreeBSD 15.0-CURRENT (GENERIC) #0 main-n273771-e8263ace39c8

sender info for Linux

Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-124-generic x86_64)

receiver info for Linux

Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-124-generic x86_64)

iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -l 1M -t 100 -i 1 -f m -VC ${name}

test result

kern.hz value        | TCP congestion control algo    | iperf3 100-second average bitrate
---------------------|--------------------------------|----------------------------------
100 (default)        | FreeBSD default stack CUBIC    | 455 Mbits/sec (-46.9%)
100 (default)        | FreeBSD default stack newreno  | 497 Mbits/sec (-37.4%)
100 (default)        | FreeBSD RACK stack CUBIC       | 682 Mbits/sec (-20.4%)
100 (default)        | FreeBSD RACK stack newreno     | 442 Mbits/sec (-44.3%)
250, but irrelevant  | Linux CUBIC                    | 857 Mbits/sec (base)
250, but irrelevant  | Linux newreno                  | 794 Mbits/sec (base)

FreeBSD default stack - TCP throughput: attachment:throughput_chart_freebsd.png TCP congestion window: attachment:cwnd_chart_freebsd.png

FreeBSD RACK stack - TCP throughput: attachment:throughput_chart_freebsd_RACK2.png TCP congestion window: attachment:cwnd_chart_freebsd_RACK2.png

Linux - TCP throughput: attachment:throughput_chart_linux.png TCP congestion window: attachment:cwnd_chart_linux.png

references

rejected review patch: Align cubic cc with rfc9438
