test result of D41175
The testbed used here is from Emulab.net.
Two FreeBSD nodes are connected by a 1Gbps link. Each node has a dual-core CPU running at 3.00GHz, so this testbed is CPU bound rather than limited by the link.
The traffic sender (s1) runs iperf as the client, and the other FreeBSD node (r1) is the traffic receiver; the assumed receiver-side setup is sketched below.
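For completeness, a minimal sketch of the setup assumed before the transcripts below. The transcripts only show the sender side, so the exact receiver invocation and the module loading step are assumptions, not part of the captured output:

# on r1 (receiver): plain iperf server on the default port 5001
iperf -s

# on s1 (sender): SIFTR is a kernel module and must be loaded before the
# net.inet.siftr.* sysctls used below are available; it logs to
# /var/log/siftr.log by default
kldload siftr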
base test result:
root@s1:~ # sysctl net.inet.siftr.port_filter=5001
net.inet.siftr.port_filter: 0 -> 5001
root@s1:~ # sysctl net.inet.siftr.enabled=1 ; iperf -c r1 -l 1M -t 10 -i 2 ; sysctl net.inet.siftr.enabled=0 ;
net.inet.siftr.enabled: 0 -> 1
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 32.8 KByte (default)
------------------------------------------------------------
[ 1] local 10.1.1.2 port 40725 connected with 10.1.1.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 1] 0.00-2.00 sec   137 MBytes   575 Mbits/sec
[ 1] 2.00-4.00 sec   137 MBytes   575 Mbits/sec
[ 1] 4.00-6.00 sec   136 MBytes   570 Mbits/sec
[ 1] 6.00-8.00 sec   134 MBytes   562 Mbits/sec
[ 1] 8.00-10.00 sec  136 MBytes   570 Mbits/sec
[ 1] 0.00-10.03 sec  681 MBytes   570 Mbits/sec
net.inet.siftr.enabled: 1 -> 0
root@s1:~ # ll /var/log/siftr.log
-rw-r--r--  1 root  wheel  114M Jul 25 07:14 /var/log/siftr.log
root@s1:~ # tail -n1 /var/log/siftr.log
disable_time_secs=1690290846 disable_time_usecs=487977 num_inbound_tcp_pkts=246745 num_outbound_tcp_pkts=493485 total_tcp_pkts=740230 num_inbound_skipped_pkts_malloc=0 num_outbound_skipped_pkts_malloc=0 num_inbound_skipped_pkts_tcpcb=1 num_outbound_skipped_pkts_tcpcb=1 num_inbound_skipped_pkts_inpcb=0 num_outbound_skipped_pkts_inpcb=0 total_skipped_tcp_pkts=2 flow_list=10.1.1.2;40725-10.1.1.3;5001,
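As a rough sanity check on the log file (assuming the usual SIFTR format of one record per observed packet, plus an enable and a disable record), the record count should be close to total_tcp_pkts:

wc -l /var/log/siftr.log    # expect roughly total_tcp_pkts (740230 here) + 2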
test result with this change:
root@s1:~ # sysctl net.inet.siftr.port_filter=5001
net.inet.siftr.port_filter: 0 -> 5001
root@s1:~ # sysctl net.inet.siftr.enabled=1; iperf -c r1 -l 1M -t 10 -i 2; sysctl net.inet.siftr.enabled=0;
net.inet.siftr.enabled: 0 -> 1
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 32.8 KByte (default)
------------------------------------------------------------
[ 1] local 10.1.1.2 port 52884 connected with 10.1.1.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 1] 0.00-2.00 sec   153 MBytes   642 Mbits/sec
[ 1] 2.00-4.00 sec   156 MBytes   654 Mbits/sec
[ 1] 4.00-6.00 sec   154 MBytes   646 Mbits/sec
[ 1] 6.00-8.00 sec   153 MBytes   642 Mbits/sec
[ 1] 8.00-10.00 sec  156 MBytes   654 Mbits/sec
[ 1] 0.00-10.02 sec  773 MBytes   647 Mbits/sec
net.inet.siftr.enabled: 1 -> 0
root@s1:~ # ll /var/log/siftr.log
-rw-r--r--  1 root  wheel  129M Jul 25 07:12 /var/log/siftr.log
root@s1:~ # tail -n1 /var/log/siftr.log
disable_time_secs=1690290766 disable_time_usecs=886030 num_inbound_tcp_pkts=280039 num_outbound_tcp_pkts=560074 total_tcp_pkts=840113 num_inbound_skipped_pkts_malloc=0 num_outbound_skipped_pkts_malloc=0 num_inbound_skipped_pkts_tcpcb=1 num_outbound_skipped_pkts_tcpcb=1 num_inbound_skipped_pkts_inpcb=0 num_outbound_skipped_pkts_inpcb=0 total_skipped_tcp_pkts=2 flow_list=10.1.1.2;52884-10.1.1.3;5001,
Compared with the base, this change raises iperf throughput from 570 Mbits/sec to 647 Mbits/sec, i.e. (647 - 570) / 570 ≈ 13.5% improvement.
top 10 lock_profile statistics in base vs. compare (sorted by the "wait_total" column):
debug.lock.prof.stats: (base)
   max  wait_max     total  wait_total   count  avg  wait_avg  cnt_hold  cnt_lock  name
    97      3566     82348     2218286    6465   12       343         0      6454  /usr/src/sys/netinet/in_pcb.c:1456 (rw:tcpinp)
     3      6378      3793     1979391    5012    0       394         0      3054  /usr/src/sys/kern/kern_alq.c:215 (spin mutex:ALD Queue)  <<
    13      6716      1811     1743252    5012    0       347         0      3428  /usr/src/sys/kern/kern_alq.c:381 (spin mutex:ALD Queue)
 53479     14961   2738474      695750    7425  368        93         0      4881  /usr/src/sys/netinet/tcp_usrreq.c:929 (rw:tcpinp)
  3977     24680    163310       66119    1080  151        61         0        25  /usr/src/sys/kern/kern_mutex.c:211 (sleep mutex:ALDmtx)
 58871     26327    676761       30930    5012  135         6         0         4  /usr/src/sys/kern/kern_alq.c:370 (lockmgr:ufs)
    35         5   7371243        4146  748691    9         0         0      4089  /usr/src/sys/kern/kern_alq.c:682 (spin mutex:ALD Queue)
  6379      2260   1824880        2607    5012  364         0         0       520  /usr/src/sys/kern/kern_alq.c:222 (sleep mutex:ALDmtx)
    24        13     71156        2472  249602    0         0         0      3164  /usr/src/sys/net/netisr.c:1049 (sleep mutex:netisr_mtx)
     9        19      4657        1486   13904    0         0         0      2439  /usr/src/sys/net/netisr.c:966 (sleep mutex:netisr_mtx)

debug.lock.prof.stats: (compare)
   max  wait_max     total  wait_total   count  avg  wait_avg  cnt_hold  cnt_lock  name
   132      2855    120325     2304321    9126   13       252         0      9114  /usr/src/sys/netinet/in_pcb.c:1456 (rw:tcpinp)
     3     14617      1809     1991699    3469    0       574         0      1913  /usr/src/sys/kern/kern_alq.c:381 (spin mutex:ALD Queue)
 42180     25694   2868351     1069591   10750  266        99         0      7337  /usr/src/sys/netinet/tcp_usrreq.c:929 (rw:tcpinp)
     3     11668      3309      991094    3469    0       285         0      1106  /usr/src/sys/kern/kern_alq.c:215 (spin mutex:ALD Queue)  <<
 43673     21630    940734      144495    3469  271        41         0        22  /usr/src/sys/kern/kern_alq.c:370 (lockmgr:ufs)
     3     57978       405       78089     357    1       218         0       308  /usr/src/sys/kern/sched_ule.c:3019 (spin mutex:sched lock 0)
 10545      8790    342925       32409    1300  263        24         0        18  /usr/src/sys/kern/kern_mutex.c:211 (sleep mutex:ALDmtx)
  5206     11835     18568       11874   22717    0         0         0         7  /usr/src/sys/kern/vfs_bio.c:2920 (rw:bufobj interlock)
    50      9682       921        9774     165    5        59         0         2  /usr/src/sys/cam/scsi/scsi_da.c:4534 (sleep mutex:CAM device lock)
    65      4817      4057        6706    4155    0         1         0        17  /usr/src/sys/kern/vfs_bio.c:2031 (sleep mutex:bufq dirty lock)
From the lock_profile statistics, lock contention (wait_total) on "kern_alq.c:215 (spin mutex:ALD Queue)", marked with "<<" in both tables above, drops from 1979391 to 991094, roughly a 50% reduction.
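For reference, a sketch of how the lock_profile statistics above can be collected, assuming a kernel built with "options LOCK_PROFILING"; the sort/head pipeline is just one way to pull the top 10 rows by wait_total (column 4):

sysctl debug.lock.prof.reset=1     # discard previously collected samples
sysctl debug.lock.prof.enable=1    # start lock profiling
# ... run the iperf workload ...
sysctl debug.lock.prof.enable=0    # stop lock profiling
sysctl debug.lock.prof.stats | sort -rnk4 | head -n 10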