Test results of D41175


The testbed used here is from Emulab.net.

Two FreeBSD nodes are connected by a 1Gbps link. Each node has a dual-core CPU running at 3.00GHz, so the test is CPU-bound rather than limited by the link.
One node (s1) is the traffic sender running iperf as the client; the other FreeBSD node (r1) is the traffic receiver.
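
A sketch of the sender-side procedure, for reference, assuming the siftr module is already loaded (e.g. kldload siftr) and an iperf server is listening on r1; the commands mirror the transcripts below:

# on s1
sysctl net.inet.siftr.port_filter=5001   # only log the iperf flow on port 5001
sysctl net.inet.siftr.enabled=1          # start SIFTR logging to /var/log/siftr.log
iperf -c r1 -l 1M -t 10 -i 2             # 10-second bulk transfer, 2-second report interval
sysctl net.inet.siftr.enabled=0          # stop logging; a summary line is appended to the log
tail -n1 /var/log/siftr.log              # per-run summary counters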

Test result with the base (unpatched) kernel:

root@s1:~ # sysctl net.inet.siftr.port_filter=5001
net.inet.siftr.port_filter: 0 -> 5001
root@s1:~ # sysctl net.inet.siftr.enabled=1 ; iperf -c r1 -l 1M -t 10 -i 2 ; sysctl net.inet.siftr.enabled=0 ;
net.inet.siftr.enabled: 0 -> 1
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 32.8 KByte (default)
------------------------------------------------------------
[  1] local 10.1.1.2 port 40725 connected with 10.1.1.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-2.00 sec   137 MBytes   575 Mbits/sec
[  1] 2.00-4.00 sec   137 MBytes   575 Mbits/sec
[  1] 4.00-6.00 sec   136 MBytes   570 Mbits/sec
[  1] 6.00-8.00 sec   134 MBytes   562 Mbits/sec
[  1] 8.00-10.00 sec   136 MBytes   570 Mbits/sec
[  1] 0.00-10.03 sec   681 MBytes   570 Mbits/sec
net.inet.siftr.enabled: 1 -> 0
root@s1:~ # ll /var/log/siftr.log
-rw-r--r--  1 root wheel  114M Jul 25 07:14 /var/log/siftr.log
root@s1:~ # tail -n1 /var/log/siftr.log
disable_time_secs=1690290846    disable_time_usecs=487977       num_inbound_tcp_pkts=246745     num_outbound_tcp_pkts=493485    total_tcp_pkts=740230   num_inbound_skipped_pkts_malloc=0       num_outbound_skipped_pkts_malloc=0      num_inbound_skipped_pkts_tcpcb=1        num_outbound_skipped_pkts_tcpcb=1       num_inbound_skipped_pkts_inpcb=0        num_outbound_skipped_pkts_inpcb=0       total_skipped_tcp_pkts=2        flow_list=10.1.1.2;40725-10.1.1.3;5001,
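
The summary line is a single line of key=value pairs; an illustrative one-liner to read it more easily (assuming whitespace-separated fields as above) is to split it into one pair per line:

tail -n1 /var/log/siftr.log | tr -s ' \t' '\n'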

Test result with this change applied:

root@s1:~ # sysctl net.inet.siftr.port_filter=5001
net.inet.siftr.port_filter: 0 -> 5001
root@s1:~ # sysctl net.inet.siftr.enabled=1; iperf -c r1 -l 1M -t 10 -i 2; sysctl net.inet.siftr.enabled=0;
net.inet.siftr.enabled: 0 -> 1
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 32.8 KByte (default)
------------------------------------------------------------
[  1] local 10.1.1.2 port 52884 connected with 10.1.1.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-2.00 sec   153 MBytes   642 Mbits/sec
[  1] 2.00-4.00 sec   156 MBytes   654 Mbits/sec
[  1] 4.00-6.00 sec   154 MBytes   646 Mbits/sec
[  1] 6.00-8.00 sec   153 MBytes   642 Mbits/sec
[  1] 8.00-10.00 sec   156 MBytes   654 Mbits/sec
[  1] 0.00-10.02 sec   773 MBytes   647 Mbits/sec
net.inet.siftr.enabled: 1 -> 0
root@s1:~ # ll /var/log/siftr.log
-rw-r--r--  1 root wheel  129M Jul 25 07:12 /var/log/siftr.log
root@s1:~ # tail -n1 /var/log/siftr.log
disable_time_secs=1690290766    disable_time_usecs=886030       num_inbound_tcp_pkts=280039     num_outbound_tcp_pkts=560074    total_tcp_pkts=840113   num_inbound_skipped_pkts_malloc=0       num_outbound_skipped_pkts_malloc=0      num_inbound_skipped_pkts_tcpcb=1        num_outbound_skipped_pkts_tcpcb=1       num_inbound_skipped_pkts_inpcb=0        num_outbound_skipped_pkts_inpcb=0       total_skipped_tcp_pkts=2        flow_list=10.1.1.2;52884-10.1.1.3;5001,

Compared with the base run, this change improves iperf throughput by roughly 13.5% (570 -> 647 Mbits/sec average).
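
The 13.5% figure follows from the two 10-second averages (a quick check with bc(1)):

echo 'scale=3; 647/570' | bc   # 1.135 -> ~13.5% higher average throughput
echo 'scale=3; 773/681' | bc   # same ratio from the total bytes transferred: 1.135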

Top 10 entries from the lock_profile statistics, base vs. compare (this change), sorted on the "wait_total" column:

debug.lock.prof.stats: (base)
     max  wait_max       total  wait_total       count    avg wait_avg cnt_hold cnt_lock name
      97      3566       82348     2218286        6465     12    343  0   6454 /usr/src/sys/netinet/in_pcb.c:1456 (rw:tcpinp)
       3      6378        3793     1979391        5012      0    394  0   3054 /usr/src/sys/kern/kern_alq.c:215 (spin mutex:ALD Queue)  <<
      13      6716        1811     1743252        5012      0    347  0   3428 /usr/src/sys/kern/kern_alq.c:381 (spin mutex:ALD Queue)
   53479     14961     2738474      695750        7425    368     93  0   4881 /usr/src/sys/netinet/tcp_usrreq.c:929 (rw:tcpinp)
    3977     24680      163310       66119        1080    151     61  0     25 /usr/src/sys/kern/kern_mutex.c:211 (sleep mutex:ALDmtx)
   58871     26327      676761       30930        5012    135      6  0      4 /usr/src/sys/kern/kern_alq.c:370 (lockmgr:ufs)
      35         5     7371243        4146      748691      9      0  0   4089 /usr/src/sys/kern/kern_alq.c:682 (spin mutex:ALD Queue)
    6379      2260     1824880        2607        5012    364      0  0    520 /usr/src/sys/kern/kern_alq.c:222 (sleep mutex:ALDmtx)
      24        13       71156        2472      249602      0      0  0   3164 /usr/src/sys/net/netisr.c:1049 (sleep mutex:netisr_mtx)
       9        19        4657        1486       13904      0      0  0   2439 /usr/src/sys/net/netisr.c:966 (sleep mutex:netisr_mtx)


debug.lock.prof.stats: (compare)
     max  wait_max       total  wait_total       count    avg wait_avg cnt_hold cnt_lock name
     132      2855      120325     2304321        9126     13    252  0   9114 /usr/src/sys/netinet/in_pcb.c:1456 (rw:tcpinp)
       3     14617        1809     1991699        3469      0    574  0   1913 /usr/src/sys/kern/kern_alq.c:381 (spin mutex:ALD Queue)
   42180     25694     2868351     1069591       10750    266     99  0   7337 /usr/src/sys/netinet/tcp_usrreq.c:929 (rw:tcpinp)
       3     11668        3309      991094        3469      0    285  0   1106 /usr/src/sys/kern/kern_alq.c:215 (spin mutex:ALD Queue)  <<
   43673     21630      940734      144495        3469    271     41  0     22 /usr/src/sys/kern/kern_alq.c:370 (lockmgr:ufs)
       3     57978         405       78089         357      1    218  0    308 /usr/src/sys/kern/sched_ule.c:3019 (spin mutex:sched lock 0)
   10545      8790      342925       32409        1300    263     24  0     18 /usr/src/sys/kern/kern_mutex.c:211 (sleep mutex:ALDmtx)
    5206     11835       18568       11874       22717      0      0  0      7 /usr/src/sys/kern/vfs_bio.c:2920 (rw:bufobj interlock)
      50      9682         921        9774         165      5     59  0      2 /usr/src/sys/cam/scsi/scsi_da.c:4534 (sleep mutex:CAM device lock)
      65      4817        4057        6706        4155      0      1  0     17 /usr/src/sys/kern/vfs_bio.c:2031 (sleep mutex:bufq dirty lock)

From the lock_profile statistics, lock contention (wait_total) on "kern_alq.c:215 (spin mutex:ALD Queue)" drops from 1979391 to 991094, roughly a 50% reduction.
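
For reference, the profiling data above can be collected on a kernel built with "options LOCK_PROFILING"; a sketch of the assumed collection steps for each run (the exact procedure is not shown in the transcripts):

sysctl debug.lock.prof.reset=1    # clear any previous samples
sysctl debug.lock.prof.enable=1   # start lock profiling
# ... run the iperf test as above ...
sysctl debug.lock.prof.enable=0   # stop lock profiling
sysctl -n debug.lock.prof.stats | sort -rnk4 | head -10   # top 10 rows by wait_total (4th column)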
