There seems to be an issue w/ ssh performance.
JohnMarkGurney worked a bit w/ pkelsey on debugging SSH performance over high-latency links. This was started by the thread "ssh over WAN: TCP window too small".
Most of the testing was done on FreeBSD HEAD, using r282774 (host) and r284880 (vm). Testing was done w/ a bhyve vm, running dummynet on the vm, and the tap interface on the host bridged w/ the ethernet interface. Without any delay in the link, it can achieve a speed of ~20MB/sec.
Hint: To get TCP buffer information, run netstat -x -n -f inet.
/etc/ipfw.conf:
pipe 1 config delay 50
add 100 pipe 1 ip from any to any
add 65000 allow ip from any to any
Testing w/ nc showed that with a delayed link, the window grows and throughput reaches around 17MB/sec.
In the case of: ssh jmg@192.168.0.21 cat /dev/zero > /dev/null
The buffer grows to ~160k and throughput peaks at 1.6MB/sec.
In the case of: ssh jmg@192.168.0.21 dd of=/dev/null bs=1m < /dev/zero
The buffer grows to ~80k and throughput peaks at 512KB/sec.
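The link between buffer size and throughput in the two ssh cases can be sanity-checked with window/RTT arithmetic: a TCP sender can have at most one window in flight per round trip. The sketch below is only illustrative. It assumes a round-trip time of about 100 ms (the 50 ms dummynet delay applied once in each direction), which is an assumption and not a measured value, and it reads "k" as 1000 bytes. The function name is hypothetical.

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on TCP throughput: at most one window per round trip."""
    return window_bytes * 1000 / rtt_ms


# Assumed RTT: 50 ms of dummynet delay in each direction (not measured).
ASSUMED_RTT_MS = 100

# Buffer sizes are the approximate values reported in the text above.
for label, window in (("cat /dev/zero", 160_000), ("dd bs=1m", 80_000)):
    limit = max_throughput_bytes_per_s(window, ASSUMED_RTT_MS)
    print(f"{label}: ~{window // 1000}k window -> at most {limit / 1e6:.1f} MB/s")
```

Under that assumed RTT, a ~160k buffer caps throughput at about 1.6 MB/s, which lines up with the first observation, and a ~80k buffer caps it at about 0.8 MB/s, so the observed 512KB/sec sits below the bound.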
Testing w/ OpenSSH 7.1p1 (as both client and server) does no better.
The sysctl kern.ipc.maxsockbuf defaults to 2MB. If you raise net.inet.tcp.sendbuf_max and/or net.inet.tcp.recvbuf_max and want your HIWA (the socket buffer high-water mark reported by netstat -x) to grow larger than 2MB, kern.ipc.maxsockbuf needs to be increased as well.
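To judge whether the 2MB default matters for a given link, it can help to estimate the bandwidth-delay product (the number of bytes that must be in flight to keep the link full) and compare it with the limit. This is a rough sketch: the link speed and RTT are hypothetical inputs, the helper names are made up for illustration, and the comparison uses the raw kern.ipc.maxsockbuf value (the kernel's effective per-socket limit may be somewhat lower).

```python
DEFAULT_MAXSOCKBUF = 2 * 1024 * 1024  # 2MB default quoted in the text above


def bdp_bytes(link_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return link_bits_per_s * rtt_ms // (8 * 1000)


def exceeds_maxsockbuf(buf_bytes: int, maxsockbuf: int = DEFAULT_MAXSOCKBUF) -> bool:
    """True if the wanted buffer is larger than the raw maxsockbuf value."""
    return buf_bytes > maxsockbuf


# Hypothetical example: a 1 Gbps link with a 100 ms round-trip time.
target = bdp_bytes(1_000_000_000, 100)
print(f"BDP: {target} bytes, exceeds default maxsockbuf: {exceeds_maxsockbuf(target)}")
```

For that hypothetical link the product is 12.5 MB, well above 2MB, so raising only net.inet.tcp.sendbuf_max or net.inet.tcp.recvbuf_max would not be enough.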
Benchmarks by AllanJude
Setup:
- 2 Identical servers connected back-to-back on both an igb(4) 1gbps nic (I350), and an ix(4) 10gbps nic (X540-AT2)
- hw.model: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
- hw.ncpu: 16 (2 socket, 8 core CPUs with hyper threading disabled)
- hw.physmem: 137291317248 (128GB)
- AES-NI Enabled
Traffic generation:
# dd if=/dev/zero bs=1m count=20k | ssh <host> dd of=/dev/null bs=1m
- I also tried switching to a 20GB file of zeros stored with zle (zero-run length encoding) compression, so it reads from disk/cache faster than the network can carry it:
- # zfs create -o compress=zle zstore/test
- # dd if=/dev/zero of=/zstore/test/zero bs=1m count=20k
# scp <host>:/zstore/test/zero /dev/null
- which gave the same results
20GB via loopback
Base: OpenSSH_6.6.1p1, OpenSSL 1.0.2d-freebsd 9 Jul 2015
| Cipher            | MAC         | Performance |
| aes128-ctr        | umac-64-etm | 5,620 mbps  |
| aes128-gcm        | implicit    | 5,170 mbps  |
| aes256-ctr        | umac-64-etm | 5,000 mbps  |
| aes256-gcm        | implicit    | 5,050 mbps  |
| chacha20-poly1305 | implicit    | 1,250 mbps  |
| none              | umac-64-etm | 6,240 mbps  |
Ports: OpenSSH_7.1p1, OpenSSL 1.0.2d 9 Jul 2015
| Cipher            | MAC         | Performance |
| aes128-ctr        | umac-64-etm | 3,500 mbps  |
| aes128-gcm        | implicit    | 4,670 mbps  |
| aes256-ctr        | umac-64-etm | 3,220 mbps  |
| aes256-gcm        | implicit    | 4,450 mbps  |
| chacha20-poly1305 | implicit    | 1,240 mbps  |
| none              | umac-64-etm | 3,530 mbps  |
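The gap between the base and ports builds over loopback is easier to see as a relative change per cipher. The sketch below only recomputes figures already listed in the two loopback tables above. The helper name is hypothetical and nothing here is a new measurement.

```python
# Loopback throughput in mbps, copied from the two tables above.
base = {
    "aes128-ctr": 5620, "aes128-gcm": 5170, "aes256-ctr": 5000,
    "aes256-gcm": 5050, "chacha20-poly1305": 1250, "none": 6240,
}
ports = {
    "aes128-ctr": 3500, "aes128-gcm": 4670, "aes256-ctr": 3220,
    "aes256-gcm": 4450, "chacha20-poly1305": 1240, "none": 3530,
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means slower)."""
    return (new - old) / old * 100


for cipher in base:
    change = percent_change(base[cipher], ports[cipher])
    print(f"{cipher:18s} base {base[cipher]:>5} -> ports {ports[cipher]:>5} ({change:+.1f}%)")
```

On these numbers the CTR ciphers and the "none" cipher lose roughly a third or more of their throughput with the ports build, while the GCM ciphers and chacha20-poly1305 change much less.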
20GB via 10gbps network
Base: OpenSSH_6.6.1p1, OpenSSL 1.0.2d-freebsd 9 Jul 2015
| Cipher            | MAC         | Performance |
| aes128-ctr        | umac-64-etm | 5,235 mbps  |
| aes128-gcm        | implicit    | 5,160 mbps  |
| aes256-ctr        | umac-64-etm | 4,960 mbps  |
| aes256-gcm        | implicit    | 4,850 mbps  |
| chacha20-poly1305 | implicit    | 1,260 mbps  |
| none              | umac-64-etm | 6,275 mbps  |
Ports: OpenSSH_7.1p1, OpenSSL 1.0.2d 9 Jul 2015
| Cipher            | MAC         | Performance |
| aes128-ctr        | umac-64-etm | 3,120 mbps  |
| aes128-gcm        | implicit    | 4,460 mbps  |
| aes256-ctr        | umac-64-etm | 3,050 mbps  |
| aes256-gcm        | implicit    | 4,165 mbps  |
| chacha20-poly1305 | implicit    | 1,240 mbps  |
| none              | umac-64-etm | 3,570 mbps  |
New Benchmarks by AllanJude
Setup:
- 2 Identical servers connected via an Arista 10gbps switch
- hw.model: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
- hw.ncpu: 12
- hw.physmem: 34210861056
- AES-NI Enabled
60 GiB of data (approximately 1 minute at 8 gbps):
# dd if=/dev/zero bs=128k count=480k | /usr/bin/ssh test@host dd of=/dev/null bs=128k
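The "60 GiB, approximately 1 minute at 8 gbps" figure follows directly from the dd arguments. A short check, assuming dd's k suffix means 1024 and taking "8 gbps" as 8 × 10^9 bits per second:

```python
KIB = 1024

block_size = 128 * KIB    # bs=128k
block_count = 480 * KIB   # count=480k

total_bytes = block_size * block_count
seconds_at_8gbps = total_bytes * 8 / 8e9  # bits to send / bits per second

print(f"total: {total_bytes} bytes = {total_bytes / 2**30:.0f} GiB")
print(f"time at 8 gbps: {seconds_at_8gbps:.1f} s")
```

The transfer works out to exactly 60 GiB and a little over 64 seconds at that rate.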
Benchmarking base OpenSSL (1.0.2j) w/ and w/o AES-NI
Setting the OPENSSL_ia32cap environment variable to a mask prefixed with ~ disables specific instructions:
- bit #33 denotes availability of the PCLMULQDQ instruction
- bit #57 denotes the AES-NI instruction set extension
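The hexadecimal masks used in the commands that follow are just these two bit positions set in a 64-bit value. A small sketch showing how they are derived (the helper name is made up for illustration):

```python
PCLMULQDQ_BIT = 33  # bit #33: PCLMULQDQ instruction
AESNI_BIT = 57      # bit #57: AES-NI instruction set extension


def cap_mask(*bits: int) -> int:
    """Build a mask with the given bit positions set."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


both = cap_mask(PCLMULQDQ_BIT, AESNI_BIT)
aesni_only = cap_mask(AESNI_BIT)

# The leading ~ asks OpenSSL to clear these bits rather than set them.
print(f'OPENSSL_ia32cap="~{both:#x}"        # disable AES-NI and PCLMULQDQ')
print(f'OPENSSL_ia32cap="~{aesni_only:#x}"  # disable only AES-NI')
```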
Benchmarks with AES-NI enabled and disabled in the BIOS:
Disable Both AES-NI and PCLMULQDQ
# OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
AES-NI ON: 363 MB/s
AES-NI OFF: 363 MB/s
# OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-gcm
AES-NI ON: 260 MB/s
AES-NI OFF: 260 MB/s
Disable only AES-NI
# OPENSSL_ia32cap="~0x200000000000000" openssl speed -elapsed -evp aes-128-cbc
AES-NI ON: 363 MB/s
AES-NI OFF: 363 MB/s
# OPENSSL_ia32cap="~0x200000000000000" openssl speed -elapsed -evp aes-128-gcm
AES-NI ON: 516 MB/s
AES-NI OFF: 516 MB/s
Normal
# openssl speed -elapsed -evp aes-128-cbc
AES-NI ON: 768 MB/s
AES-NI OFF: 363 MB/s
# openssl speed -elapsed -evp aes-128-gcm
AES-NI ON: 3315 MB/s
AES-NI OFF: 516 MB/s
mercat5/7:
base ssh + base ssl (OpenSSH_7.2p2, OpenSSL 1.0.2j-freebsd 26 Sep 2016):
| localhost | nc         | 11,100 mbps |                         |
| localhost | aes128-gcm | 8,500 mbps  |                         |
| localhost | aes128-gcm | 3,100 mbps  | AES-NI disabled in BIOS |
| localhost | aes256-gcm | 7,800 mbps  |                         |
aes128-gcm
| NIC | base->base | base->ports NOHPN | base->ports HPN | ports NOHPN->base | ports NOHPN->ports NOHPN | ports NOHPN->ports HPN | ports HPN->base | ports HPN->ports HPN | ports HPN->ports NOHPN |
| localhost | 8,330 mbps | 8,350 mbps | 7,660 mbps | 8,350 mbps | 8,300 mbps | x | 8,160 mbps | 7,610 mbps | x |
| cxgbe->bxe | 8,650 mbps | 8,860 mbps | 5,370 mbps | 8,560 mbps | 8,700 mbps | 5,370 mbps | 8,560 mbps | 5,340 mbps | 8,680 mbps |
| bxe->cxgbe | 5,870 mbps | 5,720 mbps | 3,170 mbps | 5,965 mbps | 5,900 mbps | 3,170 mbps | 5,900 mbps | 3,170 mbps | 5,800 mbps |
ports ssh (NO HPN) + base ssl (OpenSSH_7.3p1, OpenSSL 1.0.2j-freebsd 26 Sep 2016):
| localhost  | aes128-gcm | 8,500 mbps |                         |
| localhost  | aes128-gcm | 3,100 mbps | AES-NI disabled in BIOS |
| localhost  | aes256-gcm | 7,800 mbps |                         |
| cxgbe->bxe | aes128-gcm | 8,000 mbps |                         |
| bxe->cxgbe | aes128-gcm | ,00 mbps   | seems to be the NIC     |
ports ssh (HPN) + base ssl (OpenSSH_7.3p1, OpenSSL 1.0.2j-freebsd 26 Sep 2016):
| localhost           | aes128-gcm | 7,400 mbps | HPN seems to hurt localhost        |
| localhost           | aes128-gcm | 3,100 mbps | AES-NI disabled in BIOS            |
| localhost           | aes256-gcm | 7,000 mbps |                                    |
| cxgbe->bxe HPN->HPN | aes128-gcm | 5,400 mbps | ???                                |
| cxgbe->bxe HPN->bNO | aes128-gcm | 8,600 mbps | HPN improves sending               |
| cxgbe->bxe HPN->pNO | aes128-gcm | 9,000 mbps | HPN improves sending to not HPN..? |
ports ssh (HPN) + ports ssl (OpenSSH_7.3p1, OpenSSL 1.0.2j 26 Sep 2016):
| localhost | aes128-gcm | 8,600 mbps |  |
| localhost | aes256-gcm | 7,950 mbps |  |