We are encountering Ethernet performance issues, especially since updating our TX2 boards to JetPack 4.4.1.
Our network is made up of a fleet of TX2 boards.
We had noticed strange behavior on this network, so we began to investigate.
To isolate the error, we set up a test with a TX2 on its developer kit carrier board, connected to a laptop through a USB-to-Gigabit-Ethernet adapter.
To highlight the issue, we are using the iperf tool.
To validate our results, we repeated the tests with several TX2 boards.
First of all, we connected the laptop, using the same adapter, to another Linux computer. All of those tests were fine: we measured close to the theoretical Gigabit limit (~930 Mbits/sec).
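For reference, that baseline check used the same kind of iperf run as the TX2 tests further down; a minimal sketch, where the 192.168.1.x addresses are only placeholders for the laptop and the Linux PC:
# on the Linux PC (placeholder address 192.168.1.10)
> iperf -s -p 10000
# on the laptop
> iperf -c 192.168.1.10 -p 10000 -t 20 -i 1 -r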
On the TX2 side, we used NVIDIA's graphical tool to set up the TX2 from scratch.
We then finalized the installation with these commands:
sudo apt update
sudo apt upgrade
sudo apt autoremove
We updated the network configuration by editing /etc/network/interfaces:
# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
allow-hotplug eth0
iface eth0 inet static
    address 172.16.150.1
    netmask 255.255.0.0
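The new settings are applied by cycling the interface (a minimal sketch, assuming eth0 is handled by ifupdown as configured above and not by NetworkManager):
# release any previous configuration on eth0 (harmless if the interface was already down)
> sudo ifdown eth0
# bring eth0 back up with the static address from /etc/network/interfaces
> sudo ifup eth0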
We were then able to run the iperf tests.
The TX2 hosts the iperf server with the command:
> iperf -s -p 10000
and we start the client on the laptop with:
> iperf -c 172.16.150.1 -p 10000 -t 20 -i 1 -r
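For context, the -r option runs the test in both directions one after the other (laptop to TX2 first, then TX2 to laptop), which is why two result blocks appear below. The same check can also be run one direction at a time; a minimal sketch, reusing the laptop address 172.16.100.100 that shows up in the results:
# direction 1: laptop -> TX2 (iperf -s -p 10000 already running on the TX2)
> iperf -c 172.16.150.1 -p 10000 -t 20 -i 1
# direction 2: TX2 -> laptop (start iperf -s -p 10000 on the laptop first)
> iperf -c 172.16.100.100 -p 10000 -t 20 -i 1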
Here are our results:
------------------------------------------------------------
Server listening on TCP port 10000
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.16.150.1, TCP port 10000
TCP window size: 512 KByte (default)
------------------------------------------------------------
[ 4] local 172.16.100.100 port 56425 connected with 172.16.150.1 port 10000
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 1.0 sec 110 MBytes 922 Mbits/sec
[ 4] 1.0- 2.0 sec 111 MBytes 930 Mbits/sec
[ 4] 2.0- 3.0 sec 110 MBytes 923 Mbits/sec
[ 4] 3.0- 4.0 sec 110 MBytes 923 Mbits/sec
[ 4] 4.0- 5.0 sec 110 MBytes 924 Mbits/sec
[ 4] 5.0- 6.0 sec 111 MBytes 930 Mbits/sec
[ 4] 6.0- 7.0 sec 111 MBytes 928 Mbits/sec
[ 4] 7.0- 8.0 sec 110 MBytes 924 Mbits/sec
[ 4] 8.0- 9.0 sec 110 MBytes 922 Mbits/sec
[ 4] 9.0-10.0 sec 111 MBytes 928 Mbits/sec
[ 4] 10.0-11.0 sec 110 MBytes 926 Mbits/sec
[ 4] 11.0-12.0 sec 110 MBytes 927 Mbits/sec
[ 4] 12.0-13.0 sec 110 MBytes 920 Mbits/sec
[ 4] 13.0-14.0 sec 110 MBytes 925 Mbits/sec
[ 4] 14.0-15.0 sec 111 MBytes 929 Mbits/sec
[ 4] 15.0-16.0 sec 111 MBytes 932 Mbits/sec
[ 4] 16.0-17.0 sec 110 MBytes 926 Mbits/sec
[ 4] 17.0-18.0 sec 110 MBytes 923 Mbits/sec
[ 4] 18.0-19.0 sec 110 MBytes 923 Mbits/sec
[ 4] 19.0-20.0 sec 111 MBytes 928 Mbits/sec
[ 4] 0.0-20.0 sec 2.15 GBytes 925 Mbits/sec
[ 4] local 172.16.100.100 port 10000 connected with 172.16.150.1 port 60104
[ 4] 0.0- 1.0 sec 69.0 MBytes 579 Mbits/sec
[ 4] 1.0- 2.0 sec 9.88 MBytes 82.9 Mbits/sec
[ 4] 2.0- 3.0 sec 13.8 MBytes 116 Mbits/sec
[ 4] 3.0- 4.0 sec 9.79 MBytes 82.1 Mbits/sec
[ 4] 4.0- 5.0 sec 9.98 MBytes 83.7 Mbits/sec
[ 4] 5.0- 6.0 sec 10.4 MBytes 87.2 Mbits/sec
[ 4] 6.0- 7.0 sec 13.6 MBytes 114 Mbits/sec
[ 4] 7.0- 8.0 sec 12.0 MBytes 101 Mbits/sec
[ 4] 8.0- 9.0 sec 10.0 MBytes 83.9 Mbits/sec
[ 4] 9.0-10.0 sec 9.93 MBytes 83.3 Mbits/sec
[ 4] 10.0-11.0 sec 16.5 MBytes 138 Mbits/sec
[ 4] 11.0-12.0 sec 11.7 MBytes 98.4 Mbits/sec
[ 4] 12.0-13.0 sec 18.4 MBytes 155 Mbits/sec
[ 4] 13.0-14.0 sec 10.1 MBytes 84.6 Mbits/sec
[ 4] 14.0-15.0 sec 15.3 MBytes 129 Mbits/sec
[ 4] 15.0-16.0 sec 19.4 MBytes 163 Mbits/sec
[ 4] 16.0-17.0 sec 10.8 MBytes 90.7 Mbits/sec
[ 4] 17.0-18.0 sec 12.1 MBytes 102 Mbits/sec
[ 4] 18.0-19.0 sec 10.2 MBytes 85.4 Mbits/sec
[ 4] 19.0-20.0 sec 9.73 MBytes 81.6 Mbits/sec
[ 4] 0.0-20.4 sec 309 MBytes 127 Mbits/sec
[SUM] 0.0-20.4 sec 378 MBytes 155 Mbits/sec
The problem does not occur every time. Sometimes the results are better, sometimes they are very poor, and sometimes we even observe 0 bytes/sec.
The connection is sometimes so poor that SSH sessions are very difficult to establish and, once connected, characters are echoed to the screen at roughly one per second.
To illustrate, here is a run where we reproduced the 0 bytes/sec issue:
------------------------------------------------------------
Server listening on TCP port 10000
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.16.150.1, TCP port 10000
TCP window size: 512 KByte (default)
------------------------------------------------------------
[ 4] local 172.16.100.100 port 58168 connected with 172.16.150.1 port 10000
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 1.0 sec 512 KBytes 4.19 Mbits/sec
[ 4] 1.0- 2.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 2.0- 3.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 3.0- 4.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 4.0- 5.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 5.0- 6.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 6.0- 7.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 7.0- 8.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 8.0- 9.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 9.0-10.0 sec 51.9 MBytes 435 Mbits/sec
[ 4] 10.0-11.0 sec 71.2 MBytes 598 Mbits/sec
[ 4] 11.0-12.0 sec 111 MBytes 932 Mbits/sec
[ 4] 12.0-13.0 sec 109 MBytes 916 Mbits/sec
[ 4] 13.0-14.0 sec 109 MBytes 918 Mbits/sec
[ 4] 14.0-15.0 sec 110 MBytes 924 Mbits/sec
[ 4] 15.0-16.0 sec 80.6 MBytes 676 Mbits/sec
[ 4] 16.0-17.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 17.0-18.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 18.0-19.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 19.0-20.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 0.0-20.2 sec 644 MBytes 267 Mbits/sec
We are also able to reproduce the issue when connecting two TX2 boards directly to each other.
Searching the NVIDIA forum, we found a similar issue that was never solved: