TX1 Ethernet Gigabit link topping out at ~600 Mbits/s

Are there any performance tweaks for boosting the ethernet throughput on a TX1?

When I run iperf (iperf -s on the server, iperf -c IPofServer -d on the Tegra),
I get bandwidths of 612 and 237 Mbits/s for different transfer sizes. From a Linux desktop to the same server I get 883 and 929 Mbits/s for the same test.

There is a gigabit switch between the server and tegra.

Try increasing the memory available to TCP for buffers (the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem sysctl variables), and make sure SACK and window scaling are turned on. Google for it; you’ll find lots of examples of how to do that.
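A minimal sketch of what that tuning could look like. The sysctl keys are the standard Linux ones; the buffer sizes are illustrative assumptions, not values recommended anywhere in this thread, and the commands need root:

```shell
# Illustrative values only (min / default / max, in bytes); adjust for your setup.
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
```

Settings applied with sysctl -w are lost at reboot unless they are also added to /etc/sysctl.conf.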

I tried the system networking performance scripts from:
https://devtalk.nvidia.com/default/topic/799075/embedded-systems/jetson-tk1-r8169-netdev-watchdog-timeout-solved-/

They didn’t seem to do much. The only thing that did help in terms of benchmarking was using iperf with -P 5, which runs five parallel client streams. This increased the transmission benchmark from a Linux desktop server to the Tegra from ~250 to ~750 Mbits/s. I still get typical benchmark results of ~600-700 Mbits/s.

A strange thing is that when I have the system performance monitor running on the Tegra I’m testing, speeds consistently go up over 900 Mbits/s for transfers from the Tegra to another host.

Simultaneous bi-directional transfer is quite slow (624 Mbit out from the Tegra, ~200 Mbit in). On the thread above they were able to achieve >900 Mbits/s bi-directional on the TK1, but that has a different chip. The TX1 apparently has a USB3->Ethernet Realtek module.

For the network interface involved, running “ifconfig” will show a line starting with “RX errors” and another with “TX errors”. On both machines involved in the testing, does either show any errors, dropped packets, or overruns? I doubt they would on a local network, but this is the first thing to eliminate, especially just after doing bi-directional testing (which slowed down).
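If ifconfig is not handy, the same counters can be read from sysfs. This is only a sketch: the show_errors helper and the STATS_ROOT variable are names made up for this example (STATS_ROOT just allows pointing the function at a different directory), while the counter file names are the ones the Linux kernel exposes under /sys/class/net/<iface>/statistics:

```shell
#!/bin/sh
# Hypothetical helper: print the error-related counters for one interface.
# STATS_ROOT defaults to the real sysfs location.
STATS_ROOT=${STATS_ROOT:-/sys/class/net}

show_errors() {
    iface=$1
    for counter in rx_errors tx_errors rx_dropped tx_dropped rx_fifo_errors tx_fifo_errors; do
        file="$STATS_ROOT/$iface/statistics/$counter"
        # Skip counters that this kernel or driver does not expose.
        if [ -r "$file" ]; then
            printf '%s %s %s\n' "$iface" "$counter" "$(cat "$file")"
        fi
    done
}
```

Running show_errors eth0 on both machines before and after a bi-directional test shows whether any counter moved during the run.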

This is a very timely thread. I just discovered the same issue myself while doing some network performance testing with the onboard ethernet on the TX1 dev board.

Using the onboard ethernet, the best network performance I can achieve with the TX1 is ~600Mbit/sec outbound, and ~900Mbit/sec inbound, with an MTU of 7750 - a strange value. Other MTU values had wildly different results, especially the default MTU of 1500 and the Jumbo Frame MTU of 9000.

The onboard ethernet uses an RTL8153 chip and the r8152 module/driver, but it’s then connected internally via a USB hub. I believe this USB layer is the root cause of the lack of full GigE performance.

I confirmed the USB layer is a problem when I used an external GigE/USB3 adapter and recorded the exact same performance as the onboard ethernet, but when I connected a PCIe GigE card I was able to achieve full GigE (~940Mbit/sec) in both directions with only ~30% CPU utilization.

Testing outbound bandwidth with the onboard ethernet consumed 99% CPU of a single core (out of 4), and inbound consumed ~40% CPU of a single core. When running the tests I noticed that sometimes the CPU was <99%, in which case the bandwidth would improve. This smells like the outgoing connection is being limited by the CPU utilization, probably due to overhead of pushing everything through USB?

I used ‘nuttcp’ to measure bandwidth, with a CAT6 cable connecting the built-in (eth0) ethernet port directly to another host (no switch between the hosts). MTU is the same on both transmitting and receiving hosts.

When tested separately, the external host was able to achieve ~950Mbit/sec in both directions with low (<10%) CPU utilization.

These tests are all executed on the TX1, thus ‘Transmit’ is TX1 → external host, and ‘Receive’ is TX1 ← external host.
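For anyone wanting to repeat the sweep, here is a rough sketch of how it could be scripted. It is not the exact procedure used for the numbers in this thread: IFACE, PEER (a made-up address), RUN and sweep_mtu are all placeholder names, the peer is assumed to be running nuttcp -S, and the peer's MTU still has to be changed to the matching value by hand. By default it only prints the commands:

```shell
#!/bin/sh
# Placeholders, not values from the thread.
IFACE=${IFACE:-eth0}
PEER=${PEER:-192.168.1.10}   # hypothetical address of the host running "nuttcp -S"
RUN=${RUN-echo}              # prints commands by default; run with RUN= (as root) to execute

sweep_mtu() {
    for mtu in "$@"; do
        # Set the local MTU; the peer must be set to the same value separately.
        $RUN ip link set dev "$IFACE" mtu "$mtu"
        $RUN nuttcp -t -T10 "$PEER"   # transmit: TX1 -> peer, 10 seconds
        $RUN nuttcp -r -T10 "$PEER"   # receive:  TX1 <- peer, 10 seconds
    done
}
```

Calling sweep_mtu 1500 7750 9000 prints the three commands for each MTU value; setting RUN to the empty string makes it actually run them.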

MTU: 1500**
Transmit: 328.8125 MB / 10.00 sec = 275.7534 Mbps 99 %TX 12 %RX 0 retrans 2.77 msRTT
Receive: 1015.1574 MB / 10.02 sec = 849.9247 Mbps 3 %TX 39 %RX 0 retrans 1.95 msRTT

MTU: 2500
Transmit: 404.3750 MB / 10.00 sec = 339.1088 Mbps 99 %TX 16 %RX 0 retrans 2.95 msRTT
Receive: 1095.1747 MB / 10.03 sec = 915.8842 Mbps 3 %TX 41 %RX 0 retrans 2.10 msRTT

MTU: 3500
Transmit: 507.1875 MB / 10.00 sec = 425.3729 Mbps 99 %TX 18 %RX 0 retrans 2.99 msRTT
Receive: 182.0313 MB / 10.00 sec = 152.6530 Mbps 1 %TX 42 %RX 0 retrans 1.52 msRTT

MTU: 4500
Transmit: 602.5625 MB / 10.04 sec = 503.3931 Mbps 99 %TX 13 %RX 0 retrans 2.96 msRTT
Receive: 137.6468 MB / 10.01 sec = 115.3208 Mbps 1 %TX 41 %RX 0 retrans 2.01 msRTT

MTU: 5500
Transmit: 680.6875 MB / 10.00 sec = 570.8439 Mbps 99 %TX 18 %RX 0 retrans 3.13 msRTT
Receive: 115.8415 MB / 10.22 sec = 95.1277 Mbps 0 %TX 35 %RX 0 retrans 2.06 msRTT

MTU: 6500
Transmit: 738.5000 MB / 10.00 sec = 619.3728 Mbps 99 %TX 19 %RX 0 retrans 2.65 msRTT
Receive: 136.9263 MB / 10.21 sec = 112.5081 Mbps 0 %TX 36 %RX 0 retrans 2.15 msRTT

*MTU: 7000
Transmit: 753.1250 MB / 10.00 sec = 631.5651 Mbps 99 %TX 13 %RX 0 retrans 2.11 msRTT
Receive: 145.1984 MB / 10.00 sec = 121.7505 Mbps 0 %TX 37 %RX 0 retrans 2.05 msRTT

MTU: 7500
Transmit: 731.5625 MB / 10.00 sec = 613.5733 Mbps 99 %TX 32 %RX 0 retrans 1.98 msRTT
Receive: 1085.1625 MB / 10.01 sec = 909.6697 Mbps 4 %TX 96 %RX 0 retrans 2.02 msRTT

*MTU: 7750
Transmit: 776.0020 MB / 10.01 sec = 650.5608 Mbps 57 %TX 33 %RX 0 retrans 1.66 msRTT
Receive: 1125.6105 MB / 10.02 sec = 942.5458 Mbps 3 %TX 97 %RX 0 retrans 2.04 msRTT

*MTU: 8000
Transmit: 725.0625 MB / 10.00 sec = 608.0882 Mbps 99 %TX 32 %RX 0 retrans 0.82 msRTT
Receive: 1169.3590 MB / 10.01 sec = 980.0133 Mbps 3 %TX 36 %RX 0 retrans 1.72 msRTT

MTU: 8500
Transmit: 612.0625 MB / 10.00 sec = 513.3099 Mbps 99 %TX 11 %RX 0 retrans 1.93 msRTT
Receive: 133.5549 MB / 10.23 sec = 109.5679 Mbps 0 %TX 31 %RX 0 retrans 2.22 msRTT

MTU: 9000**
Transmit: 644.5625 MB / 10.00 sec = 540.5811 Mbps 99 %TX 10 %RX 0 retrans 0.83 msRTT
Receive: 141.8691 MB / 10.02 sec = 118.7969 Mbps 0 %TX 33 %RX 0 retrans 1.66 msRTT

* Extra testing to dial in the MTU value for the best Transmit/Receive bandwidth
** Same results as when using a USB3 Gigabit adapter
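To compare runs at a glance, results saved in the format above can be boiled down to one line per measurement. This is a small sketch; summarize is a name made up for this example, and it assumes each "MTU:" line is followed by the nuttcp "Transmit:" and "Receive:" lines exactly as pasted here:

```shell
#!/bin/sh
# Hypothetical helper: print "MTU direction Mbps" for each result line.
summarize() {
    awk '
        /MTU:/ {
            mtu = $2
            gsub(/[^0-9]/, "", mtu)        # drop footnote asterisks
        }
        /^(Transmit|Receive):/ {
            dir = $1
            sub(/:$/, "", dir)
            # The throughput is the field just before the "Mbps" token.
            for (i = 2; i <= NF; i++)
                if ($i == "Mbps") printf "%s %s %s\n", mtu, dir, $(i - 1)
        }
    ' "$@"
}
```

With the results saved to a file, summarize results.txt | sort -n lists them by MTU (results.txt is a placeholder name).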

Host details:

ubuntu@tx1-jpeacock:~$ uname -a
Linux tx1-jpeacock 3.10.67-docker-23.1.0 #1 SMP PREEMPT Wed Apr 27 20:53:48 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux

ubuntu@tx1-jpeacock:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"

ubuntu@tx1-jpeacock:~$ nuttcp -V
nuttcp-6.1.2

I re-built the kernel to support Docker (the tests were run outside of Docker), and upgraded Ubuntu from 12.04 to 14.04.

I did try some of the TK1 network tuning from the previous link:

sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_max=33554432
sysctl -w net.core.rmem_default=33554432
sysctl -w net.core.wmem_default=33554432

They made no difference in my testing, but I have not tried the other changes yet. I’ll test them and update this thread with any results.
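A quick back-of-the-envelope check suggests why bigger socket buffers would not be expected to help on this link. The sketch below computes the bandwidth-delay product, assuming a 1 Gbit/s link and a round trip of about 2 ms (roughly the msRTT nuttcp reported above); the variable names are just for illustration:

```shell
#!/bin/sh
# Bandwidth-delay product: bytes that must be in flight to keep the link full.
link_bps=1000000000   # 1 Gbit/s link
rtt_us=2000           # ~2 ms round trip (assumption based on the nuttcp msRTT values)
bdp_bytes=$(( link_bps / 8 / 1000 * rtt_us / 1000 ))
echo "BDP: $bdp_bytes bytes"   # prints: BDP: 250000 bytes
```

At roughly 250 KB, the product is far below the 33554432-byte maximums set above, which fits with the bottleneck being somewhere other than TCP buffer space.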

Check out this thread:

https://devtalk.nvidia.com/default/topic/979635/jetson-tx1/ethernet-speed-increases-when-micro-usb-2-0-connector-is-connected/