Packet loss on bandwidth-reduced network - AGX Orin

I’ve observed packet loss on the Ethernet interface of the AGX Orin development board, while a similar setup on a TX2 development board works fine. There is no packet loss when the bitrate is below 1 Mbps or above ~500 Mbps, but in between it drops quite a few packets. I’m measuring packet drops with iperf3 by inspecting the “Retr” column (TCP retransmissions).

My setup is quite simple. A Windows computer running an iperf3 server is connected to an Ethernet switch, and the Orin dev board is connected to the same switch. The Orin runs an iperf3 client.

I’m reducing bandwidth using two methods. The first is to limit the bandwidth in iperf3 with the -b argument; the second is to swap the Ethernet cable between the Windows computer and the switch for a similar cable in which I’ve cut two of the wire pairs to cap it at 100 Mbps. Both methods yield similar results. For verification, the TX2 dev board never shows any packet loss with either method (indicating that the cables, the Windows computer, and the switch are working correctly). I also tried running iperf3 over Wi-Fi, which works perfectly without retransmissions.

A sample output from iperf3:

$ ./iperf3 -c <server-ip> -b 100mbps
Connecting to host <server-ip>, port 5201
[  5] local <orin-ip> port 51596 connected to <server-ip> port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  12.0 MBytes   100 Mbits/sec   40    113 KBytes
[  5]   1.00-2.00   sec  11.9 MBytes  99.6 Mbits/sec   10   84.1 KBytes
[  5]   2.00-3.00   sec  12.0 MBytes   101 Mbits/sec    3    101 KBytes
[  5]   3.00-4.00   sec  11.9 MBytes  99.6 Mbits/sec   11    107 KBytes
[  5]   4.00-5.00   sec  11.9 MBytes  99.6 Mbits/sec    5   84.1 KBytes
[  5]   5.00-6.00   sec  12.0 MBytes   101 Mbits/sec    4   99.8 KBytes
[  5]   6.00-7.00   sec  11.9 MBytes  99.6 Mbits/sec   13   97.0 KBytes
[  5]   7.00-8.00   sec  12.0 MBytes   101 Mbits/sec    6    108 KBytes
[  5]   8.00-9.00   sec  11.9 MBytes  99.6 Mbits/sec    5   87.0 KBytes
[  5]   9.00-10.00  sec  11.9 MBytes  99.6 Mbits/sec    4   94.1 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec  101             sender
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec                  receiver

And at max speed:

$ ./iperf3 -c <server-ip>
Connecting to host <server-ip>, port 5201
[  5] local <orin-ip> port 51600 connected to <server-ip> port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   957 Mbits/sec    0    255 KBytes
[  5]   1.00-2.00   sec   113 MBytes   947 Mbits/sec    0    255 KBytes
[  5]   2.00-3.00   sec   113 MBytes   951 Mbits/sec    0    255 KBytes
[  5]   3.00-4.00   sec   113 MBytes   947 Mbits/sec    0    255 KBytes
[  5]   4.00-5.00   sec   114 MBytes   952 Mbits/sec    0    255 KBytes
[  5]   5.00-6.00   sec   113 MBytes   948 Mbits/sec    0    255 KBytes
[  5]   6.00-7.00   sec   113 MBytes   950 Mbits/sec    0    255 KBytes
[  5]   7.00-8.00   sec   113 MBytes   947 Mbits/sec    0    255 KBytes
[  5]   8.00-9.00   sec   113 MBytes   952 Mbits/sec    0    255 KBytes
[  5]   9.00-10.00  sec   113 MBytes   947 Mbits/sec    0    255 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.11 GBytes   950 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   949 Mbits/sec                  receiver

Our system is designed with 100 Mbps cables (which will give ~1 lost packet per MB transferred). We’ve also previously relied on relatively reliable UDP messages (since we’ve had a stable local network in our system). We’re mainly concerned that our system’s performance will be reduced by swapping the TX2 for the Orin.

Have you (or anyone else) already seen this issue? Perhaps someone could test with a similar board to evaluate whether the Ethernet device on my Orin dev board is broken. I’m also interested in finding out whether this is a HW issue, or whether the Ethernet device driver simply isn’t as mature as the TX2’s, in which case I could update something in the future to improve network stability.

I tried the same test on my AGX Orin, connected with a Cat 5e 1000 Mbps Ethernet cable to a desktop running iperf3 -s. Here is the output from the Orin:

iperf3 -c 10.3.0.1 -b 100mbps
Connecting to host 10.3.0.1, port 5201
[  5] local 10.3.0.228 port 38448 connected to 10.3.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  12.0 MBytes   101 Mbits/sec    0    178 KBytes       
[  5]   1.00-2.00   sec  11.9 MBytes  99.6 Mbits/sec    0    178 KBytes       
[  5]   2.00-3.00   sec  12.0 MBytes   101 Mbits/sec    0    178 KBytes       
[  5]   3.00-4.00   sec  11.9 MBytes  99.6 Mbits/sec    0    178 KBytes       
[  5]   4.00-5.00   sec  11.9 MBytes  99.6 Mbits/sec    0    178 KBytes       
[  5]   5.00-6.00   sec  12.0 MBytes   101 Mbits/sec    0    178 KBytes       
[  5]   6.00-7.00   sec  11.9 MBytes  99.6 Mbits/sec    0    178 KBytes       
[  5]   7.00-8.00   sec  11.9 MBytes  99.6 Mbits/sec    0    178 KBytes       
[  5]   8.00-9.00   sec  12.0 MBytes   101 Mbits/sec    0    178 KBytes       
[  5]   9.00-10.00  sec  11.9 MBytes  99.6 Mbits/sec    0    178 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec                  receiver

iperf Done.

iperf3 -c 10.3.0.1
Connecting to host 10.3.0.1, port 5201
[  5] local 10.3.0.228 port 38442 connected to 10.3.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   953 Mbits/sec    0    483 KBytes       
[  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec    0    483 KBytes       
[  5]   2.00-3.00   sec   111 MBytes   934 Mbits/sec    0    483 KBytes       
[  5]   3.00-4.00   sec   112 MBytes   943 Mbits/sec    0    483 KBytes       
[  5]   4.00-5.00   sec   112 MBytes   937 Mbits/sec    0    483 KBytes       
[  5]   5.00-6.00   sec   113 MBytes   944 Mbits/sec    0    483 KBytes       
[  5]   6.00-7.00   sec   111 MBytes   935 Mbits/sec    0    483 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec    0    483 KBytes       
[  5]   8.00-9.00   sec   112 MBytes   942 Mbits/sec    0    483 KBytes       
[  5]   9.00-10.00  sec   112 MBytes   936 Mbits/sec    0    483 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec                  receiver

iperf Done.

Please try to disable the Ubuntu GUI on the Orin and test again.

Also, is this rel-35.1 or rel-35.2.1?

It’s the same without GUI.

It’s rel-35.1

I wasn’t entirely sure what “disable the Ubuntu GUI” means precisely. The board wasn’t connected to a display in the first place, but I ended up flashing a minimal-flavor rootfs, so there should be absolutely no GUI running. (Root File System — Jetson Linux Developer Guide documentation)

I have added NetworkManager and configured it with a few standard network connections. I also installed a few dev tools to build iperf3, and I have run nvpmodel -m 0.
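For reference, the power model and clocks can be confirmed like this (just a sketch, assuming the standard JetPack tools are installed):

sudo nvpmodel -q           # prints the currently active power mode (MAXN after -m 0)
sudo jetson_clocks --show  # prints the current CPU/GPU/EMC clock configuration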

Please upgrade to rel-35.2.1. Several patches have been added to enhance the 10GbE performance of the Orin devkit.

And the desktop still needs to be disabled.

Thanks for the reply, Wayne,

I’ve flashed a minimal image (without GUI), now from rel-35.2.1, with the same basic setup as I described previously.

The situation has greatly improved when the bandwidth is limited in SW: it’s now at ~1-3 retransmissions per GB.

However, using the 100 Mbps cable, the situation looks identical. I’ve run the TX2 and the Orin on the same network, taking turns running 10 seconds of iperf at maximum bandwidth (which is limited to 100 Mbps by the cable); the TX2 has not produced a single “Retr”, while the Orin shows ~1 “Retr” per MB.

Any time you have such an error, you should post the output of the following, run right after the error (e.g., right after the iperf3 test):
ifconfig
(or ip -s addr)

Here’s the output of ifconfig right after the iperf3 run:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1466
        inet 10.52.9.1  netmask 255.255.255.0  broadcast 10.52.9.255
        inet6 fe80::xxxx:xxxx:xxxx:xxxx  prefixlen 64  scopeid 0x20<link>
        ether xx:xx:xx:xx:xx:xx txqueuelen 1000  (Ethernet)
        RX packets 130392  bytes 6014928 (6.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 50775  bytes 597112350 (597.1 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

(I’ve hidden some address values, but they are valid.) I ran a diff of the outputs from before and after the run, and only the TX/RX packet counters differ.
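For reference, the before/after comparison was roughly along these lines (just a sketch; <server-ip> is a placeholder as in the earlier runs):

ifconfig eth0 > /tmp/eth0_before.txt
./iperf3 -c <server-ip> -b 100mbps
ifconfig eth0 > /tmp/eth0_after.txt
diff /tmp/eth0_before.txt /tmp/eth0_after.txt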

Note that the NIC itself did not report any errors, drops, or overruns. If this was taken after a test with drops, then it implies the software consuming or sending the data was itself responsible for the issues. An example might be data arriving too fast, which in turn might imply the driver serviced the PHY, but the application transferring data from the interface was not fast enough. Hard to say, but we can conclude that the network itself is not causing this (there were no interactions with misconfigured network devices).

Thanks for the input, and it’s good to sort out that the underlying network should be fine.

I tried running our UDP-based application simultaneously with iperf, and it still drops packets (though the drop rate has gone down a bit after the update and disabling the GUI).

I’m not sure how iperf is implemented, but I would assume the retransmits it reports come from the underlying TCP stack (as TCP is really just a stream as seen from the application).

I’m still a bit curious why this doesn’t happen on the TX2. There, I’m able to run iperf and still have enough headroom for an additional UDP-based application without any packet loss. Could it be that the TX2’s TCP implementation somehow reads the bandwidth limitation detected at the link layer from the underlying interface and limits the data rate accordingly? Or maybe the Orin driver(s) assume a faster network and are more “aggressive” in trying to increase bandwidth, thus squeezing out other processes?

UDP is designed to drop packets under various conditions, e.g., network too congested. However, this would show up as a “dropped” in the “ifconfig” command for that interface, and I did not see that. Whenever you see this, be sure to examine the ifconfig output for that particular interface; if it says nothing was dropped, then the driver and the NIC itself are fully functional and the drop is from the consumer of the data rather than the network itself. All I can suggest there is that perhaps CPU load is getting in the way, or perhaps there is some place in the network stack that more buffer could be used (that’s just conjecture, I don’t know for sure).

If TCP reports a retransmit, then we know data arrived to the driver, and that the driver sent the data, but there was no handshake of arrival (perhaps it is the other side of the connection missing the handshake, but perhaps it is also from the Jetson side).

It is a bit of a mystery. However, I am wondering about something else which might (or might not) shed a clue. What happens if the test is over the loopback address 127.0.0.1? Be sure to examine ifconfig on that interface. Then, let’s pretend that the 10GbE has a local address of “192.168.1.2”: What happens when you run that same test to the local end of that PHY at 192.168.1.2 (versus running the test to the other end of the network where some other device is responding)?

Again, thanks for your input on this matter,

As you say, some packet loss is expected when running UDP. I think most of it comes from loss at the link layer and from router buffer overflow at the network layer. However, maybe that isn’t the case here, since ifconfig doesn’t report any errors?

I did a few local tests, but I wasn’t really able to run iperf3 over the eth0 device since the traffic was routed to localhost (I found this by observing the counters/statistics from ifconfig; traffic to any local interface shows up in the lo device’s statistics). Unfortunately, the UDP test I’ve currently set up can’t run on the same device, as it depends on some Windows-only SW. Maybe I could set up a simple UDP echo to mock it? I’ll just need a bit more time before I can conclude on your suggested tests. Also, running over lo (localhost) produces no retransmits, by the way.

However, during my experiments, I was able to get the error rate to zero (again, for reasons I don’t understand). In iperf, I can set the TCP window size using the -w argument (see the example below). iperf doesn’t necessarily set the window size to the precise number I pass, but the main observation is that it doesn’t change constantly the way it does when I don’t use -w. So it looks like the retransmissions are simply coming from the TCP window being resized. The TX2 device I have reports stable window sizes from iperf for the entire run (without setting the argument), and that could be why it has always worked there. I also observe that a fixed window size in iperf reduces the bandwidth slightly (94-95 Mbps with a fixed window size versus 92-98 Mbps when it’s dynamic). So maybe the UDP packets are able to squeeze through in the fixed-window case, where there is a tiny headroom in bandwidth (they only need ~1 kB every other second). When the window size is dynamic, the variability might push out the UDP packets: when iperf uses a large window, this tiny headroom is completely squashed. I’m only speculating here, but based on the findings from ifconfig, I think the problem can be narrowed down to the TCP/IP layer, where a dynamic window size could be the cause of the reported retransmissions.
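For example, a run with a pinned window looks roughly like this (the 128K value is only illustrative; iperf3 rounds it, but the reported Cwnd then stays essentially constant):

./iperf3 -c <server-ip> -w 128K -t 10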

Going a bit further, I don’t understand why the Orin shows this behavior compared to the TX2, but maybe it’s just that the drivers are newer, “assume” a higher-bandwidth network, and try to utilize the bandwidth to a greater extent by constantly varying the window size… I’m not sure whether that’s a bug or a feature.

I can’t say for certain, but I think ifconfig would report any overruns/overflows/collisions/errors for the entire network stack. However, the network stack eventually copies data somewhere else, and it looks like the network stack is itself without error/collision/drops/overruns/underruns (at least so far as ifconfig is concerned).

TCP retransmission has tunable options, and those might be useful for tracing where the retransmits occur. Here’s a general search on the topic:
linux check tcp retransmission timeout

One specific detail I found interesting was from this URL:
https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html

Quite often drivers (including network) will provide some sort of parameter control or feedback on operation via a “/proc/sys” file. First, can you verify your iperf3 test is using IPv4 and not IPv6? If not, then perhaps first compare results under purely IPv4 and then under IPv6 (in general though, if you don’t need IPv6, I recommend disabling it). See if the protocol is related. Then if you examine this directory:
/proc/sys/net/ipv4/

You can look at any file with this in the name from that location (assuming it is IPv4):
ls *retries*
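For example, a quick way to dump the current values (just a sketch):

cd /proc/sys/net/ipv4
for f in *retries*; do echo "$f = $(cat $f)"; done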

I can’t give all of the details you will need, but should you find a file, such as “tcp_syn_retries”, then you have a starting point. Incidentally, SYN packets are special with regard to what you are working with. If SYN packets are where the retries are from, then it is possible the cause is software designed to prevent “SYN floods” (an attack). Which reminds me…

If you look at file “/etc/sysctl.conf”, then you’ll notice the entries in it correspond to “/proc/sys” files. This is one method of changing a configuration in those files at startup. In particular, although it is often turned off, examine this entry:
net.ipv4.tcp_syncookies=1

If that is commented out, then it should not be running. You can verify the actual runtime value:
cat /proc/sys/net/ipv4/tcp_syncookies
(there is more than one place this might be enabled, but this is where the final state is visible)
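You can also change it at runtime for testing without editing the file (a sketch; the value reverts at reboot unless it is persisted in sysctl.conf):

sudo sysctl -w net.ipv4.tcp_syncookies=0
cat /proc/sys/net/ipv4/tcp_syncookies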

If SYN cookies are enabled, then it is likely the network is 100% doing as it should since a performance test is flooding the network as fast as it can. SYN packets are part of the handshake at the start of a TCP connection. Once upon a time some brilliant idiot figured out how to “half open” a TCP connection and flood them, such that the server could not accept valid incoming TCP connections. Thus SYN cookies were born.

UDP might have something similar, but it isn’t a SYN cookie since there are no SYN packets in UDP. Either way, if it is an intentional mechanism to protect a server from some form of flood, then the ifconfig and other network layers would show no errors. Even so, you’d see dropped packets from the software using that interface.

Just explore that URL and perhaps “cat” the files you find of interest in “/proc/sys/net”, and then research what that file might be (which could include posting here on the forum for those who might know).

Thanks for the additional details on this matter. I feel we are starting to go somewhat into details I’m not already familiar with, but I’ll try my best to investigate.

First of all, I wasn’t able to enable IPv6 on the Windows computer running the iperf server. The devices are managed by the IT department, and changing settings is often difficult, if not impossible. I’ll see if there is anything I can do to run an IPv6 test anytime soon. So all tests have been run on IPv4, which is what will be used in our production system.

None of the files matching /proc/sys/net/ipv4/*retries* change at all while running an iperf test that reports retransmissions; that is, all files matching *retries*, including tcp_syn_retries, stay the same.

/etc/sysctl.conf was commented out in its entirety, but /proc/sys/net/ipv4/tcp_syncookies contained 1. I tried uncommenting the line and setting it to 0, and after a reboot /proc/sys/net/ipv4/tcp_syncookies showed 0. Re-running iperf (which still reported retransmits) didn’t affect the ifconfig or /proc/sys/net/ipv4/*retries* statistics.

I’m starting to agree with you that the network is doing what it should. Again, if the network is pushed to its limits, packets are expected to be dropped at some point. I would have expected that to show up in the statistics, but maybe, as you say, the SYN cookies prevent this kind of packet drop from being logged.

I also dug a bit further into iperf and shortened the reporting interval of the printout, and I clearly see a pattern between retransmits and window size. The retransmits seem to happen only when the window size is reduced. Here’s one example (but it’s always the case):

[  5]   1.30-1.40   sec  1.22 MBytes   103 Mbits/sec    0    111 KBytes
[  5]   1.40-1.50   sec   940 KBytes  77.0 Mbits/sec    2   82.2 KBytes
[  5]   1.50-1.60   sec  1.22 MBytes   103 Mbits/sec    0   90.5 KBytes
[  5]   1.60-1.70   sec  1.22 MBytes   103 Mbits/sec    0   98.9 KBytes
[  5]   1.70-1.80   sec  1.22 MBytes   103 Mbits/sec    0    109 KBytes
[  5]   1.80-1.90   sec   940 KBytes  77.0 Mbits/sec    0    116 KBytes
[  5]   1.90-2.00   sec  1.22 MBytes   103 Mbits/sec   15   84.9 KBytes

In a new setup, I use two instances of iperf, one running UDP and another running TCP. There’s 0.29 % packet loss on the UDP side when TCP runs with a “dynamic” window size, and 0 % with a “fixed” window size. Regarding the retransmits reported by iperf on TCP, these seem to be read using getsockopt(..., TCP_INFO, ...). Could it just be that when the window size is reduced, packets in the window that have already been transmitted are discarded, which is somehow detected as a “retransmit”, while the system statistics don’t strictly treat it as a resend?
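As a cross-check of what iperf3 reads via TCP_INFO, the same per-socket counters can be inspected from a second terminal while the test runs (a sketch; the exact fields printed depend on the kernel version):

ss -ti dst <server-ip>    # look for the retrans: and cwnd: fields of the iperf3 socket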

I also continued with some debugging, and I found the tool nstat to be very useful. I run it as nstat -a, and I can monitor certain parts of it continuously. In particular, it does seem to detect TCP retransmits in the TcpRetransSegs field. So maybe it’s considered valid TCP behaviour to perform retransmits even when they’re not caused by an error (and only errors are reported by ifconfig). Other fields output by nstat -a can be seen below:

TcpExtTCPLostRetransmit         181
TcpExtTCPLossFailures           4
TcpExtTCPFastRetrans            14322
TcpExtTCPSlowStartRetrans       132

So, to me it seems like most of the failures fall under the FastRetrans category, which maybe happens when the cwnd is reduced in size. I also tried looking at the UdpIgnoredMulti field, but it only increases steadily regardless of the UDP packet loss. However, it could simply be that the UDP packets are dropped at the switch, since the Jetson can send data at 1 Gbps while the cable at the switch limits it to 100 Mbps. It might make sense that the varying buffer size at the Jetson simply fills some buffer at the switch, and then UDP packets are lost from time to time. Swapping the cable to allow 1 Gbps and limiting bandwidth in SW, there is no packet loss.
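The continuous monitoring mentioned above was done roughly like this (a sketch; nstat prints the delta since its previous invocation, so each refresh shows the counters for the last second):

watch -n 1 'nstat | grep -E "TcpRetransSegs|TcpExtTCPFastRetrans"'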

I’m tempted to say that these findings can close the topic, but maybe you have additional input on this.

FYI, IPv6 is often problematic. That isn’t just a “Jetson thing”, it seems to also be more than a “Linux thing”. If your IT department is making you use IPv6, then this is probably going to make life difficult. However, I wasn’t certain from the above description if you were saying part of it must be IPv6, or that no IPv6 was involved. If there is no IPv6, then I think you are in luck (I think that is what you are saying, that there is currently no IPv6). I would stay away from IPv6 until you have everything else working.

With tcp_syncookies set to 0, you mentioned no sign of SYN retries in /sys. What about the actual iperf3 test when tcp_syncookies is guaranteed to be off (“0”)?

The statistics will only show if drops are in the IP stack. This will seem off-topic, but it is related to dropped data, so you might find this of use…

Whenever hardware needs a driver to service it, e.g., after a network has received enough data, or when data is ready to be sent to the network, it will trigger a hardware interrupt (hardware IRQ). This is done with a physical wire. There is more than one request since there is a lot of hardware, and the scheduler is responsible for picking which IRQ to service next based on priorities. I believe that this is working ok, and this is what results in no drops shown in ifconfig.

There is also software running which is unrelated to hardware. For example, one might want to perform a checksum on what had arrived from the ethernet. Or one might want to take raw data from a disk driver, and interpret that as the ext4 filesystem. Both of those would use data provided initially by a hardware IRQ, but what they do with the data is purely software (whatever provided that data is irrelevant once the data is transferred). Software has interrupts as well, aptly named a “software interrupt” (or software IRQ). No physical wire is involved. Most of the time a software IRQ is simply run via a timer, e.g., a 1000 Hz timer is common. Once again, the scheduler decides which interrupt to service in which order based on priorities.

A hardware IRQ might be limited on which CPU(s) it can run on since a direct wire might not be available to all cores. Core 0 is always available. A software IRQ can run on any core since it is basically an abstract concept (obviously the memory controller is always accessible on all cores, as well as timers).

A good driver design does the minimal work on the hardware IRQ. It then passes the more abstract work on either to the end program (user space software) or to a software driver (still in kernel space, but able to migrate cores). If your use case requires a hardware IRQ handing off to a software IRQ, then the software driver could still be dropping data; since that drop is not in the ethernet stack, the ethernet stack would not know about it and would not note a drop. The end program might not know either. A software IRQ could be dropping data. Or it could be the end user software, either iperf3 or one of the libraries it uses.

I couldn’t tell you where the drops are occurring other than it does not seem to be the network layer. It is quite possibly a problem in one of the other kernel drivers. It could be iperf3 itself. It could be some layer in the kernel which copies data to/from user space (user space being anything outside of the kernel, e.g., iperf3 itself, but the copy of data starts in kernel space).
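One place where drops outside of the NIC statistics can become visible is the per-CPU input backlog. As a diagnostic sketch (the second hex column of softnet_stat counts packets dropped because the backlog queue was full):

cat /proc/net/softnet_stat                 # per-CPU hex counters; column 2 = dropped
grep -E "NET_RX|NET_TX" /proc/softirqs     # how often the network softirqs ran per CPU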

One thing you might do is to experiment with the “netcat” program (often this has the alias “nc”). What this program does is to simply copy data to/from an address/port/protocol. nc has a lot of uses, but all you will be concerned with is copying a file from or to the Jetson, perhaps making sure it is IPv4 (or if you are testing IPv6, then that). There are lots of examples on the internet, but here is one
https://linuxize.com/post/netcat-nc-command-with-examples/
(search for “Sending Files through Netcat”)

Set up the receive end ahead of time to some arbitrary unused port. Then use netcat to send a very large file, and see if timing is consistent with what you see from iperf3. Check ifconfig each time and see if it shows missing/dropped data. Note that you can use the “pv” command to monitor any “cat” type command through a pipe (“cat” just echoes the bytes from a file; pipe just redirects that echo to some other process’s input). On the Jetson, try this, which discards the echo:

ifconfig
cat /boot/Image | pv > /dev/null
ifconfig

(answer looks like "32.5MiB 0:00:00 [ 248MiB/s] [ <=> ")

More detailed:
# cat /boot/Image | pv -t -r -b > /dev/null

The “> /dev/null” could be replaced with a netcat command, e.g., just to be arbitrary, and to send to the localhost (netcat can use the same address as the host it sends from to also receive):

# Just intuitive:
# Receive side:
nc -l 5555 | pv -t -r -b
# Send side:
echo "hello world" | pv -t -r -b | nc 127.0.0.1 5555

# Now a large file and throw away content as it arrives:
# Receive side:
nc -l 5555 | pv -t -r -b > /dev/null
# Send side:
cat /boot/Image | pv -t -r -b | nc 127.0.0.1 5555

The largest file is probably the rootfs partition, but I won’t suggest that because it is actively being accessed and would throw off timings. Really though, I’m suggesting just to send the largest thing you can think of and see if ifconfig shows losses then. In the end, the only solution will come after finding out where the resends are originating. Feel free to ask questions though on what to test.

Thanks for your reply.

Clarifications

I’m forced to use IPv4. No IPv6 is used.

I said that the results were unchanged, but I was specifically talking about /proc/sys/net/ipv4/tcp_syn_retries and the iperf3 test. I’m not sure we’ve ever talked about syn_retries in /sys, but you’re probably talking about /proc/sys, right? So to summarize: /proc/sys/net/ipv4/tcp_syn_retries is unchanged after a run of iperf3, and the iperf3 results themselves are similar with tcp_syncookies enabled and disabled (as reported by cat /proc/sys/net/ipv4/tcp_syncookies). So I’ve not been able to find any difference.

Netcat test

Now, on to the results from netcat! I used a 1.6 GB file. The data is transferred from the Jetson to a new server using the same cables (the previous computer was Windows-only, so I used an Ubuntu computer as the server instead). So the Jetson is still connected to the switch with a 1 GbE cable, and the server/receiver is connected with a 100 Mbps cable. After running netcat the way you specified:

  • ifconfig: 0 errors
  • /proc/sys/net/ipv4/tcp_syn_retries: Same number as before run
  • nstat | grep -i "tcp":
    • TcpInSegs: 973926
    • TcpOutSegs: 1190859
    • TcpRetransSegs: 752
    • TcpExtTCPFastRetrans: 751

I would say the number of retransmissions is about the same as I see using iperf.

nstat only reports statistics since the last run of the program; this note is just to clarify that I remembered to run it right before and right after the test.

My hypothesis:

I investigated a bit further, and right now I think there might be an issue related to Ethernet flow control (Ethernet flow control - Wikipedia). Mainly, a switch has the ability to send a pause frame when its buffers start to fill up. Somehow, I think this isn’t captured by the driver for the 10GbE on the Orin, but is captured by the 1GbE on the TX2 (this could be related to the things you explained regarding IRQs). In other words, I think the Orin is able to send data without any problems, but the pause frames might not be reported as errors in the statistics we’ve looked at, since they’re at the link layer (or maybe because ignored messages are not reported at all). I think the switch is expected to send pause frames, since the Jetson can send at 1 Gbps to the switch, but the switch cannot forward faster than 100 Mbps because of the cable going from the switch to the iperf server. The network statistics then don’t report any errors, since the packets are dropped in the switch, which is after IP’s “send and forget”. Since TCP is concerned with sending from port to port anywhere in the world, I think it’s intentional that these retransmits are not treated as errors, given that TCP cannot rely on feedback mechanisms from the underlying interface.

Regarding this, I’ve also looked at the settings that seem to be related to pause frames, and all of them are enabled on both the TX2 and the Orin. Most forum posts I’ve seen on the web where people don’t have pause frames working turn out to be caused by some setting that disables them. I haven’t yet found any such setting to be disabled…
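For reference, the pause settings can be inspected with ethtool along these lines (a sketch; whether the Orin’s 10GbE driver exposes any pause frame counters in ethtool -S is an assumption on my part):

ethtool -a eth0                    # autonegotiation / RX / TX pause settings
ethtool -S eth0 | grep -i pause    # pause frame counters, if the driver exposes any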

What do you think about this? I’m not sure how I can verify that the switch is sending pause frames. I guess I could insert a hub between the Jetson and the switch and use it to sniff the packets with Wireshark, but I don’t have a hub… Going even further, I’m not sure how I can easily check whether the pause frames are handled by the drivers.

I don’t know if all of the interfaces are managed, and I also don’t know which interface nstat applies to. I am uncertain whether TcpRetransSegs applies to the interface in question. Probably? But if we go back to the earlier ifconfig of eth0, we still see no drops, overruns, collisions, carrier errors, and so on. Actual packets being in error at the PHY won’t be the cause.

You mentioned that the other Ubuntu computer in the latest test uses only a 100 Mbps connection, which might not be the best when testing things related to congestion on a gigabit system, but I do trust a Linux PC for such tests far more than a Windows PC. In this case, what shows up in ifconfig for that Ethernet interface on the Jetson and on the PC? I’m not sure that nstat would show what is needed at the PHY. That TcpRetransSegs might not mean what it looks like, since ifconfig does not show errors of any kind.

I don’t know enough about flow control on ethernet, but from experience on other serial devices, there is a strong chance you are correct (this could be an issue which does not show up as errors on ifconfig). However, I don’t know how to test if flow control is working for ethernet (and I don’t know if TcpRetransSegs would go up or down with flow control issues, but it seems plausible).

Have you considered trying a different switch (different model, not just different hardware)? I am curious, but I doubt it would change anything since ifconfig is not noticing this. On the other hand, if it is related to the switch, then it’s going to be a long road debugging the Jetson, and a quick swap to test would say a lot. I do think that some switch drops would still show up in ifconfig, but it might alter how the error is shown.

Yes, TCP is separate from the IP layer. There can be some interesting interactions, though, if there is fragmentation. Thinking about it, I see your Jetson’s MTU is 1466. Most of the time the default is 1500. There have been some issues on the forums regarding MTU (which is part of what determines fragmentation and resend delays). What is the MTU on the other end’s interface? Is it 1500? This might be useful:
https://linuxhint.com/how-to-change-mtu-size-in-linux/
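A minimal sketch for comparing and aligning the two ends (assuming the interface is named eth0 on both):

ip link show eth0                     # the current MTU is printed on the first line
sudo ip link set dev eth0 mtu 1500    # try matching both ends to 1500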

If matching MTUs to 1500 does the job, then that might be it. I don’t know the nature of the other threads, but there has been significant activity concerning default MTU on the forum.

Someone else would have to comment on pause frames and flow control.

Again, thank you for the reply.

Even though the topic is considered “solved” at this point, I would like to follow up the current open questions from your last reply.

To give some additional information on TcpRetransSegs: after a run of iperf3, the “difference” in TcpRetransSegs is identical to what iperf3 reports. So I think it reports on exactly the same interface, though I’m not sure “where” in the stack it’s coming from. And you are correct that we see no drops, overruns, collisions, carrier errors, etc. Because of this, I think the interface is perfectly able to send messages and is itself working correctly. Then, I think the buffer on the switch builds up, and it tries to send pause frames back to the device. For some reason the device isn’t picking them up, but that could of course happen anywhere in the network stack: transport, network, or link layer. Since we don’t see any error reports, it’s probably not the physical layer, and since this happens with both netcat and iperf3, we can also assume it’s not the application layer.

I tried a different switch now. This switch seems to be limited to 100 Mbps (it’s quite old), and things seem to work better. The cwnd still varies, but at a much slower rate, and I still get retransmissions when the cwnd is reduced. However, it can operate with a relatively large window size (over 600 KBytes), so maybe this suggests that flow control at the link layer works, and that the TCP driver simply tries to increase the buffer from time to time just to check whether there is more throughput available on the line? Alternatively, if the bandwidth is limited at transmission rather than at the switch, it could simply indicate that the Ethernet driver isn’t able to get the data out at the maximum rate, and that this is handled properly in the driver.

Yes, it’s 1500, and changing the Jetson’s MTU to 1500 (by setting it to 1534 in ifconfig, for some reason) didn’t seem to affect the result.

I’ve never tried to change an Ethernet buffer size. I imagine that in many cases there are “standard” Linux kernel methods for parts of this, but some buffer sizes are likely tied to the specific driver. I have no idea which of the buffer-size-change articles on the internet work on a Jetson, but if you can, I suggest trying to double any buffer related to this and testing again. Examples from:
Google Search: linux increase ethernet buffer size
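One concrete place to start, if the driver supports it (a sketch; ethtool will simply report an error if the ring sizes are not adjustable on the Orin’s interface, and the 4096 values are only illustrative):

ethtool -g eth0                         # show current and maximum RX/TX ring sizes
sudo ethtool -G eth0 rx 4096 tx 4096    # enlarge the rings, then re-run the test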

You may try AGX Orin - mtu 1466 - #9 by Honey_Patouceul for setting MTU, but it might not change the RX dropped packets issue.
I’m also seeing some RX dropped packets when using L4T versions from R32.7, even with quiet networks.