Jetson TX1 strange network performance behaviour (still)

Hi linuxdev,

My test was based on the “-R” option to cover both directions.
I don’t understand why we need to use netcat here. Is this another option to measure the data rate?

I just wanted to see if actual transfer had the same issue depending on direction…sending versus receiving…and without centering wholly on iperf3’s methods. My conclusion is that send versus receive differed significantly in performance regardless of whether iperf3 was used or whether other tools were used. Send performance and receive performance should be approximately the same, but are not…iperf3 is also not the cause of the asymmetry.

Hoi WayneWWW,

just as linuxdev said, netcat is simply a second way to test the bandwidth of the connection in a less clinical environment than iperf3, and to make sure we didn’t run into a problem specific to iperf3 rather than a general one.
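For reference, the netcat test is essentially just this kind of sketch (the port, the file name, and using pv to show the rate are my own choices here, nothing required; some netcat variants need “-l -p <port>” instead of “-l <port>”):

# Receiver side (e.g. the host PC): listen and discard incoming data
nc -l 5000 > /dev/null
# Sender side (e.g. the TX1): stream a large file and let pv report throughput
# (pv may need to be installed; "time cat file | nc ..." works as well)
pv testfile.bin | nc <receiver-ip> 5000

Swapping which machine listens and which one sends tests the other direction.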

The big question for us at my company is whether we can rely on the Jetson network stack. The 350Mb/s is really low, and I only saw that with one computer, but even 700Mb/s is not really that good for a GBit Ethernet link. We would like to know why the connection is significantly below what is expected, especially considering the number of retries iperf3 reports.

I just tried netcat to send a 2.5GB file (not sure if it is large enough) and the measured performance is around 700~850 Mbps regardless of whether the TX1 was the sender or the receiver.

Does your test environment have any USB activity? Since the TX1 uses a USB-based Realtek chip, USB activity may influence Ethernet performance.

There is no other activity at all. The only thing connected to the dev board is power and network and nobody else is using it.

My system also had no other activity…changing to max from jetson_clocks.sh had very little effect.
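For reference, what I mean by changing to max is roughly this (the script location is what I have on my dev kit under R28.x; yours may differ):

# Max out CPU/GPU/EMC clocks before the test (path assumed)
sudo ~/jetson_clocks.sh
# Show the resulting clock settings (supported on R28.x as far as I recall)
sudo ~/jetson_clocks.sh --show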

It really seems like there is an interaction going on between host and Jetson. My host is Fedora and I’m using a private wired gigabit LAN. The reason for all of those complicated tests I had in an earlier post is because I couldn’t narrow down any particular part of the network causing this. I have no way to be absolutely certain my host isn’t causing this, but I have a very high confidence that it isn’t…something about the network traffic itself seems to be the cause…I doubt my tests would match the tests of @QUams if it were just some fluke.

I wish I had a better way to tell you how to reproduce this…I can reproduce this 100%, apparently so can @QUams…but it seems that it isn’t only the Jetson involved in reproducing this. I would need a network analyzer to know, and I do not have this. It is possible something like wireshark could do the job, but I had hoped this could be reproduced elsewhere where someone has an actual analyzer. I may look at it closer with wireshark if nobody can find the issue. Perhaps trying a different host is the best way to test…make sure the host is acting as gateway, this is how mine is set up (host has a separate wired gigabit to the internet, Jetson is to its own wired second gigabit).

I did it once between two Jetsons connected via a crossover cable and had the same behaviour. Sadly I didn’t record that one since it was at the beginning of my tests. I currently have only one development board available, so I could only re-create that setup with an Auvidea board and a development board.

I haven’t used wireshark for a while, but I will dig into it and try to find out if there is anything special happening between two Jetsons.
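If I get to it, the capture will probably look something like this (the interface name, the capture file name, and the assumption that iperf3 runs on its default port 5201 are mine):

# Capture the test traffic on the Jetson side into a file Wireshark can open
sudo tcpdump -i eth0 -s 96 -w iperf3_run.pcap port 5201
# ...then open iperf3_run.pcap in Wireshark on a desktop machine and compare
# retransmissions, window sizes, and inter-packet timing for each direction.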

I wonder if there were similar issues on the forum as well. There was one on rel-24.2.1 in which USB activity and clock dropping caused performance drops, but we had patches that fixed it.

Not sure if anyone reported this after rel-28. Need time to gather them.

I may need to change my test environment as well since it was always the same.

I considered that as well; it is possible. I have no way to profile the run times or scheduling of the USB and ethernet drivers…which seems like the most straightforward way to see what is going on if you have a hardware-based profiling ability (I assume there is some sort of JTAG profiling possible for NVIDIA which the general public can’t do). With most any other method it is trial and error or guessing. Even if the trigger of this performance issue differs between releases, there may be some fundamental quality in common with both.

One more question… Is it necessary to reproduce this issue by using -R parameter in iperf3?

Can it be reproduced by just swapping the server/client relationship between tegra and host?

I never tested with any other arguments since this is what @QUams used.

Moin @WayneWWW!

I only used the -R option because I could use the same shell to test both directions consecutively. I never encountered any difference between -R and just swapping server/client.
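In other words, these two ways of testing the PC→TX1 direction behave the same for me (addresses are placeholders and the options are trimmed to the basics):

# Variant A: client on the TX1 with -R, so the PC (server) transmits toward the TX1
iperf3 -c <server> -R -t 120 -i 1
# Variant B: same traffic direction without -R...
# ...server on the TX1:
iperf3 -s
# ...and client on the PC, sending toward the TX1:
iperf3 -c <tx1-ip> -t 120 -i 1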

We are still investigating this issue and have some tests here.

  1. We updated the Realtek ethernet driver after rel-28.2, so please try this issue on rel-28.2. This should prevent low throughput like 350Mbps.

  2. Please use the parameters below (the default length is 8K):

iperf3 -c <server> -u -b 0 -l 16K -t 120 -i 1

My original testing was on R28.1, some after that on R28.2.

This particular test on R28.2 (jetson_clocks.sh maxed)…

Running this on TX1:

# TX1:
iperf3 -c <server> -u -b 0 -l 16K -t 120 -i 1
# ...and this on PC:
iperf3 -s

…throughput was quite good and close to gigabit.

Reversing this:

# PC:
iperf3 -c <server> -u -b 0 -l 16K -t 120 -i 1
# ...and this on TX1:
iperf3 -s

…throughput dropped dramatically. Instead of a 955Mbit/s average it is around 554Mbit/s. Under R28.1 it never went this fast, so there is a very definite boost in throughput by going to R28.2, but direction still matters.

Unless all individual components were capable of running efficiently, there would be no way any combination could achieve the near-theoretical gigabit throughput. Depending on where each role runs (on the Jetson versus on the PC), it seems the order of execution has a dramatic effect (in this case the reversal loses 401Mbit/s of throughput…that’s dramatic). In terms of hardware this mainly means the order in which the drivers are serviced is somewhat reversed between the two tests. I would not be surprised if this reverse case loads a driver and then the driver has to wait for something which the opposite order would not have had to wait for.

Whatever the changes were in going from R28.1 to R28.2, they are dramatic improvements even in the “low throughput” case (it’s a couple hundred Mbit/s faster than it was even in the slow case). I have no way to profile what is going on in terms of driver run order and time slices used, but I am guessing it is dramatically different between running forward versus reverse.

Sorry that it took me some time to do the tests on my side as well. We are just moving offices and I didn’t have access to my Jetson(s) for testing.

What I could do right now is test the network throughput between a TX1 on its dev board and a normal Linux PC. The Jetson has a fresh JetPack 3.2 b196 installation with jetson_clocks.sh run beforehand.

I used the command as specified:

iperf3 -c <server> -u -b 0 -l 16K -t 120 -i 1

Here are the results:

Jetson TX1 to Linux PC

[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-120.00 sec  13.1 GBytes   938 Mbits/sec  0.116 ms  708/859119 (0.082%)
[  4] Sent 859119 datagrams
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-120.03 sec  13.1 GBytes   938 Mbits/sec  0.116 ms  708/859119 (0.082%)
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

Linux PC to Jetson TX1

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-120.00 sec  13.3 GBytes   952 Mbits/sec  0.238 ms  19210/871937 (2.2%)
[  4] Sent 871937 datagrams

iperf Done.
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-120.02 sec  13.3 GBytes   952 Mbits/sec  0.238 ms  19210/871937 (2.2%)
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

So from the bandwidth point of view, this particular communication line now looks the way it always should have looked! 8)

But before I trust the whole thing completely I would like to test it with the different setups I used before, which at the moment are still in moving boxes. The relatively high number of lost datagrams is still slightly concerning to me. Between my Mac (High Sierra) and said Linux PC there are no packet losses at all.

Before and after a test be sure to check if “ifconfig” has any errors, dropped, overruns, frame issues, or collisions. You’ll want to check that from each device end.
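Something along these lines on both the TX1 and the PC, before and after a run, will show those counters (eth0 is just an assumed interface name; substitute your actual interface):

# Error/drop counters from ifconfig
ifconfig eth0 | grep -E 'errors|dropped|overruns|frame|collisions'
# Equivalent view with iproute2
ip -s link show eth0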

Sorry, I didn’t write that explicitly, but both systems started fresh and didn’t have any errors shown in ifconfig, and there was no unusual output in dmesg…

Did you have lost datagrams in your tests? Or is that limited to my setup?

On earlier testing there were times I had dropped packets, but only the PC knew the packets were dropped (I believe those issues went away on R28.2). Whenever testing, though, you need to know those other issues did not occur…if they did, then there is a network issue which isn’t part of the host or Jetson.

Sorry that I wasn’t clear about it before. I actually used the exact same cable. So first I connected the Jetson and did the test and then I connected my MacBook and did the test.

I notice we didn’t have the lost datagrams number in our iperf3 results. It seems we forgot to test this…

linuxdev,
Do you see datagram drops on your device too? I mean in the iperf3 results rather than ifconfig.