How to enable flow control in the eqos driver for 1GbE J14 (3&4)

Dear @SivaRamaKrishnaNV,
Thank you for your response.

When we send many large UDP packets from Xavier A and Xavier B at the same time, fragment reassembly timeouts occur (Wireshark frequently shows “ICMP Time-to-live exceeded (Fragment reassembly time exceeded)”).
This does not occur when the same total amount of packets is sent from only one of the two Xaviers.
We think we may be able to prevent the fragment reassembly timeouts if we enable 802.3x flow control.
If there is another way to prevent them, we would like to know it.
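
For reference, on a generic Linux interface, 802.3x PAUSE settings can usually be checked and toggled with ethtool. This is only a sketch; we do not know whether the eqos driver accepts it:
$ ethtool -a eth0                      # show current pause-frame settings
$ sudo ethtool -A eth0 rx on tx on     # try to enable RX/TX pause frames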

Dear @Tabito.Suzuki ,
My apologies for the late reply. The SJA1105 switch (to which both Tegras are connected) does not support PAUSE flow control.
The only option would be limiting the number of frames via AVB features, i.e., rate limiting or bandwidth reservation (a generic sender-side sketch follows the questions below). However, to better understand if/how this can be done, could you answer the items below?

  • Does the traffic from Tegra A to Tegra B include E3579 dongle traffic as well?

  • Do you see the problem only when you send Tegra A → B and Tegra B → A simultaneously?

  • Could you share steps to reproduce the issue on our end?
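
For illustration, sender-side rate limiting on a generic Linux interface could look like the following token-bucket filter; the rate/burst values are placeholders, not a verified configuration for this platform:
$ sudo tc qdisc add dev eth0 root tbf rate 400mbit burst 32kbit latency 50ms
$ sudo tc qdisc del dev eth0 root      # remove the shaper again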

Dear @SivaRamaKrishnaNV ,
We are sorry for the confusion.
We have not tried sending from one Xavier to the other.
We tried the following two cases:

  1. Each Xavier sends to one LinuxPC at the same time.
    Xavier A → (1GbE [3|4] port → E3579 ETHERNET DONGLE) → LinuxPC A
    Xavier B → (1GbE [3|4] port → E3579 ETHERNET DONGLE) → LinuxPC A

  2. Only Xavier A (or B) sends to one LinuxPC.
    Xavier A → (1GbE [3|4] port → E3579 ETHERNET DONGLE) → LinuxPC A

The total amount of packets in case 2 is the same as in case 1.
The fragment reassembly timeout occurred only in case 1.

We think we may be able to prevent the fragment reassembly timeout in case 1 if we enable 802.3x flow control.

Dear @SivaRamaKrishnaNV

I’m sorry for sending you so many questions in a row.

Would you tell me how to change the MTU of Xavier eth0?

We executed the following command on Xavier:
$ sudo ip link set eth0 mtu 9000
But the MTU did not change; the following message appeared instead:
> RTNETLINK answers: Invalid argument
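
In case it is useful, newer kernels report the MTU range a driver accepts in the detailed link output; we are not sure the eqos driver exposes these fields:
$ ip -d link show eth0                 # look for minmtu/maxmtu in the output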

Hi @Tabito.Suzuki ,

Please create another topic (linked to this topic if you think it’s related) for this. Thanks.

Dear @VickNV
Understood. I created a new topic about changing the MTU of Xavier eth0.

Dear @Tabito.Suzuki ,
Is there a real use case behind this? How critical is it? You may share the details via private message if they cannot be shared on the forum.

We need more info about the nature of the traffic: burst period, burst length, frame size, etc. It would be good if you could provide info about the utilities used to generate the traffic. We would like to simulate the scenario on our end.
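
For example, a short capture from each sender would help characterize the bursts; assuming tcpdump is available on the target:
$ sudo tcpdump -i eth0 -w /tmp/udp_burst.pcap udp   # capture UDP traffic for offline analysis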

Can you experiment with decreasing per-Tegra bandwidth or burstiness further? Or at least compare UDP packet drops in the two-Tegra case vs. the one-Tegra case?
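
Generic Linux counters that show such drops (not platform-specific) are:
$ ip -s link show eth0                 # per-interface RX/TX statistics, including drops
$ netstat -su                          # system-wide UDP counters (e.g. receive-buffer errors)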

Dear @SivaRamaKrishnaNV ,

Our problem can be reproduced with the following commands.

[**** Preparation ****]

* We launched two terminals on the Host PC.

* We executed "iperf3 -s" on Host PC TerminalA with port 5201.
    <TerminalA>
    ---------------------------------------------------------
    $ iperf3 -s -p 5201
    ---------------------------------------------------------

* We executed "iperf3 -s" on Host PC TerminalB with port 5202.
    <TerminalB>
    ---------------------------------------------------------
    $ iperf3 -s -p 5202
    ---------------------------------------------------------


[**** Packet loss occurred when we sent packets at 256 Mbps from XavierA and at 256 Mbps from XavierB to the HOST via iperf3 ****]

 +---------+                              +------------+
 | XavierA |-----[iperf3 -c -u]--256Mbps->|            |
 +---------+                              |            |
                                          |  HOST PC   |
 +---------+                              |            |
 | XavierB |-----[iperf3 -c -u]--256Mbps->|            |
 +---------+                              +------------+


* We sent packets at 256 Mbps from XavierA to the HOST via iperf3:
    <XavierA>
    ---------------------------------------------------------
    $ iperf3 -c XXX.XXX.XXX.XXX -u -p 5201 -b 268435456
    ---------------------------------------------------------
* At the same time, we sent packets at 256 Mbps from XavierB to the HOST via iperf3:
    <XavierB>
    ---------------------------------------------------------
    $ iperf3 -c XXX.XXX.XXX.XXX -u -p 5202 -b 268435456
    ---------------------------------------------------------
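
(Note: "-b 268435456" requests 268,435,456 bit/s = 256 × 2^20 bit/s, i.e. 256 Mibit/s ≈ 268 Mbit/s, which matches the ~267 Mbits/sec iperf3 reports below.)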

* Packet loss occurred on both XavierA and XavierB.
    <XavierA>
    ---------------------------------------------------------
    Connecting to host XXX.XXX.XXX.XXX, port 5201
    [  4] local YYY.YYY.YYY.YYY port 54264 connected to XXX.XXX.XXX.XXX port 5201
    [ ID] Interval           Transfer     Bandwidth       Total Datagrams
    [  4]   0.00-1.00   sec  29.9 MBytes   251 Mbits/sec  3828
    [  4]   1.00-2.00   sec  32.1 MBytes   269 Mbits/sec  4107
    [  4]   2.00-3.00   sec  31.9 MBytes   267 Mbits/sec  4078
    [  4]   3.00-4.00   sec  32.5 MBytes   272 Mbits/sec  4156
    [  4]   4.00-5.00   sec  31.7 MBytes   266 Mbits/sec  4058
    [  4]   5.00-6.00   sec  31.7 MBytes   266 Mbits/sec  4060
    [  4]   6.00-7.00   sec  32.2 MBytes   270 Mbits/sec  4125
    [  4]   7.00-8.00   sec  31.7 MBytes   266 Mbits/sec  4052
    [  4]   8.00-9.00   sec  32.0 MBytes   268 Mbits/sec  4090
    [  4]   9.00-10.00  sec  32.2 MBytes   270 Mbits/sec  4116
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
    [  4]   0.00-10.00  sec   318 MBytes   267 Mbits/sec  0.096 ms  4077/40669 (10%)
    [  4] Sent 40669 datagrams

    iperf Done.
    ---------------------------------------------------------

    <XavierB>
    ---------------------------------------------------------
    Connecting to host XXX.XXX.XXX.XXX, port 5202
    [  4] local ZZZ.ZZZ.ZZZ.ZZZ port 33207 connected to XXX.XXX.XXX.XXX port 5202
    [ ID] Interval           Transfer     Bandwidth       Total Datagrams
    [  4]   0.00-1.00   sec  29.7 MBytes   249 Mbits/sec  3796
    [  4]   1.00-2.00   sec  32.0 MBytes   269 Mbits/sec  4101
    [  4]   2.00-3.00   sec  32.0 MBytes   268 Mbits/sec  4094
    [  4]   3.00-4.00   sec  32.6 MBytes   274 Mbits/sec  4176
    [  4]   4.00-5.00   sec  31.3 MBytes   263 Mbits/sec  4009
    [  4]   5.00-6.00   sec  32.6 MBytes   274 Mbits/sec  4177
    [  4]   6.00-7.00   sec  31.7 MBytes   266 Mbits/sec  4063
    [  4]   7.00-8.00   sec  31.5 MBytes   264 Mbits/sec  4031
    [  4]   8.00-9.00   sec  31.9 MBytes   268 Mbits/sec  4087
    [  4]   9.00-10.00  sec  32.7 MBytes   275 Mbits/sec  4189
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
    [  4]   0.00-10.00  sec   318 MBytes   267 Mbits/sec  0.089 ms  4638/40722 (11%)
    ---------------------------------------------------------


[**** Packet loss did not occur when we sent packets at 512 Mbps from XavierA to the HOST via iperf3 ****]

 +---------+                              +------------+
 | XavierA |-----[iperf3 -c -u]--512Mbps->|            |
 +---------+                              |            |
                                          |  HOST PC   |
                                          |            |
                                          |            |
                                          +------------+

* We sent packets at 512 Mbps from XavierA to the HOST via iperf3:
    <XavierA>
    ---------------------------------------------------------
    $ iperf3 -c XXX.XXX.XXX.XXX -u -p 5201 -b 536870912
    ---------------------------------------------------------

* Packet loss didn't occur. 
    <XavierA>
    ---------------------------------------------------------
    Connecting to host XXX.XXX.XXX.XXX, port 5201
    [  4] local YYY.YYY.YYY.YYY port 56899 connected to XXX.XXX.XXX.XXX port 5201
    [ ID] Interval           Transfer     Bandwidth       Total Datagrams
    [  4]   0.00-1.00   sec  61.8 MBytes   518 Mbits/sec  7910
    [  4]   1.00-2.00   sec  65.6 MBytes   550 Mbits/sec  8393
    [  4]   2.00-3.00   sec  64.1 MBytes   538 Mbits/sec  8207
    [  4]   3.00-4.00   sec  63.7 MBytes   534 Mbits/sec  8150
    [  4]   4.00-5.00   sec  64.5 MBytes   541 Mbits/sec  8259
    [  4]   5.00-6.00   sec  60.8 MBytes   510 Mbits/sec  7781
    [  4]   6.00-7.00   sec  65.4 MBytes   549 Mbits/sec  8376
    [  4]   7.00-8.00   sec  65.5 MBytes   549 Mbits/sec  8378
    [  4]   8.00-9.00   sec  62.0 MBytes   520 Mbits/sec  7935
    [  4]   9.00-10.00  sec  66.4 MBytes   557 Mbits/sec  8503
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
    [  4]   0.00-10.00  sec   640 MBytes   537 Mbits/sec  0.086 ms  0/81890 (0%)
    [  4] Sent 81890 datagrams

    iperf Done.
    ---------------------------------------------------------

Dear @SivaRamaKrishnaNV ,

This problem is very critical for us.

We cannot offload operations from XavierA to XavierB.
We use DDS, and the bandwidth we use is lower than 500 Mbps through the 1GbE port.
We believe our system should be guaranteed to work according to the Drive AGX spec, but it does not because of this UDP packet loss issue.

Dear @Tabito.Suzuki ,
This is a switch configuration issue with the older Aurix FW. The current FW sets up the networking between the switches in a way that provides better performance and less interference on the outbound paths. Could you update to the DRIVE OS 5.2.0 release and check the issue?

Dear @SivaRamaKrishnaNV
We are sorry for our late response to your comment.

We updated Drive OS to 5.2, but the same problem was reproduced. Does the problem not occur on your AGX?

We have attached the results of the network test with iperf3.

Dear @SivaRamaKrishnaNV

Could you confirm the result of the network test on Drive OS 5.2?
Is there any fix or workaround?
I am looking forward to your prompt reply.

Dear @Tabito.Suzuki,
My apologies for missing your update.
Do Host A and Host B refer to different terminals on the same host?

Dear @SivaRamaKrishnaNV

> Do Host A and Host B refer to different terminals on the same host?

No. Different terminals on “different” hosts.

Dear @SivaRamaKrishnaNV
What is the status of this issue?
I need your information in order to resolve the packet drops in our network.

Dear @Tabito.Suzuki,
I have escalated this issue to the engineering team and am waiting for an update. I will get back to you once I hear from them.

Dear @SivaRamaKrishnaNV
Was there a reply from the engineering team? I have not found any workaround.

Dear @Tabito.Suzuki,
Could you check with the DRIVE OS 5.2.6 release? Also, could you check the stats with iperf?

@SivaRamaKrishnaNV

I have not found any performance improvement for J14 (3&4) in the DRIVE OS 5.2.6 release notes (https://developer.nvidia.com/drive/documentation).
Do you mean this issue has been fixed in 5.2.6? Did you check the stats with iperf?

We cannot update the OS version easily.

Dear @Tabito.Suzuki,
We have verified on DRIVE OS 5.2.6 and noticed about a 20% packet drop at Tegra A and a 2% drop at Tegra B when we sent packets at 256 Mbps from Xavier A / Xavier B to the host.