When we send many large UDP packets from Xavier A and Xavier B at the same time, fragment reassembly timeouts occur (Wireshark frequently shows “ICMP Time-to-live exceeded (Fragment reassembly time exceeded)”).
This does not occur when the same total amount of packets is sent from only one of the two Xaviers.
We think we may be able to prevent the fragment reassembly timeouts by enabling 802.3x flow control.
If there is another way to prevent the timeouts, we would like to know it.
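For reference, 802.3x pause settings on a generic Linux interface can be inspected and requested with ethtool. This is only a sketch, assuming the NIC driver exposes pause control; whether PAUSE actually takes effect also depends on the link partner (the switch):
$ ethtool -a eth0                    # show current autoneg / RX / TX pause state
$ sudo ethtool -A eth0 rx on tx on   # request RX and TX pause frames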
Dear @Tabito.Suzuki,
My apologies for the late reply. The SJA1105 switch (to which both Tegras are connected) does not support PAUSE flow control.
The only option would be limiting the number of frames via AVB features, i.e., rate limiting or bandwidth reservation. However, to better understand if/how this can be done, could you answer the items below?
Does the traffic from Tegra A to Tegra B include E3579 dongle traffic as well?
Do you see the problem only when you send Tegra A → B and Tegra B → A simultaneously?
Could you share any steps to reproduce the issue on our end?
Dear @SivaRamaKrishnaNV,
We are sorry for causing confusion.
We have not tried sending from one Xavier to the other Xavier.
We tried the following 2 cases (an iperf3 approximation is sketched below).
Case 1: Each Xavier sends to one LinuxPC at the same time.
Xavier A → (1GbE [3|4] port → E3579 ETHERNET DONGLE) → LinuxPC A
Xavier B → (1GbE [3|4] port → E3579 ETHERNET DONGLE) → LinuxPC A
Case 2: Only Xavier A (or B) sends to one LinuxPC.
Xavier A → (1GbE [3|4] port → E3579 ETHERNET DONGLE) → LinuxPC A
The total amount of packets in case 2 is the same as in case 1.
The fragment reassembly timeout occurred only in case 1.
We think we may be able to prevent the fragment reassembly timeouts in case 1 by enabling 802.3x flow control.
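As a rough stand-in for our application traffic (not our actual generator), case 1 could be approximated with iperf3; the 8000-byte datagrams exceed a 1500-byte MTU and are therefore fragmented, and the 250M bitrate is illustrative only (<LinuxPC_A_IP> is a placeholder):
# on LinuxPC A: one UDP server per sender
$ iperf3 -s -p 5201 &
$ iperf3 -s -p 5202 &
# on Xavier A and Xavier B simultaneously (case 1)
$ iperf3 -u -c <LinuxPC_A_IP> -p 5201 -b 250M -l 8000 -t 30
$ iperf3 -u -c <LinuxPC_A_IP> -p 5202 -b 250M -l 8000 -t 30
# case 2: run only one of the two client commands (or one at double the bitrate)
iperf3 reports lost/total datagrams at the end of each UDP test, which gives a drop percentage to compare between the two cases.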
I’m sorry for sending you so many questions in a row.
Would you tell me how to change the MTU of eth0 on Xavier?
We executed the following command on Xavier.
$ sudo ip link set eth0 mtu 9000
But the MTU did not change; the following message appeared instead.
> RTNETLINK answers: Invalid argument
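“RTNETLINK answers: Invalid argument” typically means the requested MTU is outside the range the driver supports. As a sketch, recent kernels and iproute2 versions report the driver's limit (the maxmtu field may be absent on older versions):
$ ip -d link show eth0 | grep -o 'maxmtu [0-9]*'   # driver's maximum MTU, if reported
$ ethtool -i eth0                                  # driver name/version, for reference
If maxmtu is below 9000, the eth0 driver on this platform simply does not accept jumbo frames of that size.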
Dear @Tabito.Suzuki,
Is there a real use case behind this? How critical is it? You may share the details via private message if they cannot be shared on the forum.
We need more info about the nature of the traffic, such as burst period, burst length, frame size, etc. It would be good if you could provide info about the utilities used to generate the traffic. We would like to simulate the scenario on our end.
Can you experiment with decreasing per-Tegra bandwidth or burstiness further? Or at least compare UDP packet drops in the two-Tegra case vs. the one-Tegra case?
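One generic way to compare drops is the kernel SNMP counters on the receiving LinuxPC; a sketch (exact counter names vary slightly across kernel versions):
$ netstat -s | grep -i reassembl       # e.g. "packet reassemblies failed" in the Ip: section
$ netstat -su                          # UDP receive errors / receive buffer errors
$ nstat -az IpReasmFails UdpInErrors   # the same counters via nstat, if installed
Sampling these before and after each run of the one-Tegra and two-Tegra cases would show where the loss is being counted.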
We cannot offload operations from Xavier A to Xavier B.
We use DDS, and our bandwidth through the 1GbE port is lower than 500 Mbps.
We think our system should be guaranteed to work according to the DRIVE AGX spec, but it does not because of this UDP packet loss issue.
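As a possible interim mitigation on the receiving LinuxPC (a sketch only, not a fix for the root cause): raising the kernel's fragment reassembly limits can help if the receiver is exhausting reassembly memory, though it will not help if fragments are actually being dropped in the network path:
$ sysctl net.ipv4.ipfrag_high_thresh net.ipv4.ipfrag_time   # current limits
$ sudo sysctl -w net.ipv4.ipfrag_high_thresh=16777216       # reassembly memory, bytes (default 4 MB)
$ sudo sysctl -w net.ipv4.ipfrag_time=60                    # reassembly timeout, seconds (default 30)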
Dear @Tabito.Suzuki,
This is a switch configuration issue with the older Aurix FW. The current FW sets up networking between the switches in a way that provides better performance and less interference on the outbound paths. Could you update to the DRIVE OS 5.2.0 release and check the issue?
I have not found any performance improvement for J14 (3&4) in the DRIVE OS 5.2.6 release notes (https://developer.nvidia.com/drive/documentation).
Do you mean this issue has been fixed in 5.2.6? Did you check the stats with iperf?
Dear @Tabito.Suzuki,
We have verified on DRIVE OS 5.2.6 and noticed a 20% packet drop at Tegra A and a 2% drop at Tegra B. We sent packets at 256 Mbps from Xavier A and Xavier B to the host.