I’m not experiencing serious problems because of this, but I’ve noticed that with L4T versions from R32.7 onward, Jetsons report dropped RX packets on eth0. This seems odd to me, so I’m opening this topic to report some data.
The issue is just dropped RX packets reported by ifconfig:
ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1466
inet 192.168.1.15 netmask 255.255.255.0 broadcast 192.168.1.255
ether 48:b0:2d:xx:xx:xx txqueuelen 1000 (Ethernet)
RX packets 76815 bytes 83953752 (83.9 MB)
RX errors 0 dropped 359 overruns 0 frame 0
TX packets 26366 bytes 5455002 (5.4 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
1. On a first network, I have:
- a host PC running Ubuntu (18 or 20, not sure), not showing any dropped packets.
- AGX Xavier running R32.6.1. It doesn’t show the issue. No dropped packets.
- AGX Orin running R35.1. It shows the issue: RX packets are dropped at a rate of about 1 drop every 10 seconds.
- XavierNX running R35.2. It shows the issue, with RX packets dropped at a similar rate.
Boosting clocks doesn’t improve things for the Jetsons showing the issue.
Using:
for i in `seq 300`; do ifconfig eth0 | grep RX | grep dropped | tr -s " " | cut -d' ' -f6 >> RX_drop.csv; sleep 1; done
I’ve logged these drops into a file for 5 minutes. Here are results from AGX Orin:
XavierNX shows similar pattern.
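For anyone who wants to reproduce the measurement, here is a minimal sketch that turns a log of cumulative drop counts (one value per line, as written by the loop above) into per-sample increments, which makes the periodic pattern easier to see. The function names are my own, and reading the counter from sysfs instead of parsing ifconfig is an assumption about what the running kernel exposes.

```shell
#!/bin/sh
# Sketch: convert cumulative RX drop counts into per-sample increments.
# drop_deltas and read_rx_dropped are hypothetical helper names.

# $1: file with one cumulative counter value per line (e.g. RX_drop.csv)
drop_deltas() {
    # Skip the first line (no previous value), then print current - previous.
    awk 'NR > 1 { print $1 - prev } { prev = $1 }' "$1"
}

# Alternative sampling source, assuming the kernel exposes this sysfs file.
# $1: interface name (defaults to eth0)
read_rx_dropped() {
    cat "/sys/class/net/${1:-eth0}/statistics/rx_dropped"
}
```

With the log from the loop above, `drop_deltas RX_drop.csv` prints one increment per second, so a drop every 10 seconds shows up as a 1 on every tenth line.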
2. On a second network, I have:
As I also noticed that failing versions report an MTU of 1466 while working versions report 1500, I tried setting the MTU using this workaround; however, adjusting the MTU doesn’t change the issue.
So it seems to me that L4T releases since R32.7 may have an issue with Ethernet, and the number of dropped frames may depend on the network itself.
I’ve tried disabling many things, to no avail. I’ve also ruled out cable issues by swapping cables, which made no difference. It seems to be related only to the L4T version.
In case NVIDIA doesn’t know about this or fails to reproduce it, I’m happy to run any experiment and share the results.
Hi,
Thanks for reporting this. We will start the experiment with Jetson Orin on our side first.
Hi @Honey_Patouceul
One question here. Have you tested AGX Xavier + r32.7.3? It does not seem to show the dropped-frame issue on my side.
Hi @WayneWWW
No, I haven’t tested that; my only AGX Xavier is running R32.6. I would upgrade to R32.7 only if this is important.
BTW, did you observe the issue with Orin and JP5?
Yes, I can see this on Orin + jp5. I also want to know whether this is really a regression introduced at a specific point in rel32.
Hi @Honey_Patouceul
May I ask what kind of environment you are testing in on your side? I’ve checked: if a Jetson is connected to another Jetson back to back, then there is no packet drop.
If it is a more complicated environment (for example, a company network like mine), then it seems this issue does happen.
Hi @WayneWWW,
Yes, as said before, it seems to depend on the network.
In the first case, the network is just a fiber-optic internet box. Even if only one device is connected to the box (wifi turned off), the issue happens at a constant rate with recent L4T releases.
In the second case, it is a company network and the rate is much higher. However, the XavierNX is connected to a mini switch, and only the Jetson and a Windows PC are behind this switch.
Maybe using a network probe could suggest what kind of frame is dropped.
I also noticed that the number of dropped frames may suddenly increase after changing the MTU size on Orin, in case this helps identify the root cause. Neither increasing nor decreasing the MTU improves things; I have not found any MTU size that avoids the periodic drops.
Is it possible to see the MTU on the Windows PC, in combination with what the MTU is set to on the Jetson (to see if the MTUs are paired the same)? Also, is it possible to test with a burst of (A) TCP traffic (which is subject to reordering and reassembly upon fragmentation), and to compare to (B) UDP burst…the goal of which is to see if TCP still reassembles correctly. If TCP does reassemble, then retransmits are doing their job, but if not, then there is a more serious problem.
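As a rough way to check whether the MTUs on both ends are effectively paired, one could send pings with the "don't fragment" bit set and the largest payload that fits the MTU under test. This is only a sketch: the function names and the peer address are hypothetical, and it assumes the Linux iputils ping and an IPv4 header without options.

```shell
#!/bin/sh
# Sketch: probe whether a given MTU passes unfragmented between two hosts.

# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
# $1: MTU in bytes
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send 3 echo requests with fragmentation prohibited (-M do).
# $1: peer address (hypothetical, e.g. the PC on the same switch)
# $2: MTU to test
probe_mtu() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For example, `probe_mtu 192.168.1.15 1500` sends 1472-byte payloads; if that fails while `probe_mtu 192.168.1.15 1466` succeeds, the path is not carrying full 1500-byte frames.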
I tested in the first network with only AGX Orin (MTU 1500) and an Ubuntu PC (MTU 1500).
The burst experiments didn’t show anything wrong.
However, running wireshark on the PC, I noticed that the drops on Orin happen when there is a HomePlug or HomePlug AV broadcast frame.
Surprisingly, if I run wireshark on Orin, then there are no longer drops and the HomePlug frames are received. If I stop wireshark, then the drops happen again.
I’m not familiar with HomePlug. However, you mentioned “broadcast”. Is this just a broadcast address? If so, then the output of “route” might matter (what address was this)? Also, if it is something like multicast, then this too would have its own issues (and the router and both sides of the connection would have to be set up to allow and support multicast).
I don’t know more about HomePlug.
Yes, it is a broadcast address at MAC level. See details of such a frame here:
Frame 42: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on interface eth0, id 0
Interface id: 0 (eth0)
Interface name: eth0
Encapsulation type: Ethernet (1)
Arrival Time: Apr 15, 2023 17:52:49.220419212 CEST
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1681573969.220419212 seconds
[Time delta from previous captured frame: 0.398506060 seconds]
[Time delta from previous displayed frame: 0.398506060 seconds]
[Time since reference or first frame: 21.902331933 seconds]
Frame Number: 42
Frame Length: 60 bytes (480 bits)
Capture Length: 60 bytes (480 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ethertype:homeplug]
[Coloring Rule Name: Broadcast]
[Coloring Rule String: eth[0] & 1]
Ethernet II, Src: 60:8d:26:xx:xx:xx (60:8d:26:xx:xx:xx), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Destination: Broadcast (ff:ff:ff:ff:ff:ff)
Address: Broadcast (ff:ff:ff:ff:ff:ff)
.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
.... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
Source: 60:8d:26:xx:xx:xx (60:8d:26:xx:xx:xx)
Address: 60:8d:26:xx:xx:xx (60:8d:26:xx:xx:xx)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Type: Homeplug (0x887b)
HomePlug protocol
MAC Control Field
0... .... = Reserved: 0
.000 0001 = Number of MAC Data Entries: 1
MAC Management Entry Header
000. .... = MAC Entry Version: 0
...0 0010 = MAC Entry Type: Vendor Specific (0x02)
MAC Management Entry Length: 4
Vendor Specific
OUI: 0x000487
..01 0000 = Message ID: 16
0... .... = Direction: 0
..01 0000 = Message ID: 16
I don’t think it is related to routing because no frame is dropped when wireshark is running on Orin.
My guess is that without wireshark, nothing handles the HomePlug frames so these may be dropped.
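To help tell which frames line up with the drops, here is a small sketch that maps the EtherType shown in a capture (the "Type:" field above) to a protocol name. The helper name is hypothetical and the table only covers a few well-known values, including the two HomePlug ones.

```shell
#!/bin/sh
# Sketch: name a few well-known EtherType values.
# ethertype_name is a hypothetical helper; the table is intentionally short.
# $1: EtherType in hex, e.g. 0x887b
ethertype_name() {
    # Lowercase the input so 0x887B and 0x887b are treated the same.
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        0x0800) echo "IPv4" ;;
        0x0806) echo "ARP" ;;
        0x86dd) echo "IPv6" ;;
        0x887b) echo "HomePlug" ;;
        0x88e1) echo "HomePlug AV" ;;
        *)      echo "unknown" ;;
    esac
}
```

On the capturing PC, a filter such as `tcpdump -e -n -i eth0 'ether proto 0x887b or ether proto 0x88e1'` should show only these frames. Capturing on the Jetson itself may change the drop counter, as seen with wireshark.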
I don’t know much about multicast, but it does have more requirements and it isn’t unusual that some router or endpoint lacks support. On the other hand, when you run wireshark, I think the host has to go into “promiscuous mode”. I don’t know the details of security for multicast, but if the application works in promiscuous mode, but not without it, then perhaps it is a security setting or permission issue. Reworded, perhaps it is about how multicast is bridged or forwarded (configuration) that is getting in the way.
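One way to test the promiscuous-mode idea without wireshark is to check the interface flags directly. The sketch below assumes the kernel exposes the flags as a hex value in sysfs and that IFF_PROMISC is bit 0x100, as defined in linux/if.h; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: detect promiscuous mode from the interface flags word.

# $1: flags as a hex value, e.g. the content of /sys/class/net/eth0/flags
# Returns success (0) when the IFF_PROMISC bit (0x100) is set.
flags_have_promisc() {
    [ $(( $1 & 0x100 )) -ne 0 ]
}

# $1: interface name (defaults to eth0)
iface_is_promisc() {
    flags_have_promisc "$(cat "/sys/class/net/${1:-eth0}/flags")"
}
```

Promiscuous mode can also be switched on by hand with `ip link set dev eth0 promisc on` (as root), which would show whether the drop counter reacts to the mode itself or to a capture program being attached.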
Thanks @Linuxdev for letting me know about promiscuous mode, it explains a lot of what I’ve been seeing with wireshark.
I think that my IP routing is quite standard:
route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default livebox.home 0.0.0.0 UG 100 0 0 eth0
link-local 0.0.0.0 255.255.0.0 U 1000 0 0 docker0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.1.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
but the issue may not be at IP level.
I’m unsure if I can further investigate by myself.
On the company network, I’m unsure if the admins will allow me to run wireshark experiments.
Still, these pieces of information might help NVIDIA better figure out what the issue is and comment further.
I wish I could help more. There is likely some sort of configuration change required, but whether it is in the kernel or user space I don’t know. Multicast itself doesn’t follow all of the same rules since it uses broadcast addresses. I just don’t know enough about multicast.
I have been allowed to experiment within the company network, but I cannot publish details.
It seems even more complex than I thought. There are no HomePlug frames, but dropped frames are still seen; these are broadcast frames.
In this case, running wireshark on XavierNX doesn’t change the drop rate (I checked that promiscuous mode was enabled).
I see that HomePlug is just ethernet over the existing power lines of a building. I don’t think that would make any difference in most ways, although it could certainly be less reliable and subject to loss via noise. The effect of running wireshark and being in promiscuous mode being beneficial though tends to favor that the issue is a software issue rather than an actual noise/quality issue (though that could still be part of it).
The HomePlug broadcast frames are sent by my internet box, but I don’t have any PLC (powerline) device, so signal quality may not be involved here.
The promiscuous mode helped with AGX Orin in my home network, but this didn’t help with XavierNX on the company network.
This seems more complex than what I can help debug. I hope that @WayneWWW will comment sooner or later.
We need to spend more time debugging this. An internal ticket has been filed.
However, honestly this issue is only low to medium priority for now.
Hi, we’re also seeing this issue, which may make a camera driver unstable in our environment.
Is there any update on this issue?