TX2 Ethernet: very high latency when connected to a switch such as the RTL8309sb

I tried to connect a TX2 to our switch device, which uses the RTL8309sb chip. When I ping another device or a PC on the switch, the latency is very high: hundreds of ms, sometimes up to 1000 ms. By comparison, when the TX2 is connected through a D-Link router or directly to the PC, the ping latency is under 1 ms.
My questions are:

  1. This looks like a compatibility issue. Is it a software issue or a hardware issue?
  2. Due to project requirements, I cannot replace the switch with another device. What can I do to resolve this issue?

Has anyone dealt with an issue like this?
Thanks for any pointers.

I am unsure of your topology. In a case where it is slow, what is the output of “ifconfig” (I’m thinking of errors, collisions, overflows, underruns, so on)? What do you see from the “route” command (perhaps the route being taken isn’t what you expect)?

Are there any messages via “dmesg --follow” while you attempt to use the network?

Do try with performance maximized:

sudo nvpmodel -m 0
sudo ~ubuntu/jetson_clocks.sh

linuxdev,

Here is the information you asked for. Please take a look. Thanks!

---- max performance ----
Performance is already maximized (all 6 CPU cores online, max power mode), as confirmed by jetson_clocks.sh. See below:
$ sudo ./jetson_clocks.sh --show
SOC family:tegra186 Machine:quill
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu1: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu2: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu3: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu4: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu5: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
GPU MinFreq=1300500000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=40800000 MaxFreq=1866000000 CurrentFreq=1866000000 FreqOverride=1

---- “route” looks OK ----
nvidia@tegra-ubuntu:~$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
link-local * 255.255.0.0 U 1000 0 0 eth0
192.168.1.0 * 255.255.255.0 U 100 0 0 eth0

---- ifconfig info: dropped RX pkts ----
“ifconfig” shows a steadily increasing count of dropped RX packets, and that is the only error counter that changes. Details below:
nvidia@tegra-ubuntu:~$ date
Mon Oct 22 03:16:45 UTC 2018
nvidia@tegra-ubuntu:~$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:04:4b:a5:a2:31
inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::f2d4:57fd:7e91:c927/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1584 errors:0 dropped:416 overruns:0 frame:0
TX packets:1085 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:124090 (124.0 KB) TX bytes:151536 (151.5 KB)
Interrupt:42

nvidia@tegra-ubuntu:~$ date
Mon Oct 22 03:16:49 UTC 2018
nvidia@tegra-ubuntu:~$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:04:4b:a5:a2:31
inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::f2d4:57fd:7e91:c927/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1621 errors:0 dropped:420 overruns:0 frame:0
TX packets:1113 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:127100 (127.1 KB) TX bytes:155772 (155.7 KB)
Interrupt:42

nvidia@tegra-ubuntu:~$ date
Mon Oct 22 03:16:53 UTC 2018
nvidia@tegra-ubuntu:~$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:04:4b:a5:a2:31
inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::f2d4:57fd:7e91:c927/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1659 errors:0 dropped:424 overruns:0 frame:0
TX packets:1142 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:130176 (130.1 KB) TX bytes:160226 (160.2 KB)
Interrupt:42

nvidia@tegra-ubuntu:~$ date
Mon Oct 22 03:16:57 UTC 2018
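
To put a number on the drops, here is a minimal Python sketch that computes the drop rate between the three samples above. The counter values are copied from the “ifconfig eth0” outputs; the elapsed times are taken from the “date” calls, so they are only approximate. The function name `drop_stats` is just an illustration.

```python
from typing import List, Tuple

# (seconds since the first sample, RX packets, RX dropped), copied from the
# three "ifconfig eth0" outputs above. The times come from the "date" calls
# (03:16:45, 03:16:49, 03:16:53), so they are approximate.
SAMPLES: List[Tuple[int, int, int]] = [
    (0, 1584, 416),
    (4, 1621, 420),
    (8, 1659, 424),
]


def drop_stats(samples: List[Tuple[int, int, int]]) -> List[Tuple[float, float]]:
    """For each pair of consecutive samples, return
    (drops per second, new drops / new RX packets)."""
    stats = []
    for (t0, rx0, d0), (t1, rx1, d1) in zip(samples, samples[1:]):
        dt = t1 - t0
        new_rx = rx1 - rx0
        new_drops = d1 - d0
        stats.append((new_drops / dt, new_drops / new_rx))
    return stats


for rate, ratio in drop_stats(SAMPLES):
    print(f"{rate:.2f} drops/s, {ratio:.1%} of new RX packets")
```

With these samples the drop counter grows by 4 in each 4-second interval, i.e. about one dropped packet per second while the RX packet counter grows by 37 and 38.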

---- “dmesg --follow”: some PCIe messages, BUT no new messages during “ping”. ----
[ 8.265629] dhd_preinit_ioctls pspretend_threshold for HostAPD failed -23
[ 8.269822] Firmware version = wl0: May 4 2017 13:48:00 version 7.35.221.21 (r697384) FWID 01-58d9d0b3
[ 8.271997] dhd_interworking_enable: failed to set WNM info, ret=-23
[ 8.272167] tegra_sysfs_on
[ 8.287465] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 8.320431] usb 1-2.1: New USB device found, idVendor=05e3, idProduct=0606
[ 8.331993] usb 1-2.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 8.339473] usb 1-2.1: Product: USB Hub 2.0
[ 8.339475] usb 1-2.1: Manufacturer: ALCOR
[ 8.339819] usb 1-2.1: ep 0x81 - rounding interval to 1024 microframes, ep desc says 2040 microframes
[ 8.340468] hub 1-2.1:1.0: USB hub found
[ 8.340596] hub 1-2.1:1.0: 4 ports detected
[ 8.404274] CFGP2P-ERROR) wl_cfgp2p_add_p2p_disc_if : P2P interface registered
[ 8.428942] usb 1-3.2: new high-speed USB device number 5 using xhci-tegra
[ 8.442372] WLC_E_IF: NO_IF set, event Ignored
[ 8.525893] usb 1-3.2: New USB device found, idVendor=0bda, idProduct=8153
[ 8.532836] usb 1-3.2: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[ 8.540232] usb 1-3.2: Product: USB 10/100/1000 LAN
[ 8.545166] usb 1-3.2: Manufacturer: Realtek
[ 8.549494] usb 1-3.2: SerialNumber: 000000000000
[ 8.555813] usb 1-3.2: rejected 2 configurations due to insufficient available bus power
[ 8.564031] usb 1-3.2: no configuration chosen from 2 choices
[ 8.640835] usb 1-2.1.4: new low-speed USB device number 6 using xhci-tegra
[ 8.742067] usb 1-2.1.4: New USB device found, idVendor=1a2c, idProduct=0b2a
[ 8.749170] usb 1-2.1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 8.756821] usb 1-2.1.4: Product: USB Keyboard
[ 8.761299] usb 1-2.1.4: Manufacturer: SEM
[ 8.767134] usb 1-2.1.4: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
[ 8.776558] usb 1-2.1.4: ep 0x82 - rounding interval to 64 microframes, ep desc says 80 microframes
[ 8.785644] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 8.795875] input: SEM USB Keyboard as /devices/3530000.xhci/usb1/1-2/1-2.1/1-2.1.4/1-2.1.4:1.0/0003:1A2C:0B2A.0001/input/input4
[ 8.861222] hid-generic 0003:1A2C:0B2A.0001: input,hidraw0: USB HID v1.10 Keyboard [SEM USB Keyboard] on usb-3530000.xhci-2.1.4/input0
[ 8.877111] input: SEM USB Keyboard as /devices/3530000.xhci/usb1/1-2/1-2.1/1-2.1.4/1-2.1.4:1.1/0003:1A2C:0B2A.0002/input/input5
[ 8.945676] hid-generic 0003:1A2C:0B2A.0002: input,hidraw1: USB HID v1.10 Device [SEM USB Keyboard] on usb-3530000.xhci-2.1.4/input1
[ 8.996661] fuse init (API version 7.23)
[ 9.244833] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 9.253198] tegra-pcie 10003000.pcie-controller: link 2 down, ignoring
[ 9.259811] tegra-pcie 10003000.pcie-controller: PCIE: no end points detected
[ 9.267369] tegra-pcie 10003000.pcie-controller: PCIE: Disable power rails
[ 9.602339] xhci-tegra 3530000.xhci: tegra_xhci_mbox_work mailbox command 5
[ 9.609435] xhci-tegra 3530000.xhci: tegra_xhci_mbox_work ignore firmware MBOX_CMD_DEC_SSPI_CLOCK request
[ 10.496136] eqos 2490000.ether_qos eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 10.505292] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 17.646438] gk20a 17000000.gp10b: railgate is disabled.
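
For reference, the negotiated link parameters can be pulled out of the “Link is Up” line in the log above. This is a small Python sketch; the regular expression is my own assumption based on the single line shown here and may not match every driver's message format.

```python
import re

# Line copied from the dmesg output above.
LINE = "[ 10.496136] eqos 2490000.ether_qos eth0: Link is Up - 100Mbps/Full - flow control rx/tx"

# Assumed pattern, derived only from the line above.
LINK_UP = re.compile(
    r"(?P<iface>\S+): Link is Up - (?P<speed>[\d.]+)(?P<unit>[MG])bps/(?P<duplex>Full|Half)"
)


def parse_link_up(line: str):
    """Return (interface, speed in Mbps, duplex) or None if the line does not match."""
    m = LINK_UP.search(line)
    if m is None:
        return None
    factor = 1000 if m["unit"] == "G" else 1
    return m["iface"], float(m["speed"]) * factor, m["duplex"]


print(parse_link_up(LINE))
```

For the line above this reports eth0 at 100 Mbps, full duplex.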

You don’t have a default route, which might be ok for your situation if there is no internet access and nothing outside the local LAN. Any attempt to reach anything not directly on the LAN would always fail or do something unpredictable.
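
As a quick way to check this programmatically, here is a Python sketch that scans “route” output for a default route. The first table is copied from the post above; the extra “default” line in the second table is hypothetical (including the 192.168.1.1 gateway) and only shows what a table with a default route could look like.

```python
# Output of "route" copied from the post above.
ROUTE_OUTPUT = """\
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
link-local * 255.255.0.0 U 1000 0 0 eth0
192.168.1.0 * 255.255.255.0 U 100 0 0 eth0
"""

# HYPOTHETICAL extra line, for illustration only (not from the post).
ROUTE_WITH_DEFAULT = ROUTE_OUTPUT + "default 192.168.1.1 0.0.0.0 UG 100 0 0 eth0\n"


def has_default_route(route_output: str) -> bool:
    """Return True if any row has destination default/0.0.0.0 and genmask 0.0.0.0."""
    for line in route_output.splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        if len(fields) < 8:
            continue
        destination, genmask = fields[0], fields[2]
        if destination in ("default", "0.0.0.0") and genmask == "0.0.0.0":
            return True
    return False


print(has_default_route(ROUTE_OUTPUT))
print(has_default_route(ROUTE_WITH_DEFAULT))
```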

eth0 has overruns, which is far from normal. Possibly because traffic needs somewhere to go and does not have any default route? Not sure. Then there are actual errors, and more overruns.

I see notes on USB which is not clear as to why:

[ 8.555813] usb 1-3.2: rejected 2 configurations due to insufficient available bus power

How are IP addresses being assigned? The lack of a default route seems to be more of a software setup. The power issue on USB might depend on what is connected to it. Can you describe what the USB setup is so far as HUBs and USB network devices? Also, the output of “lsusb -t”.

The IP address is static and is only used on this local LAN.
There is no DHCP server or default gateway on this LAN, but the TX2 does get correct ARP info.

This Ethernet port is the TX2’s native gigabit Ethernet port, not a USB network device. Could that cause the latency issue?

As for USB, I tried to enable all three USB 3.0 ports, but only one is enabled for now; the other two have a duplex issue that is not yet resolved. As for the setup “so far”: we connected a USB 3.0 hub and a USB 2.0 hub on the carrier board to expand the USB ports. Here’s the info:
nvidia@tegra-ubuntu:~$ lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/3p, 5000M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/4p, 480M
|__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/4p, 12M
|__ Port 1: Dev 8, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 4: Dev 7, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 4: Dev 7, If 1, Class=Human Interface Device, Driver=, 1.5M
|__ Port 3: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M

Sorry, there was a formatting issue. Re-posting “lsusb -t” here:
nvidia@tegra-ubuntu:~$ lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/3p, 5000M
…|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/4p, 480M
…|__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
…|__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/4p, 12M
…|__ Port 1: Dev 8, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
…|__ Port 4: Dev 7, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
…|__ Port 4: Dev 7, If 1, Class=Human Interface Device, Driver=, 1.5M
…|__ Port 3: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M

The “12M” device is an external hub, used for the keyboard & mouse.

It looks like the “overruns” field has not changed. Why do you say “eth0 has overruns, which is far from normal”?

There are 2 ssh sessions to the TX2 and 1 ping running from the TX2. Could they cause the drops or overruns?

FYI, on an existing post with lsusb, you can hover the mouse over the quote icon in the upper right and see a pencil icon show up. The pencil icon allows you to edit your post. If you edit, highlight the lsusb output, and then click on the “code” icon (looks like “</>”), then it should set it up to maintain whitespace and add a scroll bar for easier viewing (especially the tree view).

The ssh connections would have very little effect unless they are pipes to something producing/consuming a lot of data. Even then TCP should probably deal with it. Ping (ICMP), unless it is a “flood” ping, will have virtually no effect. I can flood ping any of my Jetsons from another Jetson or my host and never see drops and overruns. Drops and overruns are unusual even on a congested network. I believe something is wrong with the network configuration, but I can’t quite place it. Somewhere in that manual setup is probably some subtle issue which isn’t an outright failure, but which interacts in unexpected ways.

The “12M” is valid then, this is USB1.1 speed and common for mouse/keyboard. I actually have a USB1.1 HUB not capable of faster speeds just because I need to use it with a slower USB analyzer. There is no chance of this causing any kind of significant load so long as gigabit isn’t being bridged through it.

I am thinking that perhaps something in other ethernet interfaces might be interacting. There is a demo of the gadget USB interface being used to create a simulated network device, and that device is probably configured for the same address in the same subnet on all Jetsons and in some way consuming resources. What is the full (not just eth0) and exact “route”, “arp”, and “ifconfig” output on all Jetsons and host when running at the same time and seeing the latency issues?

FYI, the TX1 has the integrated ethernet going through a USB controller, and the TX2 uses a more conventional connect. For the sake of testing, at least for now, make sure USB autosuspend is disabled in the TX1. To do so, in the “/boot/extlinux/extlinux.conf” file, look for the “APPEND” key/value pair. Simply add “usbcore.autosuspend=-1” to the end. It’ll look something like this:

...
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait usbcore.autosuspend=-1
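
If you want to double-check the edit, here is a minimal Python sketch that tests whether an “APPEND” line contains a given kernel argument. The sample text mirrors the line above; the helper name `append_has_arg` is made up for this example, and to check the real file you would read “/boot/extlinux/extlinux.conf” and pass its contents in.

```python
# Sample config text mirroring the APPEND line shown above.
SAMPLE_CONF = """\
LABEL primary
      LINUX /boot/Image
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait usbcore.autosuspend=-1
"""


def append_has_arg(conf_text: str, arg: str) -> bool:
    """Return True if any APPEND line contains `arg` as a whole token."""
    for line in conf_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "APPEND" and arg in tokens[1:]:
            return True
    return False


print(append_has_arg(SAMPLE_CONF, "usbcore.autosuspend=-1"))

# To check the real file instead of the sample (path as given above):
# with open("/boot/extlinux/extlinux.conf") as f:
#     print(append_has_arg(f.read(), "usbcore.autosuspend=-1"))
```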

I tried adding “usbcore.autosuspend=-1”, but it had no effect on the network latency issue.
---- Here is my modified extlinux.conf ----
nvidia@tegra-ubuntu:~$ cat /boot/extlinux/extlinux.conf
TIMEOUT 30
DEFAULT primary

MENU TITLE p2771-0000 eMMC boot options

LABEL primary
MENU LABEL primary kernel
LINUX /boot/Image
APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 usbcore.autosuspend=-1
nvidia@tegra-ubuntu:~$

---- Full ifconfig output (all interfaces): ----
nvidia@tegra-ubuntu:~$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:04:4b:a5:a2:31
inet addr:192.168.1.8 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::f2d4:57fd:7e91:c927/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:44275 errors:0 dropped:14855 overruns:0 frame:0
TX packets:29325 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3305015 (3.3 MB) TX bytes:4570828 (4.5 MB)
Interrupt:42

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:78720 errors:0 dropped:0 overruns:0 frame:0
TX packets:78720 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:5825280 (5.8 MB) TX bytes:5825280 (5.8 MB)

wlan0 Link encap:Ethernet HWaddr 00:04:4b:a5:a2:2f
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

nvidia@tegra-ubuntu:~$

Can you provide the full “ifconfig” output for all of the hosts on the network (I see one host above, this is probably the Jetson)? This implies more than just eth0 since it would be useful to see what other interfaces might overlap. Also the “route” and “arp” command output alongside the “ifconfig” output. I suspect it is a configuration problem, but looking at one host interface at a time probably won’t provide sufficient information (this is probably an interaction among interfaces and interfaces are on multiple hosts).
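
To illustrate what I mean by overlap, here is a rough Python sketch using the standard “ipaddress” module. The eth0 and lo entries are taken from the ifconfig output above; the “usb0” entry is purely hypothetical (both the name and the address) and stands in for some other interface that happens to sit in the same subnet.

```python
import ipaddress
from itertools import combinations

INTERFACES = {
    "eth0": ("192.168.1.8", "255.255.255.0"),   # from the ifconfig output above
    "lo": ("127.0.0.1", "255.0.0.0"),           # from the ifconfig output above
    "usb0": ("192.168.1.50", "255.255.255.0"),  # HYPOTHETICAL, for illustration only
}


def overlapping_subnets(interfaces):
    """Return pairs of interface names whose configured subnets overlap."""
    nets = {
        name: ipaddress.ip_interface(f"{addr}/{mask}").network
        for name, (addr, mask) in interfaces.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


print(overlapping_subnets(INTERFACES))
```

With the hypothetical usb0 entry included, eth0 and usb0 are flagged because both sit in 192.168.1.0/24. Feeding in the real addresses from every host would show whether anything like that exists on your network.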

If you have WiFi on, then also the “iwconfig” output.

EDIT: Forgot to ask…do either of these systems have firewalling enabled? Typically the Jetson wouldn’t, but often PC hosts do.