USB function hangs when we test USB device functions

Hi,
We are seeing USB devices disconnect when we test USB device functions. Our USB device connections are shown in the picture below.
We ran many experiments, which are recorded in the “TX2 USB test experiments_20220311A” file.
Could you help us check this problem?

USB test condition:
USB 3.0 to LAN → use iperf to test the LAN function
BT → only connected to the USB hub, no test tool running
GPS → the system reads GPS data continuously
TX2 USB test experiments_20220311A.xlsx (10.6 KB)

[ 1231.178515] tegra-xusb 3530000.xhci: xHCI host not responding to stop endpoint command.
[ 1231.186515] tegra-xusb 3530000.xhci: Assuming host is dying, halting host.
[ 1231.193541] r8152 2-2.2:1.0 eth1: Tx status -108
[ 1231.198160] r8152 2-2.2:1.0 eth1: Tx status -108
[ 1231.202775] r8152 2-2.2:1.0 eth1: Tx status -108
[ 1231.207389] r8152 2-2.2:1.0 eth1: Tx status -108
[ 1231.212086] r8152 2-2.4:1.0 eth2: Tx status -108
[ 1231.216699] r8152 2-2.4:1.0 eth2: Tx status -108
[ 1231.221313] r8152 2-2.4:1.0 eth2: Tx status -108
[ 1231.225927] r8152 2-2.4:1.0 eth2: Tx status -108
[ 1231.230567] Bluetooth: hci0 urb ffffffc0e0db50c0 failed to resubmit (22)
[ 1231.237270] Bluetooth: hci0 urb ffffffc0e0db5180 failed to resubmit (22)
[ 1231.243972] Bluetooth: hci0 urb ffffffc0e0db5840 failed to resubmit (22)
[ 1231.250730] tegra-xusb 3530000.xhci: HC died; cleaning up
[ 1231.256149] tegra-xusb 3530000.xhci: hcd_reinit is disabled or in progress
[ 1231.256160] r8152 2-2.4:1.0 eth2: Tx timeout
[ 1231.260562] usb 1-2: USB disconnect, device number 2
[ 1231.261043] usb 1-3: USB disconnect, device number 3
[ 1231.261049] usb 1-3.1: USB disconnect, device number 4
[ 1231.261368] usb 2-2: USB disconnect, device number 2
[ 1231.261373] usb 2-2.2: USB disconnect, device number 3
[ 1231.265731] usb 1-3.4: USB disconnect, device number 5
[ 1231.266812] option1 ttyUSB0: GSM modem (1-port) converter now disconnected from ttyUSB0
[ 1231.266847] option 1-3.4:1.0: device disconnected
[ 1231.267396] option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
[ 1231.267420] option 1-3.4:1.1: device disconnected
[ 1231.267722] option1 ttyUSB2: GSM modem (1-port) converter now disconnected from ttyUSB2
[ 1231.267754] option 1-3.4:1.2: device disconnected
[ 1231.268000] option1 ttyUSB3: GSM modem (1-port) converter now disconnected from ttyUSB3
[ 1231.268025] option 1-3.4:1.3: device disconnected
[ 1231.268139] qmi_wwan 1-3.4:1.4 wwan0: unregister 'qmi_wwan' usb-3530000.xhci-3.4, WWAN/QMI device
[ 1231.322988] usb 2-2.4: USB disconnect, device number 4
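The device names in this log (for example `2-2.4:1.0` or `1-3.4`) follow the usual Linux sysfs convention of `<bus>-<port>.<port>...:<config>.<interface>`, so they show which hub each failing device sits behind. Here the r8152 Ethernet dongles appear under `2-2` and the LTE modem (option/qmi_wwan) under `1-3`. A minimal Python sketch for decoding these names follows; the helper names `parse_usb_name` and `parent_of` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import re

# Sysfs-style USB name: <bus>-<port>[.<port>...][:<config>.<interface>]
USB_NAME = re.compile(r"^(\d+)-(\d+(?:\.\d+)*)(?::(\d+)\.(\d+))?$")


def parse_usb_name(name):
    """Split a name such as '2-2.4:1.0' into bus, port chain, config and interface."""
    m = USB_NAME.match(name)
    if m is None:
        raise ValueError(f"not a USB device name: {name!r}")
    bus, ports, config, interface = m.groups()
    return {
        "bus": int(bus),
        "ports": [int(p) for p in ports.split(".")],
        "config": int(config) if config is not None else None,
        "interface": int(interface) if interface is not None else None,
    }


def parent_of(name):
    """Return the upstream device (hub) name, or None if attached to a root port."""
    info = parse_usb_name(name)
    upstream = info["ports"][:-1]
    if not upstream:
        return None
    return f"{info['bus']}-" + ".".join(str(p) for p in upstream)


# Names copied from the log lines in this post.
for name in ("2-2.2:1.0", "2-2.4:1.0", "1-3.4:1.0", "1-2"):
    print(name, parse_usb_name(name), "parent:", parent_of(name))
```

Grouping the failing devices by `parent_of` makes it easier to see whether the errors are confined to one hub or span both root hubs, as they do here.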

tx2_usb_fail.7z (21.2 KB)

Thanks
Ken

Can you clarify what “burn in test” means?

Hi~Wayne,
Sorry, I have corrected the title and content of the post I made yesterday. Please check it again.

Thanks
Ken

Hi,

Could you share the exact commands, and tell us whether this problem can be reproduced on the devkit?

Hi~Wayne,
Please refer to the picture and test scripts below. That is how we reproduce the TX2 USB issue on the devkit.
We designed a carrier board for Jetson Nano/Xavier NX/TX2, but this USB issue only happens on Jetson TX2.
The Xavier devkit doesn’t have a USB20 hub connected to the USB20_2 bus, so if you want to reproduce this issue you need to solder a USB cable to USB20_2 of the M.2 E Key, and then you can connect it to the USB20 hub.

USB test condition:
USB 3.0 to LAN → use iperf to test the LAN function
BT → only connected to the USB hub, no test tool running
GPS → the system reads GPS data continuously
gpstest.7z (857 Bytes)
network_stress_test.7z (799 Bytes)



Thanks
Ken

So what you are doing here is basically a stress test on the GPS device and a stress test on the Ethernet?

Hi~Wayne,
Yes, we only stress test the GPS and the Ethernet. But a BT device must be present if you want to reproduce this issue.
We ran many experiments; you can refer to the “TX2 USB test experiments_20220311A” file.

Thanks
Ken

Please share the full dmesg. It is unlikely for us to reproduce this issue on our side.

We need your log to decide what the next step is.

Hi~Wayne,
Please refer to the attachment below. If you need any other diagnostic data, please let me know.
tx2_usb_fail.zip (25.0 KB)

Thanks
Ken

Hi,

We just checked your log. The first error in the log comes from:

[ 1209.820620] Ethernet stress test : running
[ 1226.058519] NETDEV WATCHDOG: eth2 (r8152): transmit queue 0 timed out

A few seconds after that, the USB controller reports errors.

[ 1226.079182] r8152 2-2.4:1.0 eth2: Tx timeout
[ 1231.178515] tegra-xusb 3530000.xhci: xHCI host not responding to stop endpoint command.
[ 1231.186515] tegra-xusb 3530000.xhci: Assuming host is dying, halting host.

To debug this issue precisely, we can only suggest that you capture the USB bus trace. If the USB controller sends a command to the Ethernet dongle and the dongle does not respond, then this behavior is expected. In that case, you need to consult the Ethernet dongle vendor.
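Since the order of the first errors matters for this analysis, a short script can pull the earliest timestamp of each error type out of a saved dmesg file. The following is a minimal Python sketch, assuming the log uses the bracketed timestamp format shown in this thread; the pattern labels and the function name `first_occurrences` are illustrative, and the substrings are taken only from messages quoted in this thread.

```python
import re

# dmesg line with a bracketed timestamp, e.g. "[ 1226.058519] message"
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Illustrative labels mapped to substrings of messages quoted in this thread.
PATTERNS = {
    "netdev_watchdog": "NETDEV WATCHDOG",
    "tx_timeout": "Tx timeout",
    "xhci_stop_ep": "not responding to stop endpoint command",
    "fw_hang": "controller firmware hang",
    "hc_died": "HC died",
}


def first_occurrences(dmesg_text):
    """Return [(label, timestamp)] sorted by the earliest time each pattern appears."""
    first = {}
    for raw in dmesg_text.splitlines():
        m = LINE.match(raw.strip())
        if m is None:
            continue  # skip lines without a timestamp
        ts, msg = float(m.group(1)), m.group(2)
        for label, needle in PATTERNS.items():
            if needle in msg:
                first[label] = min(ts, first.get(label, ts))
    return sorted(first.items(), key=lambda kv: kv[1])


# Lines copied from the log excerpt discussed above.
SAMPLE = """\
[ 1226.058519] NETDEV WATCHDOG: eth2 (r8152): transmit queue 0 timed out
[ 1226.079182] r8152 2-2.4:1.0 eth2: Tx timeout
[ 1231.178515] tegra-xusb 3530000.xhci: xHCI host not responding to stop endpoint command.
[ 1231.186515] tegra-xusb 3530000.xhci: Assuming host is dying, halting host.
"""

for label, ts in first_occurrences(SAMPLE):
    print(f"{ts:12.6f}  {label}")
```

Running the same function over a full dmesg capture gives a quick way to compare which component complained first in different runs.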

Hi~Wayne,

Please refer to the attachment for a more detailed log.
dmesg_0314.log (77.2 KB)

asus@tegra-ubuntu:~$ sudo sh -c 'echo 7 > /proc/sys/kernel/printk'
[ 482.623863] tegra-xusb 3530000.xhci: controller firmware hang
[ 482.629606] tegra-xusb 3530000.xhci: hcd_reinit is disabled or in progress
[ 484.169337] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
The tegra-xusb “controller firmware hang” message is the earliest error.

This issue only occurs on TX2; Xavier works well under the same test. I think this is a problem with your SOM.

Thanks
Ken

Hi,

As I said in my previous comment, this is just an assumption.
Whatever the case, we need your help to dump the USB bus trace.

Hi Wayne,

Thanks for your reply.

If you mean using a USB traffic sniffer to dump USB packets, this might not be possible. The stress test needs about 10~20 minutes to hit the problem, and we don’t have a USB sniffer with a buffer big enough to record 20 minutes of USB packets.

In addition, the tegra-xusb “controller firmware hang” message appears before “transmit queue 0 timed out”, while the other devices are still communicating. It looks like the problem originates in the USB controller, so dumping the USB bus trace may not help.

[ 482.623863] tegra-xusb 3530000.xhci: controller firmware hang
[ 482.629606] tegra-xusb 3530000.xhci: hcd_reinit is disabled or in progress
[ 484.169337] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out

[ 486.780908] r8152 2-2.2:1.0 eth1: Tx status -108
[ 486.785522] r8152 2-2.2:1.0 eth1: Tx status -108
[ 486.813316] Bluetooth: hci0 urb ffffffc0e7716a80 failed to resubmit (108)
[ 486.820106] Bluetooth: hci0 urb ffffffc0e7716900 failed to resubmit (108)
[ 486.826894] Bluetooth: hci0 urb ffffffc0e7716300 failed to resubmit (108)
[ 486.852937] option1 ttyUSB1: usb_wwan_open: submit read urb 0 failed: -19
[ 486.859732] option1 ttyUSB1: usb_wwan_open: submit read urb 1 failed: -19
[ 486.866525] option1 ttyUSB1: usb_wwan_open: submit read urb 2 failed: -19
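The status codes in the log lines above are Linux errno values, printed with different sign conventions by different drivers (r8152 and option print negative values, while the Bluetooth lines here print positive ones). The small Python sketch below decodes the three values that appear in this thread. The table is hard-coded with the Linux numbering so the result does not depend on the machine running the script, and the function name `describe_status` is illustrative.

```python
# Linux errno values seen in the logs of this thread (Linux numbering, hard-coded
# on purpose: other operating systems assign different numbers to some of these).
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    22: ("EINVAL", "Invalid argument"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}


def describe_status(code):
    """Map a driver status such as -108 or 22 to its (name, description) pair."""
    number = abs(int(code))
    return LINUX_ERRNO.get(number, ("UNKNOWN", f"errno {number} not in table"))


# Codes as printed in the log: "Tx status -108", "failed to resubmit (22)",
# "failed to resubmit (108)", "submit read urb 0 failed: -19".
for code in (-108, 22, 108, -19):
    name, text = describe_status(code)
    print(f"{code:5d}  {name:10s} {text}")
```

ESHUTDOWN and ENODEV are what drivers typically report once the host controller or the device has already gone away, so these lines look like consequences of the controller halt rather than independent failures.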

This issue is very easy to reproduce if anyone connect a usb hub to TX2 usb20_2 port. Maybe you could check this issue with your engineers and try to reproduce it on your side.

Hi,

Could you summarize exactly which devices and how many hubs we need to connect here?

Previously, Ken_Lin told us that a BT connection and a GPS connection are both required. But now you say it is “very easy to reproduce if anyone connect a usb hub to TX2 usb20_2 port.”

Also, have you ever swapped any of the USB devices, the hub, or even that M.2 converter cable? I ask because we may not be able to get exactly the same devices (from the same vendors) as yours.

If we provide a debug method, for example a debug USB firmware that can dump the firmware log, could you help reproduce the issue and capture the log on your side? This would make the debug process faster.

Hi,

Please use this to dump the diagnostic log first.
diagnostic_client_cu.zip (261.0 KB)

  1. Copy diagnostic_client_cu from the attachment to the device.
  2. Reproduce the firmware hang issue.
  3. Run ./diagnostic_client_cu -a
  4. Get the xusb_diag_fw_cu.log file and share it here.

Hi Wayne,

From our test results,

  1. Connecting a U2 hub to the usb20_2 port is required. We can reproduce this issue with two different USB hubs. You can connect a hub to usb20_2 by any method, such as an M.2 to USB card or the way Ken described in a previous comment.

  2. Connecting a BT adapter to the U2 hub under the usb20_2 port is required. We can reproduce this issue with three different BT adapters. If the BT adapter is moved to a U2 hub under usb20_1, the problem disappears.

  3. Connecting an LTE module to the U2 hub under the usb20_2 port is required. We can reproduce this issue with two LTE modules (EG25G and EG25AU). The problem is easier to reproduce with the EG25G. Running the GPS test (GPS is on the LTE module) makes the problem happen sooner.
    If the LTE module is moved to a U2 hub under usb20_1, the problem disappears.

  4. Connecting a U3 hub to the usb30_0 port is required, because we need to generate heavy traffic with the U3 Ethernet dongle. We can reproduce this issue with three different USB 3.0 hubs.

  5. Connecting an Ethernet dongle to the U3 hub and running TX transmission is required. With two Ethernet dongles, the problem reproduces faster. We can reproduce this issue with two different Ethernet dongles.

The devices mentioned above are widely used, so they should be easy to obtain for reproducing the issue.

Hi,
Since the design of the custom board is different from our developer kit, we are not able to reproduce the issue. It looks like in your custom board design, USB3-0 and USB2-1 are paired to form a USB3 Type-A port, and USB2-2 is a USB2 Type-A port. Could you check whether the power supply to each port is sufficient? Most instability issues come from an insufficient power supply. There are many devices on the two ports, so the power supply may not be sufficient. Please check this first.

xusb_diag_fw_cu.log (60.4 KB)
Hi Wayne,

The attachment is the diagnostic log.

Hi,

Could you clarify whether this log was really captured “after” you reproduced the error? We just checked it, and the firmware is not halted.

Hi~Danel,
Our board’s USB design is shown in the picture below. USB3_0 and USB2_1 connect to a USB30 hub, and USB2_2 connects to a USB20 hub; they do not connect directly to a Type-A connector. All USB-related power measurement items pass.
Power_ripple_Report_20220124B.xlsx (1.5 MB)
USB Type A Droop.zip (7.4 KB)
USB Type A Drop.zip (1.3 KB)

This USB issue only happens on Jetson TX2; Nano and Xavier don’t have this problem.
So I think this issue is not related to USB power.

Thanks
Ken