Need guidance on debugging Jetson AGX USB controller hang with Jetpack 4.6

After upgrading our platform from Jetpack 4.4.1 to Jetpack 4.6 (L4T 32.6.1) we are encountering hangs with the USB controller. These hangs happen randomly when running our application with Jetpack 4.6 and did not happen with Jetpack 4.4.1. They do occur when the system is running in MAXN mode with ‘sudo jetson_clocks’ applied.

What are the next steps to further debug this issue on Jetpack 4.6?

I have seen this post ([ Xavier NX ] USB compliance mode hang) which makes reference to Jetpack 4.6.1. Are there USB stability fixes in Jetpack 4.6.1 that we should be testing?

Output from dmesg indicating the failure:

[  +1.956610] tegra-xusb 3610000.xhci: controller firmware hang
[  +0.000120] tegra-xusb 3610000.xhci: hcd_reinit is disabled or in progress
[  +5.673603] tegra-xusb 3610000.xhci: xHCI host not responding to stop endpoint command.
[  +0.000152] tegra-xusb 3610000.xhci: Assuming host is dying, halting host.
[  +0.016147] tegra-xusb 3610000.xhci: Host not halted after 16000 microseconds.
[  +0.000154] tegra-xusb 3610000.xhci: Non-responsive xHCI host is not halting.
[  +0.000114] tegra-xusb 3610000.xhci: Completing active URBs anyway.
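For anyone who wants to catch this condition automatically, here is a minimal sketch that scans dmesg-style lines for the tegra-xusb messages shown above. It assumes the delta-timestamp format seen in this log; the function names `find_xusb_events` and `firmware_hang_detected` are illustrative only and not part of any NVIDIA tool.

```python
import re

# Matches lines such as:
# "[  +1.956610] tegra-xusb 3610000.xhci: controller firmware hang"
# Assumption: timestamps look like the ones in the log above
# (an optional "+" followed by seconds.microseconds).
DMESG_RE = re.compile(
    r"^\[\s*\+?(?P<time>\d+\.\d+)\]\s+tegra-xusb (?P<dev>\S+): (?P<msg>.*)$"
)


def find_xusb_events(lines):
    """Return (device, message) tuples for every tegra-xusb log line."""
    events = []
    for line in lines:
        match = DMESG_RE.match(line.rstrip("\n"))
        if match:
            events.append((match["dev"], match["msg"]))
    return events


def firmware_hang_detected(lines):
    """True if any tegra-xusb line reports a controller firmware hang."""
    return any(
        "controller firmware hang" in msg for _, msg in find_xusb_events(lines)
    )
```

The functions take any iterable of text lines, so the output of dmesg can be passed in after splitting it on newlines, or a saved log file can be passed directly.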

A usbmon trace of one of our USB buses shows the following data at the time of the connection drop:

ffffffc5af515780 1959641574 S Bo:1:011:1 -115 86 = 01000000 56000000 24000000 2a000000 00000000 00000000 00000000 00000000
ffffffc3b29d1840 1959843134 S Bo:1:011:1 -115 166 = 01000000 a6000000 24000000 7a000000 00000000 00000000 00000000 00000000
ffffffc40e314cc0 1960164881 S Bo:1:011:1 -115 166 = 01000000 a6000000 24000000 7a000000 00000000 00000000 00000000 00000000
ffffffc5acf73cc0 1960220123 S Bo:1:011:1 -115 123 = 01000000 7b000000 24000000 4f000000 00000000 00000000 00000000 00000000
ffffffc3b3f98f00 1960494419 S Bo:1:011:1 -115 166 = 01000000 a6000000 24000000 7a000000 00000000 00000000 00000000 00000000
ffffffc40e314f00 1960668106 S Bo:1:011:1 -115 86 = 01000000 56000000 24000000 2a000000 00000000 00000000 00000000 00000000
ffffffc5acf73900 1960823190 S Bo:1:011:1 -115 166 = 01000000 a6000000 24000000 7a000000 00000000 00000000 00000000 00000000
ffffffc405bd2a80 1961153693 S Bo:1:011:1 -115 166 = 01000000 a6000000 24000000 7a000000 00000000 00000000 00000000 00000000
ffffffc405bd2780 1961475509 S Bo:1:011:1 -115 166 = 01000000 a6000000 24000000 7a000000 00000000 00000000 00000000 00000000
ffffffc7cc004000 1961712720 C Ii:1:002:1 -108:2048 0
ffffffc7d3f09840 1961712750 C Ii:1:003:1 -108:2048 0
ffffffc7d3f3f480 1961712836 C Ii:1:005:1 -108:2048 0
ffffffc7cde28900 1961712885 C Ii:1:007:1 -108:8 0
ffffffc7cde28780 1961712893 C Bi:1:007:2 -108 0
ffffffc7cde28d80 1961712895 C Bi:1:007:2 -108 0
ffffffc7cde28cc0 1961712896 C Bi:1:007:2 -108 0
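To make the trace easier to read, here is a rough sketch of a parser for the usbmon text lines above. It decodes the status field using Linux errno values, so -115 is EINPROGRESS (the normal status for a submitted URB) and -108 is ESHUTDOWN (the URB was completed because the endpoint or controller was shut down). The function name `parse_usbmon_line` and the small errno table are my own for illustration; the regular expression only handles the bulk and interrupt line shapes shown here and does not handle control submissions that carry a setup packet.

```python
import re

# Linux errno values commonly seen as URB status codes (negated in usbmon).
LINUX_ERRNO = {
    0: "OK",
    2: "ENOENT",
    19: "ENODEV",
    32: "EPIPE",
    71: "EPROTO",
    104: "ECONNRESET",
    108: "ESHUTDOWN",
    110: "ETIMEDOUT",
    115: "EINPROGRESS",
}

# tag, timestamp (us), event (S/C/E), type+direction:bus:device:endpoint,
# status (optionally followed by ":interval"), data length.
USBMON_RE = re.compile(
    r"^(?P<tag>[0-9a-f]+) (?P<ts>\d+) (?P<event>[SCE]) "
    r"(?P<type>[CZIB])(?P<dir>[io]):(?P<bus>\d+):(?P<dev>\d+):(?P<ep>\d+) "
    r"(?P<status>-?\d+)(?::\S+)? (?P<length>\d+)"
)


def parse_usbmon_line(line):
    """Parse one usbmon text line; return a dict, or None if it does not match."""
    match = USBMON_RE.match(line.strip())
    if match is None:
        return None
    status = int(match["status"])
    return {
        "timestamp_us": int(match["ts"]),
        "event": match["event"],
        "transfer": match["type"] + match["dir"],
        "bus": int(match["bus"]),
        "device": int(match["dev"]),
        "endpoint": int(match["ep"]),
        "status": status,
        "status_name": LINUX_ERRNO.get(-status, "UNKNOWN"),
        "length": int(match["length"]),
    }
```

Read this way, the trace shows bulk OUT submissions to device 11 continuing normally, followed by completions with ESHUTDOWN on devices 2, 3, 5 and 7 within about 200 microseconds of each other, which is consistent with the whole host controller being halted rather than a single device failing.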

Hi,

Is this happening on the devkit or on your custom board?

If this is on a custom board, could you try to reproduce the issue on a devkit?

If this is on a devkit, could you fall back to JetPack 4.5.1 and see whether the issue is still there?

This is happening with our custom board. Still investigating.

@WayneWWW We are not able to replicate our USB device structure using a devkit (busses/connections/speeds) to perform equivalent testing.

Is there a way to downgrade the Jetpack 4.6 nvidia-l4t-xusb-firmware to use the Jetpack 4.4.1 firmware so we can determine which component (kernel or opaque firmware) is causing the issue?

Hi,
Please refer to steps in this post:
Nvidia devkit USB3.0 compliance Test on Signal Output in Tektronix - #10 by WayneWWW

Thanks @DaneLLL, would have never found that post by searching. I was able to follow those steps and confirm we are able to downgrade the USB firmware version.
