Jetson AGX USB gadget disconnects and reconnects at random

Hi, I posted this two months ago but didn’t get any response. Any help would be appreciated.

I have the following setup with my Jetson AGX Xavier:
My Windows PC is connected to the Jetson AGX via USB (gadget port), and an SSD that I installed inside on the M.2 NVMe interface is mapped as the backing storage of the mass storage device.

I have been running benchmarks with WinSAT to see what throughput I can get, and I ran into a recurring bug in which the transfer stops for 20 seconds at random times and then continues as if nothing happened.

The command I run on the PC (in an administrator Command Prompt) is the following:

winsat disk -seq -write -drive E: -seqsize 524288 -v

And this is the output:
Run [1] Type[0x02000001] Zone[0] - 197.169459 MB/s
Run [2] Type[0x02000001] Zone[0] - 22.025154 MB/s
Run [3] Type[0x02000001] Zone[0] - 197.241335 MB/s
Run [4] Type[0x02000001] Zone[0] - 195.392863 MB/s
Run [5] Type[0x02000001] Zone[0] - 197.208295 MB/s
Run [6] Type[0x02000001] Zone[0] - 197.227573 MB/s
Run [7] Type[0x02000001] Zone[0] - 197.226710 MB/s
Run [8] Type[0x02000001] Zone[0] - 22.028424 MB/s
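For anyone trying to reproduce this, here is a minimal Python sketch for picking the stalled runs out of WinSAT output like the above. It assumes the per-run lines keep exactly this "Run [n] ... - X MB/s" shape; the function names and the "below half the median" cutoff are my own choices for illustration, not anything WinSAT defines.

```python
import re

# Matches per-run lines in the shape shown above, e.g.
# "Run [2] Type[0x02000001] Zone[0] - 22.025154 MB/s"
RUN_RE = re.compile(r"Run \[(\d+)\].*?-\s*([\d.]+) MB/s")

WINSAT_OUTPUT = """\
Run [1] Type[0x02000001] Zone[0] - 197.169459 MB/s
Run [2] Type[0x02000001] Zone[0] - 22.025154 MB/s
Run [3] Type[0x02000001] Zone[0] - 197.241335 MB/s
Run [4] Type[0x02000001] Zone[0] - 195.392863 MB/s
Run [5] Type[0x02000001] Zone[0] - 197.208295 MB/s
Run [6] Type[0x02000001] Zone[0] - 197.227573 MB/s
Run [7] Type[0x02000001] Zone[0] - 197.226710 MB/s
Run [8] Type[0x02000001] Zone[0] - 22.028424 MB/s
"""


def parse_runs(text):
    """Return a list of (run_number, throughput_mb_s) tuples."""
    return [(int(n), float(mbs)) for n, mbs in RUN_RE.findall(text)]


def slow_runs(runs, fraction=0.5):
    """Return the runs whose throughput is below `fraction` of the median."""
    if not runs:
        return []
    speeds = sorted(mbs for _, mbs in runs)
    median = speeds[len(speeds) // 2]
    return [(n, mbs) for n, mbs in runs if mbs < fraction * median]


if __name__ == "__main__":
    for number, mbs in slow_runs(parse_runs(WINSAT_OUTPUT)):
        print(f"run {number} stalled: {mbs:.1f} MB/s")
```

Running it over a longer capture gives the run numbers of the stalls, which can then be lined up against the dmesg timestamps.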

I tweaked the kernel a bit to debug this problem, adding prints on reads and writes. This is the dmesg output I get during the disconnection (when the transfer rate reported by WinSAT drops):
[ +0.002897] my_do_write!
[ +0.002813] my_do_write!
[ +20.001554] android_work: send uevent USB_STATE=DISCONNECTED
[ +0.000191] tegra-xudc-new 3550000.xudc: ep 3 disabled
[ +0.000057] tegra-xudc-new 3550000.xudc: ep 2 disabled
[ +0.000224] configfs-gadget gadget: super-speed config #1.c
[ +0.000134] android_work: sent uevent USB_STATE=CONNECTED
[ +0.000070] android_work: sent uevent USB_STATE=CONFIGURED
[ +0.000038] tegra-xudc-new 3550000.xudc: ep 3 (type: 2, dir: in) enabled
[ +0.000018] tegra-xudc-new 3550000.xudc: ep 2 (type: 2, dir: out) enabled
[ +0.003475] my_do_write!
[ +0.001192] my_do_write!
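To find these stalls in a long log without reading it by eye, a small Python sketch like the one below can help. It assumes every line carries a relative timestamp in the "[ +seconds]" form shown above; the 1-second threshold and the function name are arbitrary choices of mine.

```python
import re

# Matches lines with a relative timestamp, e.g. "[ +20.001554] message"
DELTA_RE = re.compile(r"\[\s*\+([\d.]+)\]\s*(.*)")

DMESG_EXCERPT = """\
[ +0.002897] my_do_write!
[ +0.002813] my_do_write!
[ +20.001554] android_work: send uevent USB_STATE=DISCONNECTED
[ +0.000191] tegra-xudc-new 3550000.xudc: ep 3 disabled
[ +0.000057] tegra-xudc-new 3550000.xudc: ep 2 disabled
[ +0.003475] my_do_write!
"""


def find_gaps(text, threshold_s=1.0):
    """Return (delta_seconds, message) for every line that was logged
    more than `threshold_s` seconds after the previous line."""
    gaps = []
    for line in text.splitlines():
        match = DELTA_RE.match(line.strip())
        if match is None:
            continue  # skip lines without a relative timestamp
        delta = float(match.group(1))
        if delta > threshold_s:
            gaps.append((delta, match.group(2)))
    return gaps


if __name__ == "__main__":
    for delta, message in find_gaps(DMESG_EXCERPT):
        print(f"{delta:.3f} s of silence before: {message}")
```

On the excerpt above the only gap it reports is the roughly 20-second one right before the DISCONNECTED uevent.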

I then opened Wireshark on the USB interface. What I see is that a write request goes out at some point, and for the next 20 seconds all I see are “URB_INTERRUPT in” messages with frame sizes of 32 and 27 going in and out of the device about 100 times a second, for roughly 16 seconds around the time the connection drops. As far as I can tell the PC doesn’t drop the connection, so I suppose it is probably the Jetson.

Do you know what might cause this? Is there any way to fix it?

Thanks a bunch.


Some more info:

  • The bug happens in do_write in drivers/usb/gadget/function/f_mass_storage.c
  • It seems like sleep_thread is called, then there is a 20 sec sleep and then raise_exception is called.
  • Then do_set_interface is called and the connection resets.

I’m not entirely sure what happens, but I think the acknowledgement that the Jetson sends to let the host know it’s ready to receive the next data block somehow doesn’t arrive at the host. Then, after 20 seconds, the host resets the connection because it has a packet that has gone unacknowledged for that long.
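A rough consistency check on the 20-second figure, as a Python sketch. It rests on two assumptions that are mine and not confirmed anywhere: every WinSAT run moves the same amount of data, and a slow run is a normal run plus exactly one stall. Under those assumptions the run size S satisfies S/fast + stall = S/slow, which can be solved for S.

```python
def implied_run_size_mb(fast_mb_s, slow_mb_s, stall_s):
    """Solve S/fast + stall == S/slow for S (in MB).

    Assumption: a slow run is a normal run plus one stall of `stall_s`
    seconds, and every run transfers the same amount of data.
    """
    return stall_s / (1.0 / slow_mb_s - 1.0 / fast_mb_s)


if __name__ == "__main__":
    # Rounded throughput values taken from the WinSAT output above.
    size_mb = implied_run_size_mb(fast_mb_s=197.2, slow_mb_s=22.03, stall_s=20.0)
    print(f"implied data per run:     {size_mb:.0f} MB")
    print(f"normal run duration:      {size_mb / 197.2:.1f} s")
    print(f"run duration with stall:  {size_mb / 22.03:.1f} s")
```

With these inputs the implied run size comes out at roughly 500 MB. If the real amount of data per run is close to that, the drop from about 197 MB/s to about 22 MB/s is fully explained by a single 20-second stall per slow run, with no slowdown at other times.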

I’ve looked a little into the Tegra UDC controller and saw that it sometimes scales its clock speed down. Could this have something to do with it?

I’m not sure how to proceed with this, any help would be appreciated.

thanks a bunch

Mostly I can’t help on this, but in case this is a simple clock speed issue, make sure that before the test you set max performance:
sudo nvpmodel -m 0

Then see whether the failure goes away or the performance improves.

Hi,
The prints show the connection is intermittent:

[ +20.001554] android_work: send uevent USB_STATE=DISCONNECTED
[ +0.000191] tegra-xudc-new 3550000.xudc: ep 3 disabled
[ +0.000057] tegra-xudc-new 3550000.xudc: ep 2 disabled
[ +0.000224] configfs-gadget gadget: super-speed config #1.c
[ +0.000134] android_work: sent uevent USB_STATE=CONNECTED
[ +0.000070] android_work: sent uevent USB_STATE=CONFIGURED

From experience, this is probably due to an unstable power supply. Sometimes the workload draws too much current and the power supply becomes unstable. Not sure if it helps, but please disable autosuspend and give it a try:
Unable to permanently turn off autosuspend for a USB device connected to a Xavier NX - #4 by linuxdev

Hi,
Thanks for the quick response.
I tried both of your suggestions, @DaneLLL @linuxdev, and they did not help :(
The Jetson is plugged into a power outlet. Is that not stable enough, to your knowledge?
Can you suggest a more stable option, or other factors that might cause this problem?
I’m kind of lost, any help would be appreciated :)

Thanks again!

Hi,
Please share your release version for reference.
And the use case is connecting an NVMe SSD to the M-key slot on the Xavier developer kit, and connecting the Xavier to a host PC through the J512 port, so that the Xavier runs in device mode. Is this correct?

Hello, the version we have on the machine is L4T 4.5.1.
As for your suggestion about the power supply, I am not sure this is the issue. I’ve debugged the kernel a bit using prints, and I think the cause might be that the host (the Windows computer) has a timeout for a reply to a BULK_OUT packet.
As far as I can tell there are 2 options:

  1. The packet from my Windows computer doesn’t reach the controller, but Windows thinks it did, and the host resets the port after 20 seconds.
  2. The packet is received by the Jetson, but the request for further data is not received on the Windows side.

I’m still not sure about how everything works in the driver so this is as much as I could tell at the moment.

Have you managed to reproduce the problem? Is it specific to my device, or is it a general problem with Jetsons?

Thanks

Hi,
The use case is connecting an NVMe SSD to the M-key slot on the Xavier developer kit, and connecting the Xavier to a host PC through the J512 port, so that the Xavier runs in device mode. Is this correct?

We would need a way to reproduce this on a Xavier developer kit. If you can observe the issue on the developer kit, please share the setup and steps.

And if possible, please try JetPack 4.6.1.

Yes, this happens to me on the developer kit (I haven’t tried it any other way yet). I installed a 1 TB SSD stick by opening the case and inserting it in the M-key slot of the Xavier developer kit. I connected the kit to the Windows computer through the J512 port (the port near the GPIO pins), with the Xavier acting as the device (gadget).

I will also test for the issue in 4.6.1 and tell you my result.

thank you

Hi
A 1 TB SSD looks to be a high-end device and may draw more current than the developer kit can supply under certain conditions. I would suggest trying a smaller one, such as a 128 GB SSD.

This has also happened to me with both:

  1. the 32 GB internal storage supplied with the device
  2. a mounted in-memory filesystem (ramfs)

with and without the SSD attached.

I think the problem is with the bus and not the storage, but I’m not sure.

My config is:

enable_rndis=0
enable_acm=0
enable_ecm=0
enable_ums=1

fs_img="dev/nvme0n1p1"
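One detail worth double-checking in the config as pasted: the fs_img value has no leading slash. That may only be a typo in the post, and it is probably unrelated to the stalls, but it is cheap to rule out. Below is a minimal Python sketch of such a sanity check. The key names come from the config above; the function names and the specific checks are my own assumptions about what is worth verifying, not something the device-mode scripts do themselves.

```python
import posixpath

CONFIG = """\
enable_rndis=0
enable_acm=0
enable_ecm=0
enable_ums=1

fs_img="dev/nvme0n1p1"
"""


def parse_config(text):
    """Parse simple key=value lines into a dict, dropping surrounding quotes."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip straight and typographic double quotes around the value.
        cfg[key.strip()] = value.strip().strip('"\u201c\u201d')
    return cfg


def check_ums_config(cfg):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if cfg.get("enable_ums") != "1":
        problems.append("enable_ums is not 1")
    fs_img = cfg.get("fs_img", "")
    if not fs_img:
        problems.append("fs_img is not set")
    elif not posixpath.isabs(fs_img):
        problems.append("fs_img is not an absolute path: " + fs_img)
    return problems


if __name__ == "__main__":
    for problem in check_ums_config(parse_config(CONFIG)):
        print("warning:", problem)
```

If the real file already uses an absolute path to the NVMe partition, this check passes and the config can be crossed off the list of suspects.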