Dear NVIDIA team,
We are using the Orin Developer Kit and have encountered a problem when working with a USB serial device. The serial connection runs fine for a while and then gets stuck somewhere in the Linux kernel.
Here are the key points that I’ve identified:
- The device works consistently on an x64 host. The problem is reproducible on Orin only.
- The device works fine on Orin for some time. Sooner or later, however, it is very likely to enter the stuck state.
- The device runs in parallel with a USB3Vision camera. When the issue manifests itself, the camera is also impacted (FPS drops temporarily).
- The device is plugged directly into a dedicated USB port, with no hubs in between. The camera is also plugged into its own dedicated USB port. The device does not have its own power supply, so it is power-cycled on every unplug.
- If you unplug the device and plug it back in, Orin is unable to even enumerate it. The enumeration issue persists until Orin is rebooted.
- If you attach a logic analyzer to the USB bus after the enumeration issue occurs, you won’t see any traffic. Only regular SOFs are present on the bus. This persists even after a device unplug: no USB reset and no SETUP packets are sent. Meanwhile, Orin complains about a descriptor read error, which isn’t surprising because the descriptors were never requested.
- If you attach a logic analyzer to the USB bus before reconnecting the device (so the enumeration issue does not occur), you won’t see any traffic either. Orin polls the device with IN packets, but does not send any OUT packets, even when explicitly requested with `echo 123 > /dev/ttyACMx`.
- The echo process gets stuck in a syscall:
[<0>] __switch_to+0xc8/0x120
[<0>] usb_start_wait_urb+0x94/0x100
[<0>] usb_control_msg+0xc4/0x140
[<0>] 0xffffac2b0b82dfac
[<0>] 0xffffac2b0b82ec24
[<0>] tty_port_block_til_ready+0x1e0/0x320
[<0>] tty_port_open+0xcc/0x110
[<0>] 0xffffac2b0b82d924
[<0>] tty_open+0x130/0x530
[<0>] chrdev_open+0xac/0x1b0
[<0>] do_dentry_open+0x134/0x3a0
[<0>] vfs_open+0x3c/0x50
[<0>] path_openat+0x858/0xde0
[<0>] do_filp_open+0x88/0x110
[<0>] do_sys_openat2+0x1fc/0x2b0
[<0>] do_sys_open+0x80/0xd0
[<0>] __arm64_sys_openat+0x30/0x40
[<0>] el0_svc_common.constprop.0+0x80/0x1d0
[<0>] do_el0_svc+0x38/0xb0
[<0>] el0_svc+0x1c/0x30
[<0>] el0_sync_handler+0xa8/0xb0
[<0>] el0_sync+0x16c/0x180
- No specific dmesg messages are produced when the issue manifests itself. All dmesg messages are post-mortem, i.e. complaints about the descriptor read failure. If you don’t interact with the device, it can remain in the stuck state indefinitely without any messages being reported.
- The USB port effectively becomes dead: other devices are not enumerated on it either.
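For reference, here is a minimal sketch of how the stuck state could be detected from a script without hanging the caller. It runs the same echo command in a child process and gives up after a timeout. The function name `probe_tty` and the 10-second default are my own choices for illustration, not anything taken from the driver or the JetPack tooling.

```python
import subprocess


def probe_tty(path, timeout_s=10.0):
    """Run `echo 123 > path` in a child shell and classify the outcome.

    Returns "ok" if the command finished successfully, "error" if it
    finished with a non-zero status, and "stuck" if it did not finish
    within timeout_s seconds.
    """
    proc = subprocess.Popen(
        # The path is passed as $1 so it is never interpreted by the shell.
        ["sh", "-c", 'echo 123 > "$1"', "sh", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a task blocked uninterruptibly in the kernel
        # will not go away on SIGKILL, so we deliberately do not wait for it.
        proc.kill()
        return "stuck"
    return "ok" if rc == 0 else "error"


if __name__ == "__main__":
    # /dev/ttyACM0 is an assumed device node; adjust to the actual one.
    print(probe_tty("/dev/ttyACM0"))
```

The child is intentionally not reaped after the timeout, so the calling script keeps running even if the child never returns from the open.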
I understand that you will probably not be able to reproduce this issue yourselves, because it is likely specific to our particular setup. However, I can collect more detailed dumps and debug info if you need them.
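As one example of what I could collect, the sketch below lists every task that is currently in uninterruptible sleep (state D) together with its kernel stack from /proc/<pid>/stack, which is the same format as the trace above. It assumes the stuck process shows up in state D and that the script is run as root; the helper names `parse_stat` and `blocked_tasks` are made up for this post.

```python
import glob


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state = text[rpar + 2]  # single state letter after ") "
    return pid, comm, state


def blocked_tasks():
    """Return a list of (pid, comm, stack) for tasks in state D."""
    found = []
    for stat_path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(stat_path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state != "D":
            continue
        try:
            with open(f"/proc/{pid}/stack") as f:
                stack = f.read()
        except OSError:
            stack = "(stack not readable; root is usually required)\n"
        found.append((pid, comm, stack))
    return found


if __name__ == "__main__":
    for pid, comm, stack in blocked_tasks():
        print(f"{pid} {comm}")
        print(stack)
```

Running this right after the echo command hangs should show whether other tasks, for example the camera pipeline, are blocked in the USB stack at the same time.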