Jetson AGX Orin USB3 hub randomly becoming USB 2.0

I’m having a weird issue when using three USB cameras with the Jetson Orin. One is plugged directly into a USB3 port; the other two are connected through a USB hub plugged into the second USB3 port.

This configuration works without issues about 95% of the time. But occasionally one or more cameras throw an error saying “The device cannot be operated on an USB 2.0 port. The device requires an USB 3.0 compatible port.” Obviously this error always appears if a camera is actually plugged into a USB2 port, but we make sure to use the USB3 ports.

I’ve tried resetting the camera using echo '<port>-<device>' | sudo tee /sys/bus/usb/drivers/usb/unbind, but this did not work. I also noticed, when looking in /sys/bus/usb/devices/<DEVICE>/power, that the camera that raised the error has USB2 in that folder while the others have USB3 (shown in the images below).
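Instead of browsing the power folders by hand, the sysfs “speed” attribute of each device under /sys/bus/usb/devices reports the negotiated speed in Mb/s (480 for USB 2.0, 5000 for USB 3 gen. 1). A minimal Python sketch, assuming the cameras can be recognized by a manufacturer or product string containing some word such as “Basler” (an assumption; check what your devices actually report):

```python
from pathlib import Path
from typing import List, Optional, Tuple

SYSFS_USB_DEVICES = Path("/sys/bus/usb/devices")


def read_attr(dev: Path, name: str) -> str:
    """Return a sysfs attribute as stripped text, or '' if it cannot be read."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return ""


def speed_mbps(dev: Path) -> Optional[float]:
    """Negotiated speed in Mb/s (e.g. 480 or 5000); None for entries without one."""
    try:
        return float(read_attr(dev, "speed"))
    except ValueError:
        return None


def slow_devices(match: str, min_mbps: float = 5000,
                 root: Path = SYSFS_USB_DEVICES) -> List[Tuple[str, str, float]]:
    """List (sysfs name, description, speed) of matching devices below min_mbps."""
    found = []
    for dev in sorted(root.iterdir()):
        label = f"{read_attr(dev, 'manufacturer')} {read_attr(dev, 'product')}".strip()
        speed = speed_mbps(dev)
        if speed is not None and match.lower() in label.lower() and speed < min_mbps:
            found.append((dev.name, label, speed))
    return found
```

Calling slow_devices("Basler") on the Orin would return an empty list while every matching camera has negotiated at least 5000 Mb/s, and would name the demoted camera's sysfs entry otherwise.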

Moreover, when running lsusb, the camera that raises the USB2 error is on the same bus as the Linux Foundation 2.0 root hub (look at the devices that have Basler in the name):
[Screenshot: lsusb output from the failing Orin]

Whereas on an Orin that works, all three cameras are on the same bus as the Linux Foundation 3.0 root hub:
[Screenshot: lsusb output from a working Orin]

The cameras are in an autonomous vehicle, so vibration of the robot could definitely be playing a part in causing this error, and I haven’t been able to reproduce the issue on a static test bench.

Rebooting the board resolves the issue, but is there a way to reset the USB hubs without rebooting? Or a way to prevent this from happening in the first place?

Equipment

Carrier Board: AVerMedia carrier board supporting the NVIDIA® Jetson AGX Orin™ module
Cameras: Basler daA1920-160uc (S-Mount)
USB Hub: https://www.amazon.co.uk/Waveshare-Industrial-Switchable-Protections-Compatible/dp/B0BRN9QNY2

Hi,

Could you report this issue to AVerMedia?

Otherwise, you may need to reproduce this issue on an NV devkit so that we can help you.
Your carrier board is a custom board to us, and we don’t know it well.

Yes I can do that as well.

In the meantime, is there anything you would recommend trying? The custom board has the same USB configuration as the devkit, where the four Type-A ports are connected to a single USB root hub. Can the hub be forced to use USB 3.0?

Hi,

USB does not work like that. The LTSSM (Link Training and Status State Machine) decides whether the link runs as USB2 or USB3. There is no way to say “you must run in USB3”.

There are several things here that need to be clarified first.

  1. It is a common mistake to say “this board has the same configuration as the NV devkit”. I have heard such comments from many users over the past few years, but very few of those boards really have the “same configuration” as the NV devkit. Even one GPIO change means it is not the same as the NV devkit.
    It is better to reproduce this directly on an NV devkit.

  2. USB hub settings are not really part of the NVIDIA driver.
    If you need a reset, there is an old tool here that uses standard USB control requests to power the hub off and on. You can try it.
    USB Power control - #7 by DaneLLL

  3. I am not sure whether that matters, but you added one more hub there. Can you still reproduce the issue if you put all three USB cameras directly on the on-board hub?
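Point 2 refers to power-cycling a hub with standard USB control requests. As a rough illustration (not necessarily what the linked tool does), the USB 2.0 hub class defines SetPortFeature and ClearPortFeature requests with the PORT_POWER feature selector. The Python sketch below assumes pyusb is installed (an extra dependency) and needs root; the hub IDs and port number are placeholders you would take from lsusb. Many hubs do not actually switch power per port, and a USB 3 hub appears as two hubs (a USB 2 half and a SuperSpeed half), so the request may need to go to both.

```python
# Hub class request values from the USB 2.0 specification, section 11.24.
REQUEST_TYPE_PORT = 0x23  # host-to-device | class request | recipient "other" (a hub port)
CLEAR_FEATURE = 0x01
SET_FEATURE = 0x03
PORT_POWER = 8  # feature selector for port power


def port_power_request(port: int, on: bool) -> tuple:
    """Return (bmRequestType, bRequest, wValue, wIndex) for Set/ClearPortFeature(PORT_POWER)."""
    if not 1 <= port <= 255:
        raise ValueError("hub port numbers start at 1")
    return (REQUEST_TYPE_PORT, SET_FEATURE if on else CLEAR_FEATURE, PORT_POWER, port)


def set_port_power(vendor_id: int, product_id: int, port: int, on: bool) -> None:
    """Send the request to the first hub with the given IDs (needs pyusb and root)."""
    import usb.core  # pyusb, imported lazily so port_power_request() works without it

    hub = usb.core.find(idVendor=vendor_id, idProduct=product_id)
    if hub is None:
        raise RuntimeError(f"no hub with ID {vendor_id:04x}:{product_id:04x}")
    hub.ctrl_transfer(*port_power_request(port, on))
```

A power cycle would then be set_port_power(vid, pid, port, on=False), a pause of a second or two, and set_port_power(vid, pid, port, on=True). Unlike unbinding the driver, removing power forces the camera to disconnect and go through link training again.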

Hey,

  1. That is fair enough; I’ll see if I can find the differences. Reproducing this with the devkit is difficult because it’s not a very common issue. I’ve never reproduced it on our test bench; it only seems to happen in a moving vehicle, and unfortunately I’m not in a position to install a devkit on one of the robots right now.

  2. Thanks I’ll take a look

  3. I can’t plug all three cameras directly into the board because there are only two USB3 ports; the other two are USB2, and this carrier board doesn’t have the USB-C port.

I’m aware it could be something to do with cables connecting/disconnecting as the robot moves, and I’m trying to understand how or why this issue might be occurring.

This is just some commentary on USB that might be useful to know.

If you run lsusb in tree view:
lsusb -t
…then you will see the tree structure of USB.

There is always a root HUB. The root HUB is the node that determines maximum capability: no branch (a non-root HUB) nor any leaf (the actual device) can exceed it. The speed of each node is shown at the end of its line in “Mb/s”, abbreviated like this:

  • 12M: USB 1.1
  • 480M: USB 2.0
  • 5000M: USB 3.0, or USB 3.1 gen. 1 (USB 3.0 was absorbed into the later USB 3.1 standard as generation 1).
  • 10000M: USB 3.1 gen. 2 (renamed USB 3.2 gen. 2 in the USB 3.2 specification).
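The rule that nothing below a root HUB can run faster than the links above it amounts to taking a minimum over the path from the root HUB to the device. A small Python sketch using the speed fields from the list above (the name mapping covers only those four speeds):

```python
SPEED_NAMES = {
    "12M": "USB 1.1",
    "480M": "USB 2.0",
    "5000M": "USB 3.0 / USB 3.1 gen. 1",
    "10000M": "USB 3.1 gen. 2",
}


def describe_speed(field: str) -> str:
    """Name the standard for an lsusb -t speed field such as '480M'."""
    return SPEED_NAMES.get(field.strip(), "unknown")


def to_mbps(field: str) -> float:
    """Convert an lsusb -t speed field such as '5000M' to Mb/s."""
    return float(field.strip().rstrip("M"))


def path_limit(speeds) -> float:
    """Upper bound for a device: the slowest link from the root HUB down to it."""
    return min(to_mbps(s) for s in speeds)
```

For a camera behind a gen. 1 HUB on a gen. 2 root HUB, path_limit(["10000M", "5000M", "5000M"]) returns 5000.0.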

My guess is that when the issue occurs, a tree view (lsusb -t) will show that the root HUBs have not changed (you might want to save a tree view from a working state ahead of time for comparison). Either a non-root HUB or a leaf node (the actual device) will have changed.

FYI, some devices present multiple devices on one cable: for example, the audio of a camera with audio shows up separately, stereo cameras show up as separate cameras, and sometimes a single device offers more than one API interface despite being the same device.

It is important to note which node’s speed rating has changed.

A port capable of USB 3+ routes to its own root HUB. That root HUB is only backwards compatible with older standards (like USB2) because the carrier board device tree names a companion legacy controller. If the port’s root HUB is only USB2 to begin with, then the same controller chip has legacy support for USB 1.0 and USB 1.1, so there is no need for the device tree to name other controllers. The point to consider is that the carrier board has separate signal paths that lead either to the USB 3 controller or to the legacy controller; the Tegra SoC has the hardware, but the device tree tells the module and SoC which wiring options are used. If the device works in both modes at different times, then I’d say both the wiring and the device tree are correct.

However, once you get past the hardware side, consider that if the signal quality is not good enough, the controller is designed to automatically fall back to a slower standard if the device on the other end supports it (e.g., most USB3 cameras also support USB2, but some won’t because they need something USB2 cannot provide). It is quite possible for perfectly functional hardware to not meet the USB3 spec and fall back to USB2 when signal quality is insufficient. RF is quite complicated, and square waves make it far more difficult. USB 3+ uses much higher frequencies, so some restrictions, such as maximum cable length, are due to reflections rather than resistance in the wiring. Other impedance matching issues, on otherwise 100% valid hardware, can also cause this drop to USB2 (the alternative is that the device drops out and no longer functions).
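To put rough numbers on “much higher frequencies”: the fastest bit pattern (alternating 1s and 0s) has its fundamental at half the bit rate R_b, and the wavelength in a cable is the propagation speed divided by frequency. Assuming a velocity factor v_f of about 0.7 (a typical value for cable dielectrics, not a measured figure for these cables):

```latex
f_{\mathrm{fund}} = \frac{R_b}{2}, \qquad \lambda = \frac{v_f\,c}{f_{\mathrm{fund}}}

\text{USB 2.0: } R_b = 480\ \mathrm{Mb/s} \;\Rightarrow\; f_{\mathrm{fund}} = 240\ \mathrm{MHz},\quad
\lambda \approx \frac{0.7 \times 3.0\times10^{8}\ \mathrm{m/s}}{2.4\times10^{8}\ \mathrm{Hz}} \approx 0.88\ \mathrm{m}

\text{USB 3 gen. 1: } R_b = 5\ \mathrm{Gb/s} \;\Rightarrow\; f_{\mathrm{fund}} = 2.5\ \mathrm{GHz},\quad
\lambda \approx \frac{0.7 \times 3.0\times10^{8}\ \mathrm{m/s}}{2.5\times10^{9}\ \mathrm{Hz}} \approx 8.4\ \mathrm{cm}
```

Under that assumption, a one-metre cable is about one wavelength long at USB 2.0 rates but roughly a dozen wavelengths at the USB 3 fundamental, so at USB 3 rates it behaves as a transmission line in which every impedance mismatch (connectors, sharp bends, a loose plug) reflects energy back.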

If any USB3-capable root HUB still exists during the failure, even if no device is using it, then the USB3 software is likely working correctly and the issue is most likely signal quality.

A regular (non-root) HUB can serve as a kind of adapter or extender, since the total cable length is split between the device-to-HUB segment and the HUB-to-root-HUB segment. This helps more because of RF reflections than because of signal loss due to resistance.

At this point we only know that the system has dropped back from higher to lower speeds. It is unlikely that the root HUB is at fault; only a tree view showing that no root HUB of the required spec exists would point to it. A tree view showing where the failure is would be important. If inserting a HUB in the middle (one capable of your USB3 standard) changes this, then you’ve found a solution: splitting the cable into shorter sections to achieve higher signal quality.

Incidentally, USB3 allows higher power delivery than USB2, and USB3 devices powered by the cable itself may require it. Another reason for dropping back in speed is a device that fails to get proper power: it drops out and then re-enumerates at a lower speed using less power. Powering such a device separately is a good debug step to eliminate power as a cause, although some devices obviously would not consume much power (such as a mouse or keyboard, but I don’t know of any that run at USB3 speeds).
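A quick power sanity check is to add up what the devices behind a HUB declare as their maximum draw. Linux exposes this per device in the sysfs attribute bMaxPower (for example “896mA”). The sketch below assumes a bus-powered HUB fed from one USB 3 port with the standard 900 mA budget for a configured device; declared values are maxima rather than measurements, the HUB's own draw is not included, and an externally powered HUB changes the picture entirely.

```python
import re
from typing import Iterable


def max_power_ma(value: str) -> int:
    """Parse a sysfs bMaxPower value such as '896mA' into milliamps."""
    match = re.fullmatch(r"\s*(\d+)\s*mA\s*", value)
    if match is None:
        raise ValueError(f"unexpected bMaxPower value: {value!r}")
    return int(match.group(1))


def over_budget(values: Iterable[str], budget_ma: int = 900) -> bool:
    """True if the declared maximum draws add up to more than budget_ma."""
    return sum(max_power_ma(v) for v in values) > budget_ma
```

The values would come from reading /sys/bus/usb/devices/<device>/bMaxPower for each camera behind the HUB.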

If you can find where this device changes in the tree view, and you can post the tree view for both working and failing, we might be able to offer more detailed advice. My bet is that most of the advice is going to be related to signal quality.

Thanks for this, there’s a lot of useful information here. We too have suspected signal quality issues previously, and what you’ve mentioned here makes us think now this is the most likely issue.

Here is the output from lsusb -t from 3 working cameras
[Screenshot: lsusb -t output with 3 working cameras]

Here is the output from lsusb -t from 2 working cameras and 1 throwing the USB2 error
[Screenshot: lsusb -t output with 2 working cameras and 1 throwing the USB2 error]

FYI, copying and pasting the actual text works better than images (one reason why a serial console is nice to use, but it also works from ssh). It makes it easier to use diff or to copy and paste relevant log lines. Just append " 2>&1 | tee log_name.txt", e.g.:
lsusb -t 2>&1 | tee log_lsusb.txt
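Once a working and a failing capture are saved as text, the shell's own diff -u on the two files is enough to spot the change. The same comparison in Python is a few lines of difflib; the file names in the usage note are hypothetical.

```python
import difflib
from pathlib import Path


def diff_logs(working_path: str, failing_path: str) -> str:
    """Return a unified diff between two saved lsusb -t logs."""
    working = Path(working_path).read_text().splitlines(keepends=True)
    failing = Path(failing_path).read_text().splitlines(keepends=True)
    return "".join(difflib.unified_diff(working, failing,
                                        fromfile=working_path, tofile=failing_path))
```

For example, print(diff_logs("log_lsusb_working.txt", "log_lsusb_failing.txt")) marks lines only in the working capture with “-” and lines only in the failing capture with “+”.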

This shows that bus 2, port 1 is a USB 3.1 gen. 2 root port. It works at 10 Gb/s (10000M). The same root HUB controller provides USB 3.1 gen. 1 support, which covers items running at 5 Gb/s (5000M). There is a USB 3.1 gen. 1 HUB connected to this (not a “root” HUB), and thus anything on that non-root HUB will be limited to 5000M.

What I find interesting is that this same non-root HUB runs 6 devices in both the working and failing circumstances. It seems likely that this particular root HUB, servicing the gen. 1 HUB, is not part of the problem (but I do see an issue I’ll mention further down).

You happen to have Port 3, device 3 with three devices (but not a HUB) in the “working” case. Typically that means a device which has multiple devices internally, but on a single cable. Stereo cameras or a camera plus stereo audio is one example; programmable keyboards often do this as well, but they’re never USB 3. This is the failing device, which the error condition has migrated to the USB 2 (480M) root HUB. Originally, that 480M USB 2 HUB had only two devices; it now has three since the USB 3.1 gen. 1 device migrated.

The migration does suggest that software and hardware are functioning correctly; it would have been due to signal quality. Keep in mind that any noise in power delivery is part of signal quality, and you have not confirmed the power source of this migrated device (e.g., being powered from the USB port differs from being powered by a separate supply). Most of the time, though (it isn’t guaranteed), power noise issues would take out more than one device. Since this device is on the USB 3.1 gen. 1 non-root HUB, if the noise had been common to that HUB, I would expect all devices on the HUB to be affected. Power delivery noise to the gen. 1 HUB is unlikely, so we are now hunting for a core signal quality issue. I will emphasize, though, that this assumes the same device fails each time and that the issue does not affect the other USB 3.1 gen. 1 devices on that HUB.

Had there been an outright failure, then this device would have disappeared and not migrated. Some cameras which demand USB 3 will simply disappear as they do not self-describe as having a USB 2 capability. Your device has the capability, and it is doing as instructed.

USB 3 has a very fast square wave signal on it. Reflections and noise can easily cause the faster timings to fail (which results in dropping back to USB 2). Just some possibilities:

  • A longer cable can fail even without external noise, just due to reflections in the cable. USB does not provide a separate clock; the receiver must recover the clock from the data signal itself, which makes jitter more harmful.
  • Outside noise on a cable can cause this issue even if cable length is not directly responsible. Two devices can work great until the cable takes a different “shape” due to device placement. A device on the left side of a vehicle might pass by ignition circuitry, whereas the same device on the right side won’t pass near the noise source. If you really wanted to know, you might need a breakout board that lets you attach a vector network analyzer to see reflections. The signal is a square wave, so its odd harmonics reflect too, not just a single sine-wave frequency. A high-end USB 3 analyzer could do this as well, but it is more oriented toward telling you there is an error in the signal, whereas the vector network analyzer will tell you more about the details of the signal’s reflections (treating the cabling as an antenna and transmission line).
  • Perhaps a shorter cable, or a cable routed differently, or a cable with better shielding would solve this.
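On the odd-harmonics point: the Fourier series of an ideal square wave with amplitude A and fundamental f_0 contains only odd multiples of f_0, each weaker by 1/n. Real USB 3 signalling is band-limited and scrambled, so treat this only as an illustration of why significant energy sits well above the fundamental:

```latex
x(t) = \frac{4A}{\pi} \sum_{k=0}^{\infty} \frac{\sin\big(2\pi (2k+1) f_0 t\big)}{2k+1}
     = \frac{4A}{\pi}\left[\sin(2\pi f_0 t) + \tfrac{1}{3}\sin(2\pi\, 3f_0 t) + \tfrac{1}{5}\sin(2\pi\, 5f_0 t) + \cdots\right]
```

With f_0 = 2.5 GHz for an alternating pattern at 5 Gb/s, the 3rd and 5th harmonics sit at 7.5 GHz and 12.5 GHz, where cable and connector imperfections tend to matter even more.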

The part I did not mention yet except in passing is the issue of the number of devices on the USB 3.1 gen. 1 HUB for the working case. The root HUB has 10 Gb/s capability, but the HUB you are using only handles 5 Gb/s. I am assuming that the non-root HUB is a USB 3.1 gen. 1 device, but if it is really a gen. 2 device (10000M), then this would be the first device with low signal quality. Is this really a USB 3.1 gen. 1 HUB? I’ll digress and explain some more information on debugging with lsusb.

You can get a more detailed description of each device in “lsusb -t” by adding the “-v” option:
lsusb -t -v
(you can even use a double verbose, -vv, but that only adds the /sys and /dev associations)

Note that lsusb outside tree mode can be very verbose, but you need sudo (root authority) to get the full output. Example:
sudo lsusb -vvv

That is a lot of output for each device. In the verbose lsusb -t -v (or just “lsusb”) there is an ID for each device, for example “0955:7023”. In that example, 0955 is NVIDIA’s vendor ID, and 7023 is an Orin in either recovery mode or device mode (it acts as a device in both cases). If you know the ID of the failing device (there are three devices on that cable, and they likely all share the same ID), then you can limit the fully verbose query to that device. I’ll pretend the ID is 0955:7023, but in reality it won’t be:
sudo lsusb -d 0955:7023 -vvv

To log this:
sudo lsusb -d 0955:7023 -vvv 2>&1 | tee log_verbose.txt
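If you don't know the ID yet, it is the “ID vvvv:pppp” field of plain lsusb output. A small Python sketch that picks it out for devices whose description contains a given word (for example “Basler”, as in the original post's lsusb output):

```python
import re
from typing import List, Tuple

# Matches plain lsusb lines such as "Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub".
LSUSB_LINE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$")


def find_devices(lsusb_output: str, name_fragment: str) -> List[Tuple[int, int, str, str]]:
    """Return (bus, device, 'vvvv:pppp', description) for lines mentioning name_fragment."""
    hits = []
    for line in lsusb_output.splitlines():
        m = LSUSB_LINE.match(line)
        if m and name_fragment.lower() in m.group(5).lower():
            bus, dev, vid, pid, desc = m.groups()
            hits.append((int(bus), int(dev), f"{vid}:{pid}", desc))
    return hits
```

Feeding it subprocess.run(["lsusb"], capture_output=True, text=True).stdout gives tuples whose ID string can go straight into sudo lsusb -d <ID> -vvv. The bus number also shows which root HUB each camera currently sits on.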

What we will see is the full capability of the device with that ID. You could also do this with the non-root HUB I am assuming does not have gen. 2 capability, and not just the end device (don’t bother with the HUB unless it is supposed to be gen. 2; perhaps my assumption that it is gen. 1 is correct). This information is the self-description the device itself is providing. This will state not only what the capability is, but also what its current state is. Note: If the device is running on a USB 3 port or HUB, but downgrades to USB 2, then it should still report the USB 3 capability; if the device is running on a port or HUB not capable of USB 3, then it won’t report USB 3 capabilities.
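To check this in the saved verbose log, the two fields to look at are bcdUSB in the device descriptor and a “SuperSpeed USB Device Capability” entry in the Binary Object Store (BOS) section. As far as I know, a USB 3 device enumerated at high speed typically reports bcdUSB 2.10 while still listing that capability. The label strings below are what lsusb prints in the versions I have seen, so treat them as assumptions.

```python
from typing import Dict, Optional


def link_summary(verbose_text: str) -> Dict[str, object]:
    """Extract bcdUSB and whether SuperSpeed capability is advertised from lsusb -v text.

    Only the first bcdUSB line is used, so feed it output for a single device
    (e.g. from lsusb -d <ID> -v).
    """
    bcd: Optional[str] = None
    for line in verbose_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "bcdUSB":
            bcd = fields[1]
            break
    return {
        "bcdUSB": bcd,
        "superspeed_capable": "SuperSpeed USB Device Capability" in verbose_text,
    }
```

A result of bcdUSB 2.10 together with superspeed_capable True would be consistent with a USB 3 camera that has fallen back to USB 2.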

I’m going to predict ahead of time that the fully verbose log indicates gen. 1 capability, but throttled to USB 2. If this is the case, then it more or less guarantees (“smoking gun” evidence) that a signal analyzer and USB analyzer would tell you signal quality is the issue.

A further issue I have not yet fully described is that your setup may not have enough bandwidth to succeed. The HUB being used, assuming it is really gen. 1 at 5000M, is servicing (in the working case) 6 devices, and each device could potentially consume the full 5000M bandwidth. Whether this works depends on the nature of the devices (it won’t necessarily fail, but the odds are not in favor of success).
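Some rough arithmetic shows how quickly uncompressed cameras fill a gen. 1 link. A 5000M link carries at most 4 Gbit/s of data after 8b/10b line encoding, and a 10000M gen. 2 link about 9.7 Gbit/s after 128b/132b; packet and protocol overhead lower both further, which this sketch deliberately ignores. The camera figures in the usage note assume 1920 × 1200 at 160 fps in an 8-bit pixel format. I believe that is the daA1920-160uc's maximum, but check the datasheet and your actual settings.

```python
from typing import Iterable

# Link rate left after line encoding only: 8b/10b for gen. 1, 128b/132b for gen. 2.
# Packet and protocol overhead are not modelled, so real throughput is lower.
PAYLOAD_GBPS = {
    "5000M": 5.0 * 8 / 10,
    "10000M": 10.0 * 128 / 132,
}


def camera_gbps(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Raw pixel data rate of an uncompressed stream, in Gbit/s."""
    return width * height * fps * bits_per_pixel / 1e9


def fits(streams_gbps: Iterable[float], link: str) -> bool:
    """Optimistic check: do the streams fit under the encoded link rate?"""
    return sum(streams_gbps) <= PAYLOAD_GBPS[link]
```

With cam = camera_gbps(1920, 1200, 160, 8), about 2.95 Gbit/s, a single camera fits on a gen. 1 link, but fits([cam, cam], "5000M") is False (about 5.9 against 4.0), while the same pair fits under the roughly 9.7 Gbit/s of a gen. 2 link before overhead.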

To explain more about why I mention the “nature of the devices” I’m going to digress to something that seems like an unrelated example (but it is actually a good description). Back in the days of parallel ATA disk drive ports you could put two disks on a single cable, and this leads to a single controller. If you only accessed one disk at a time there was no performance drop. However, you wouldn’t be able to use both disks simultaneously. The SCSI standards were invented to get around this. What they did is to burst traffic, and then detach from the bus. Up to 15 devices could “seemingly” (and somewhat “actually”) function at the same time. The devices were slow compared to the bus, but a command was issued to the disk, then the disk detached while working, and when its work was done, it would reattach to the bus. This was hardware enforced sharing.

To continue that example, serial ATA skipped using two disks on one controller or one cable. Each disk has its own controller, and so the issue does not exist. The SCSI command set is used, but at the disk end, ATA is still ATA regardless of being serial or parallel. This latter was to unify the command set to something more advanced than the PATA.

Back to your actual case with several gen. 1 devices on a single gen. 1 HUB: Some devices are interrupt driven, or not continuously generating data. A simple example is that a keyboard or mouse won’t generate data unless they detect an event (they are “event driven”). I don’t know what the USB 3 devices are, but I’ll provide some contrived examples…

A high quality USB 3 camera which compresses its data might get away with bulk transfers in USB 3 bursts. On USB 2 even bursts would overlap because it would take too long to transfer data. A pair of such cameras might not be as smoothly operating as they should be, but you might not notice frame drops. Take away the compression, and then one camera would require the full USB 3 bandwidth so often that perhaps both cameras would suffer (at least one camera would have terrible performance). Or one device might be demoted to slower speed (the last to enumerate would be the one to lose the contest).

A stereo camera would require both cameras to operate correctly rather than losing frames, and so a stereo camera consuming 5000M on both left and right would likely outright fail even if that stereo camera were the only one on that root HUB. The verbose lsusb -vvv, when queried on a USB 3 port, would not self-describe as having a USB 2 capability. This would be an example of a device that completely drops out rather than reducing speed if signal quality is not sufficient. Your devices probably are not “stereo” cameras, but they might be cameras.

Cameras and disk drives both consume considerable bandwidth, but the nature of the buffering is quite different. Disks are always a bulk transfer mode from a sizeable buffer. Cameras tend to need more real time access. The best cameras will run in isochronous mode, which is a reserved bandwidth which is guaranteed to that device. This gives a very high quality lossless communications with guaranteed maximum delay times. Cameras often can operate in a bulk mode like a disk drive, but this is less desirable. When cameras do use this mode they will use a lot less buffering, and so the time of consuming the bus is less. This latter case probably drops frames sometimes, and won’t be deterministic. Isochronous mode can be deterministic. Isochronous mode devices won’t fall back to the lower standard most of the time, although they might.
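Whether a device streams in bulk or isochronous mode can be read from its endpoint descriptors. A Python sketch that counts the “Transfer Type” lines in a saved lsusb -v log (for example the log_verbose.txt from the command earlier). The exact label is what lsusb prints in the versions I know, which is an assumption:

```python
from collections import Counter


def endpoint_transfer_types(verbose_text: str) -> Counter:
    """Count endpoint 'Transfer Type' values (Bulk, Isochronous, Interrupt) in lsusb -v text."""
    counts: Counter = Counter()
    for line in verbose_text.splitlines():
        fields = line.split()
        if fields[:2] == ["Transfer", "Type"] and len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

A camera whose streaming endpoints show up only as Bulk is sharing the bus on a best-effort basis, which matters most when several such cameras sit behind one HUB.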

Using a HUB with 5000M bandwidth implies you are only guaranteed correct operation for a single device that runs at 5000M. You have six such devices. They might function, depending on the device, but bandwidth alone could cause congestion and issues that would trigger demoting one of the devices from 5000M to 480M. I don’t think this is the cause of the demotion, but when a bus is operating at its maximum, the risk that minor signal issues from noise or RF reflections turn into real failures jumps dramatically.

I do see some other 5000M ports which are not being used. The carrier board, in combination with its device tree, can influence this. Maybe those ports are available elsewhere, and you could move the failing camera to a different port to give it its own 5000M of bandwidth. Perhaps one of the other devices uses more consistent bandwidth, in which case it would be a better candidate for moving. I don’t know if your carrier board and device tree combination is actually wired to expose multiple 5000M root HUBs.

Changing your non-root HUB to one capable of gen. 2 (10000M) would instantly give you 5000M more simultaneous bandwidth. Better yet, change it to a 10000M HUB and move one of the 5000M devices to a different root HUB.

Let’s say you do achieve the USB bandwidth. Keep in mind that some combination of CPU core and maybe GPU use will now receive data faster, and so those might run at a higher load. If you were to increase the bandwidth too much, then you could still lose frames on a camera (but in this case “frames” could be a generic term for a batch of data capable of being lost even if it isn’t a camera; I use a camera as the example because it is the biggest data consumer that needs a more or less real time behavior).
