FYI, copying and pasting the actual text with the mouse works better than screenshots (one reason why a serial console is nice to use, although this also works over ssh). Text makes it easier to use diff or to quote the relevant log lines. To log any command, just append " 2>&1 | tee log_name.txt", e.g.:
lsusb -t 2>&1 | tee log_lsusb.txt
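If you would rather capture logs from a script, here is a minimal Python sketch of the same idea as “2>&1 | tee”: stderr is merged into stdout, echoed to the terminal, and saved to a file. The function name run_and_tee is only an illustration, and unlike tee it prints after the command finishes rather than streaming live.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, echo merged stdout+stderr, and save the same text to log_path.

    Roughly what "cmd 2>&1 | tee log_path" does in a shell, except that output
    is shown once the command has finished instead of line by line.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
    )
    sys.stdout.write(result.stdout)
    with open(log_path, "w") as log:
        log.write(result.stdout)
    return result.returncode, result.stdout
```

Usage, equivalent to the command above: run_and_tee(["lsusb", "-t"], "log_lsusb.txt").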
From this, it shows that bus 2, port 1 is a USB 3.1 gen. 2 root port. This works at 10 Gb/s (10000M). The same root HUB controller also provides USB 3.1 gen. 1 support, which shows up as items running at 5 Gb/s (5000M). There is a USB 3.1 gen. 1 HUB connected to this (not a “root” HUB), and thus anything on that non-root HUB will be limited to 5000M.
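To make that reading of the tree mechanical, here is a hedged Python sketch that parses “lsusb -t” text, finds each entry's parent from its indentation, names the link speeds, and lists which device numbers sit directly behind each non-root HUB. It assumes the usual “Port N: Dev N, If N, Class=…, Driver=…, NNNNM” line layout (newer usbutils add leading zeros, which the regex tolerates; other layout changes could break it). Each line is an interface, so one device can appear more than once.

```python
import re

# Matches root and non-root lines of `lsusb -t`, e.g.
#   "/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=tegra-xusb/4p, 10000M"
#   "    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M"
LINE_RE = re.compile(
    r"Port (\d+): Dev (\d+),(?: If (\d+),)? Class=([^,]+), Driver=([^,]*), ([\d.]+)M"
)

# Link speeds as printed by lsusb, named the way this thread names them.
SPEED_NAMES = {
    "1.5": "USB 1 low speed (1.5 Mb/s)",
    "12": "USB 1 full speed (12 Mb/s)",
    "480": "USB 2 high speed (480 Mb/s)",
    "5000": "USB 3.1 gen. 1 (5 Gb/s)",
    "10000": "USB 3.1 gen. 2 (10 Gb/s)",
    "20000": "USB 3.2 gen. 2x2 (20 Gb/s)",
}


def parse_tree(text):
    """Turn `lsusb -t` text into entries that each know their parent entry.

    The parent is the nearest earlier line with smaller indentation.
    """
    nodes, stack = [], []  # stack holds indices of the current ancestors
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        indent = len(line) - len(line.lstrip())
        while stack and nodes[stack[-1]]["indent"] >= indent:
            stack.pop()
        nodes.append({
            "indent": indent,
            "parent": stack[-1] if stack else None,
            "dev": int(m.group(2)),
            "cls": m.group(4).strip(),
            "speed": m.group(6),
            "speed_name": SPEED_NAMES.get(m.group(6), "unknown"),
        })
        stack.append(len(nodes) - 1)
    return nodes


def devices_behind_hubs(nodes):
    """Map each non-root HUB entry (by index) to the Dev numbers directly below it."""
    hubs = {}
    for node in nodes:
        parent = node["parent"]
        if parent is not None and nodes[parent]["cls"] == "Hub":
            hubs.setdefault(parent, set()).add(node["dev"])
    return hubs
```

Fed the saved log_lsusb.txt, devices_behind_hubs() shows how many devices share each gen. 1 HUB's single 5000M uplink.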
What I find interesting is that this same non-root HUB runs 6 devices in both the working and failing circumstances. It seems likely that this particular root HUB, servicing the gen. 1 HUB, is not part of the problem (but I do see an issue I’ll mention further down).
In the “working” case, you happen to have Port 3, device 3 with three devices listed (more precisely, three interfaces; this is not a HUB). Typically that means a device which has multiple functions internally, but on a single cable. A stereo camera, or a camera plus stereo audio, is one example; programmable keyboards often do this as well, but they’re never USB 3. This is the failing device, and the error condition has migrated it to the USB 2 (480M) root HUB. Originally, that 480M USB 2 HUB had only two devices; it now has three since the USB 3.1 gen. 1 device migrated.
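A migration like this can be spotted by comparing a “working” and a “failing” lsusb -t log. The following Python sketch counts distinct devices per link speed, keyed by (bus, Dev) because a device with several interfaces prints several lines, and flags composite devices. It assumes the usual lsusb -t line layout; the function names are illustrative only.

```python
import re
from collections import Counter

# Root HUB lines look like "/:  Bus 02.Port 1: Dev 1, Class=root_hub, ..."
ROOT_RE = re.compile(r"Bus (\d+)\.Port \d+: Dev \d+, Class=root_hub")
# Other lines look like "    |__ Port 3: Dev 3, If 0, Class=Video, Driver=uvcvideo, 5000M"
CHILD_RE = re.compile(
    r"\|__ Port \d+: Dev (\d+), If \d+, Class=[^,]+, Driver=[^,]*, ([\d.]+)M"
)


def summarize(tree_text):
    """Return (device count per link speed, {(bus, dev): interfaces} for composite devices)."""
    interfaces = Counter()
    speed_of = {}
    bus = None
    for line in tree_text.splitlines():
        root = ROOT_RE.search(line)
        if root:
            bus = int(root.group(1))
            continue
        child = CHILD_RE.search(line)
        if child and bus is not None:
            key = (bus, int(child.group(1)))  # Dev numbers are unique per bus only
            interfaces[key] += 1
            speed_of[key] = child.group(2)
    per_speed = Counter(speed_of.values())
    composite = {key: n for key, n in interfaces.items() if n > 1}
    return per_speed, composite


def speed_changes(working_text, failing_text):
    """Per-speed change in device count going from the working log to the failing log."""
    before, _ = summarize(working_text)
    after, _ = summarize(failing_text)
    return {s: after[s] - before[s] for s in set(before) | set(after) if after[s] != before[s]}
```

On the two logs described above, the expected pattern is one device fewer at 5000M and one more at 480M, with the composite camera keeping its interface count on both.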
The migration does suggest that software and hardware are functioning correctly. The migration would have been due to signal quality. Keep in mind that any noise in power delivery is part of signal quality, but you have not confirmed the power source of this migrated device; e.g., if it is powered from the USB port, there are differences compared to being powered from a separate supply. Most of the time, though (it isn’t guaranteed), power noise issues would take out more than one device. Since this device is on the USB 3.1 gen. 1 non-root HUB, if the noise had been common to that HUB, then I would expect all devices on the HUB to have problems. Power delivery noise to the gen. 1 HUB is therefore unlikely, and we are now hunting for a core signal quality issue. I will emphasize, though, that this assumes it is the same device which fails each time, and that the issue does not apply to the other USB 3.1 gen. 1 devices on that HUB.
Had there been an outright failure, then this device would have disappeared and not migrated. Some cameras which demand USB 3 will simply disappear as they do not self-describe as having a USB 2 capability. Your device has the capability, and it is doing as instructed.
USB 3 has a very fast square wave signal on it. Reflections and noise can easily cause the faster timings to fail (which results in dropping back to USB 2). Just some possibilities:
- A longer cable can fail even without external noise, just due to reflections in the cable. USB does not provide a separate clock; the receiver must recover the clock from the data signal itself, which makes jitter more harmful.
- Outside noise on a cable can cause this issue even if cable length is not directly responsible for signal quality. Two devices can work great until the cable takes a different “shape” due to device placement. A device on the left side of a vehicle might have its cable pass by ignition circuitry, whereas the same device on the right side won’t pass near the noise source. If you really wanted to know, then you might need a breakout board that allows attaching a vector network analyzer to see reflections. The signal is a square wave, and so it is going to have reflections from its odd harmonics, not just the simple case of a sine wave. A high end USB 3 protocol analyzer could help as well, but it is more oriented to telling you there is an error in the signal, whereas the vector network analyzer will tell you more about the details of the signal’s reflections (it characterizes the cabling as a transmission line and as an antenna).
- Perhaps a shorter cable, or a cable routed differently, or a cable with better shielding would solve this.
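For a feel of the numbers behind “fast square wave” and “reflections”, here is a small Python sketch. It assumes an ideal 50% duty square wave from an alternating 1010… bit pattern (the highest-frequency case), whose Fourier series contains only odd harmonics with relative amplitude 4/(nπ); it also computes the textbook reflection coefficient Γ = (Z_L - Z0)/(Z_L + Z0) at an impedance step. The 90 Ω value is the nominal differential impedance of USB 3 SuperSpeed pairs; the 100 Ω discontinuity is purely hypothetical.

```python
import math


def square_wave_harmonics(bit_rate_bps, count):
    """First `count` odd harmonics of an ideal square wave from a 1010... bit pattern.

    Two bits make one period, so the fundamental is bit_rate / 2. An ideal square
    wave of amplitude 1 is (4/pi) * sum over odd n of sin(n*w*t) / n, so harmonic n
    has relative amplitude 4 / (n * pi). Returns (frequency_hz, amplitude) pairs.
    """
    fundamental = bit_rate_bps / 2
    return [(n * fundamental, 4 / (n * math.pi)) for n in range(1, 2 * count, 2)]


def reflection_coefficient(z_load_ohms, z0_ohms):
    """Fraction of an incident wave reflected at an impedance step: (ZL - Z0) / (ZL + Z0)."""
    return (z_load_ohms - z0_ohms) / (z_load_ohms + z0_ohms)


if __name__ == "__main__":
    for freq, amp in square_wave_harmonics(5e9, 3):  # USB 3.1 gen. 1 line rate
        print(f"{freq / 1e9:.1f} GHz, relative amplitude {amp:.3f}")
    # 90 ohm nominal SuperSpeed pair meeting a hypothetical 100 ohm discontinuity
    print(f"reflected fraction: {reflection_coefficient(100, 90):.3f}")
```

For 5 Gb/s this gives components at 2.5, 7.5, and 12.5 GHz, which is why a cable that is fine at USB 2 rates can misbehave at USB 3 rates.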
The part I have not mentioned yet, except in passing, is the number of devices on the USB 3.1 gen. 1 HUB in the working case. The root HUB has 10 Gb/s capability, but the HUB you are using only handles 5 Gb/s. I am assuming that the non-root HUB is a USB 3.1 gen. 1 device, but if it is really a gen. 2 device (10000M), then it would be the first device showing low signal quality (it is already running at only 5000M). Is this really a USB 3.1 gen. 1 HUB? I’ll digress and explain some more about debugging with lsusb.
You can increase the detail shown for each device in “lsusb -t” by adding the “-v” option:
lsusb -t -v
(you can even use two levels of verbosity via “-vv”, but that only adds the /sys and /dev associations)
Note that lsusb, when not in tree mode, can be very verbose, but you need sudo (root authority) to see the full output. Example:
sudo lsusb -vvv
That is a lot of output for each device. In the verbose lsusb -t -v (or in plain “lsusb”) there is an ID for each device, e.g., “0955:7023”. In that example, 0955 is the vendor ID for NVIDIA, and 7023 is the product ID of an Orin in either recovery mode or device mode (it is a device in both cases). If you know the ID of the device which fails (there are three devices, and likely they all have the same ID on that same cable), then you can limit the fully verbose query to that device. I’ll pretend the ID is 0955:7023, but it won’t be that in reality:
sudo lsusb -d 0955:7023 -vvv
To log this:
sudo lsusb -d 0955:7023 -vvv 2>&1 | tee log_verbose.txt
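A mistyped ID such as “-955:7023” will not select the intended device, so it can help to validate the VID:PID before running anything. Here is a minimal Python sketch that checks the format, builds the same verbose query, and logs the merged output as tee would. The function names are hypothetical, and the check is deliberately stricter than lsusb itself, which also accepts partial forms such as “0955:” (vendor only).

```python
import re
import subprocess

# VID:PID as two groups of four hex digits, e.g. "0955:7023".
USB_ID_RE = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{4}")


def verbose_lsusb_cmd(usb_id):
    """Build `sudo lsusb -d VID:PID -vvv`, rejecting malformed IDs such as "-955:7023"."""
    if not USB_ID_RE.fullmatch(usb_id):
        raise ValueError(f"expected VID:PID with four hex digits each, got {usb_id!r}")
    return ["sudo", "lsusb", "-d", usb_id, "-vvv"]


def log_verbose(usb_id, log_path="log_verbose.txt"):
    """Run the verbose query, echo it, and save stdout+stderr (like `2>&1 | tee`)."""
    result = subprocess.run(
        verbose_lsusb_cmd(usb_id),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    print(result.stdout, end="")
    with open(log_path, "w") as log:
        log.write(result.stdout)
    return result.returncode
```

Usage: log_verbose("0955:7023"), with the placeholder replaced by the real ID of the failing device.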
What we will see is the full capability of the device with that ID. You could also do this with the non-root HUB (which I am assuming lacks gen. 2 capability), and not just the end device; don’t bother with the HUB unless it is supposed to be gen. 2 (perhaps my assumption that it is gen. 1 is correct). This information is the self-description the device itself provides. It states not only what the capability is, but also what the current state is. Note: If the device is running on a USB 3 port or HUB, but downgrades to USB 2, then it should still report its USB 3 capability; if the device is running on a port or HUB not capable of USB 3, then it won’t report USB 3 capabilities.
I’m going to predict ahead of time that the fully verbose log indicates gen. 1 capability, but throttled to USB 2. If this is the case, then it more or less guarantees (“smoking gun” evidence) that a signal analyzer and USB analyzer would tell you signal quality is the issue.
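To turn that prediction into a yes/no check, the following Python sketch looks for a SuperSpeed capability in saved lsusb -vvv text and compares it with the current link speed, which Linux exposes in the sysfs “speed” attribute of each USB device (for example /sys/bus/usb/devices/2-1.3/speed; that particular path is hypothetical, and the port numbering on your system will differ). The string matching assumes lsusb’s usual “SuperSpeed USB Device Capability” wording in the BOS descriptor section.

```python
import os
import re

# lsusb prints BOS capabilities under headings such as these.
SS_CAP_RE = re.compile(r"SuperSpeed(?:Plus)? USB Device Capability")
BCD_RE = re.compile(r"^\s*bcdUSB\s+(\d+\.\d+)", re.MULTILINE)


def usb3_capable(verbose_text):
    """True if the device's self-description includes a SuperSpeed capability."""
    return SS_CAP_RE.search(verbose_text) is not None


def bcd_usb(verbose_text):
    """The bcdUSB field from the device descriptor, e.g. '3.20', or None."""
    match = BCD_RE.search(verbose_text)
    return match.group(1) if match else None


def current_speed_mbps(sysfs_device_dir):
    """Read the negotiated speed Linux reports for a device, e.g. 480.0 or 5000.0."""
    with open(os.path.join(sysfs_device_dir, "speed")) as f:
        return float(f.read().strip())


def verdict(verbose_text, speed_mbps):
    """Combine declared capability with the current speed."""
    if usb3_capable(verbose_text) and speed_mbps <= 480:
        return "USB 3 capable but running at USB 2 speed (signal quality suspect)"
    if usb3_capable(verbose_text):
        return "USB 3 capable and running at USB 3 speed"
    return "no USB 3 capability reported"
```

A result of “USB 3 capable but running at USB 2 speed” for the camera would be the “smoking gun” case described above.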
A further issue I have not yet fully described is that your setup might not have enough bandwidth to succeed. The HUB being used, assuming it is really gen. 1 at 5000M, is servicing (in the working case) 6 devices, and each device can potentially consume the full 5000M of bandwidth. Whether this works depends on the nature of the devices (it won’t necessarily fail, but the odds are not in favor of success).
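A quick way to see how oversubscribed a shared HUB uplink is: add up what the devices need and compare it with the data the link can carry. This Python sketch assumes USB 3.1 gen. 1 uses 8b/10b line coding (so a 5000M link carries at most 4000 Mb/s of data) and ignores packet and protocol overhead, so real headroom is lower still. The per-device demands in the test are hypothetical.

```python
def hub_uplink_load(demands_mbps, line_rate_mbps=5000, data_bits=8, line_bits=10):
    """Return (total demand, usable data rate, load ratio) for devices on one uplink.

    Defaults model USB 3.1 gen. 1: 5000M line rate with 8b/10b coding.
    For gen. 2 use line_rate_mbps=10000, data_bits=128, line_bits=132.
    A load ratio above 1.0 means the devices cannot all stream at once.
    """
    usable = line_rate_mbps * data_bits / line_bits
    total = sum(demands_mbps)
    return total, usable, total / usable
```

For example, six devices that each want a hypothetical 1000 Mb/s add up to 6000 Mb/s against about 4000 Mb/s of usable data rate, a load ratio of 1.5.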
To explain more about why I mention the “nature of the devices”, I’m going to digress to something that seems like an unrelated example (but it is actually a good description). Back in the days of parallel ATA disk drive ports, you could put two disks on a single cable, which leads to a single controller. If you only accessed one disk at a time, there was no performance drop; however, you couldn’t use both disks simultaneously. SCSI got around this: a command was issued to a disk, the disk then detached from the bus while working, and when its work was done, it reattached to the bus and burst its traffic. Up to 15 devices could “seemingly” (and somewhat “actually”) function at the same time, because the devices were slow compared to the bus. This was hardware enforced sharing.
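The benefit of detaching from the bus can be shown with a toy timing model in Python. It assumes n different disks each receive one command at the same moment, that seeks fully overlap once the bus is released, and that transfers still take turns on the bus; the millisecond values are hypothetical.

```python
def held_bus_ms(n_disks, seek_ms, transfer_ms):
    """Each command occupies the bus for its whole seek plus transfer, one disk at a time."""
    return n_disks * (seek_ms + transfer_ms)


def disconnect_ms(n_disks, seek_ms, transfer_ms):
    """Disks detach while seeking (all in parallel), then reattach to transfer one at a time."""
    return seek_ms + n_disks * transfer_ms


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, held_bus_ms(n, 8, 1), disconnect_ms(n, 8, 1))
```

With one disk both approaches take the same time, matching the point that accessing one disk at a time had no performance drop; the gap grows with each disk that shares the bus.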
To continue that example, serial ATA dropped the idea of two disks on one controller or one cable. Each disk has its own point-to-point link, and so the issue does not exist. In Linux, SATA disks are presented through the SCSI layer (libata translates SCSI commands to ATA), but at the disk end, ATA is still ATA regardless of being serial or parallel. That SCSI layer was used to unify the command set on something more advanced than PATA.
Back to your actual case with several gen. 1 devices on a single gen. 1 HUB: Some devices are interrupt driven, or not continuously generating data. A simple example is that a keyboard or mouse won’t generate data unless they detect an event (they are “event driven”). I don’t know what the USB 3 devices are, but I’ll provide some contrived examples…
A high quality USB 3 camera which compresses its data might get away with bulk transfers in USB 3 bursts. On USB 2, even the bursts would overlap because each transfer would take too long. A pair of such cameras might not operate as smoothly as they should, but you might not notice frame drops. Take away the compression, and one camera would need the full USB 3 bandwidth so often that perhaps both cameras would suffer (at least one camera would have terrible performance). Or one device might be demoted to a slower speed (the last to enumerate would be the one to lose the contest).
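Here is the rough arithmetic behind “take away the compression”, as a Python sketch. The resolution, pixel format (2 bytes per pixel, as in YUYV 4:2:2), and frame rates are hypothetical examples, not your cameras’ actual settings, and USB protocol overhead is not included.

```python
def raw_video_mbps(width, height, bytes_per_pixel, fps):
    """Uncompressed video payload in Mb/s (10**6 bits per second), before USB overhead."""
    return width * height * bytes_per_pixel * fps * 8 / 1e6


if __name__ == "__main__":
    one_cam = raw_video_mbps(1920, 1080, 2, 30)
    print(f"1080p30 YUYV: {one_cam:.0f} Mb/s vs 480 Mb/s USB 2 line rate")
    print(f"1080p60 YUYV: {raw_video_mbps(1920, 1080, 2, 60):.0f} Mb/s")
```

One uncompressed 1080p30 stream (about 995 Mb/s) is already roughly twice the USB 2 line rate, and two 1080p60 streams (about 3981 Mb/s) would nearly fill the roughly 4000 Mb/s of data a gen. 1 link carries after 8b/10b coding.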
A stereo camera requires both cameras to operate correctly rather than losing frames, and so a stereo camera consuming 5000M on both left and right would likely fail outright even if it were the only device on that root HUB. Queried with the verbose lsusb -vvv on a USB 3 port, such a camera would not self-describe as having USB 2 capability. This would be an example of a device that completely drops out rather than reducing speed if signal quality is not sufficient. Your devices probably are not “stereo” cameras, but they might be cameras.
Cameras and disk drives both consume considerable bandwidth, but the nature of the buffering is quite different. Disks always use bulk transfers from a sizeable buffer. Cameras tend to need more real time access. The best cameras run in isochronous mode, which reserves bandwidth that is guaranteed to that device. This gives very high quality communication with guaranteed bandwidth and guaranteed maximum delay times (isochronous data is not retried on error, so it is not strictly lossless, but its timing is predictable). Cameras often can operate in a bulk mode like a disk drive, but this is less desirable. When cameras do use this mode they use a lot less buffering, and so the time spent occupying the bus is shorter. This latter case probably drops frames sometimes, and won’t be deterministic. Isochronous mode can be deterministic. Isochronous mode devices won’t fall back to the lower standard most of the time, although they might.
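The “reserved bandwidth” of isochronous mode amounts to admission control: a reservation is granted only if it still fits, and once granted it is not taken away by bulk traffic. Here is a toy Python model of that bookkeeping. The 3600 Mb/s budget is a hypothetical figure (the real share of bus time available for periodic transfers is set by the USB specification and the host controller), and real scheduling works per service interval rather than on a single number.

```python
class IsochronousBudget:
    """Toy admission control for reserved (isochronous) bandwidth on one link."""

    def __init__(self, budget_mbps):
        self.budget_mbps = budget_mbps
        self.reserved = {}  # device name -> reserved Mb/s

    def used(self):
        return sum(self.reserved.values())

    def request(self, device, mbps):
        """Grant the reservation only if it fits; never over-commit the budget."""
        current = self.reserved.get(device, 0)  # a re-request replaces the old amount
        if self.used() - current + mbps > self.budget_mbps:
            return False
        self.reserved[device] = mbps
        return True

    def release(self, device):
        self.reserved.pop(device, None)
```

A denied request is the point where a real host would refuse the reservation; in bulk mode, by contrast, every device is admitted and simply gets slower when the bus is busy.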
Using a HUB with 5000M bandwidth implies you are only guaranteed to function correctly with a single device that runs at 5000M. You have six such devices. They might function, depending on the device, but bandwidth alone could cause congestion and issues that would trigger demoting one of the devices from 5000M to 480M. I don’t think this is the cause of the demotion, but when a bus is operating at its maximum, the impact of signal issues such as noise or RF reflections can jump dramatically from minor to serious.
I do see some other 5000M ports which are not being used. The carrier board being used, in combination with a device tree, can influence this. Maybe those ports are available elsewhere, and you could move the failing camera to a different port to give it its own 5000M of bandwidth. Perhaps one of the other devices uses more consistent bandwidth, in which case it would be a candidate for moving. I don’t know if your carrier board and device tree combination is actually wired to expose multiple 5000M root HUBs.
Changing your non-root HUB to one capable of gen. 2 (10000M) would instantly give you an additional 5000M of simultaneous bandwidth. Better yet, change to a 10000M HUB and also move one of the 5000M devices to a different root HUB.
Let’s say you do achieve the needed USB bandwidth. Keep in mind that some combination of CPU cores and maybe the GPU will now receive data faster, and so those might run at a higher load. If you were to increase the bandwidth too much, then you could still lose frames on a camera (in this case “frames” could be a generic term for any batch of data that can be lost, even if it isn’t from a camera; I use a camera as the example because it is the biggest data consumer that needs more or less real time behavior).