PCIe - USB Hub issue

Hi there,

I’ve bought a PCIe USB hub (https://aliexpress.com/item/1005004338237594.html) to connect more cameras to my Orin AGX, but I’m not able to reach a reasonable bandwidth (I wanted something near 100 MB/s) to capture video at approx. 12 fps using the Basler software attached.


Everything appears to be running fine. The command lsusb -t returns:

/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
|__ Port 1: Dev 2, If 1, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 2, If 2, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 2, If 0, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 10000M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=tegra-xusb/4p, 10000M
|__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=tegra-xusb/4p, 480M
|__ Port 3: Dev 2, If 0, Class=Wireless, Driver=rtk_btusb, 12M
|__ Port 3: Dev 2, If 1, Class=Wireless, Driver=rtk_btusb, 12M
|__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M

where bus 4 is the new hub.

The command dmesg -wH returns the following when I try to run the camera:
[jan10 09:24] xhci_hcd 0005:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ +0,010579] xhci_hcd 0005:01:00.0: Looking for event-dma 00000000fffc0000 trb-start 00000000fffbffb0 trb-end 00000000fffbffe0 seg-start 00000000fffbf000 seg-end 00000000fffbfff0

Full log attached below:
full dmesg -wH.txt (77.1 KB)


I can’t answer, but I want to point out some things from the “lsusb -t”…

For:

/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
|__ Port 1: Dev 2, If 1, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 2, If 2, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 2, If 0, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 10000M

In the above, the Bus 04, Port 1 root_hub has a total theoretical throughput of 10 Gbit/s (10000M). Two cameras running at 5000M would consume all of that bandwidth. There is no possibility of the third camera also running at 5000M. If they were sending data at different times, and each was only bursting 5000M, one at a time, such that no total traffic ever exceeded 10000M, then in theory it might work. However, cameras are always sending. At most that port should have two cameras at the 5000M speed.
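The oversubscription arithmetic above can be sketched with a quick shell calculation (the 10000M/5000M figures are the link rates from the lsusb -t output; this ignores protocol overhead, which only makes the situation worse):

```shell
# Back-of-envelope check: sum of per-camera link rates vs. the root hub's rate.
root_hub_mbps=10000   # Bus 04 root_hub enumerates at 10000M
cam_mbps=5000         # each camera enumerates at 5000M
num_cams=3

total=$((cam_mbps * num_cams))
if [ "$total" -gt "$root_hub_mbps" ]; then
    echo "oversubscribed: ${total}M demanded vs ${root_hub_mbps}M available"
else
    echo "fits: ${total}M demanded vs ${root_hub_mbps}M available"
fi
```

Three cameras demand 15000M against a 10000M root hub, so the check reports oversubscription.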

The 10000M hub listed on Port 2 of that same Bus 04, Port 1 also consumes bandwidth from the Bus 04, Port 1 root_hub. That hub is unusable in this circumstance since it shares the root hub with all three of the 5000M devices listed above. The port is only useful if the combined devices leave enough bandwidth, and here the combined demand is far beyond what is supported.

This port/hub is set up correctly, but has no way to support anything more:

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=tegra-xusb/4p, 10000M
|__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M

There is a kernel stack dump in the log related to gpumgrGetSomeGpu. This may or may not be related. Perhaps the excess bandwidth is causing other issues. It is also possible that this is a separate unrelated issue. You’d probably have to try running with the first 10000M HUB using only 2 of the 5000M cameras (then in theory you’ve maxed bandwidth), then see again if you get that kernel stack dump.
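If you want to watch for that stack dump while reproducing, one way (a sketch; adjust the pattern to whatever the full log actually shows) is to filter the live kernel log during a capture run:

```shell
# Follow the kernel log and surface only the xhci errors and the
# gpumgrGetSomeGpu stack dump lines while a capture is running.
sudo dmesg --follow | grep -iE 'gpumgrGetSomeGpu|xhci_hcd'
```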

Hi @linuxdev, thanks for the reply!

One thing I find weird, and maybe that’s where the problem is: in the cases above I had a single camera connected to the system (through the hub, picture attached), so bandwidth shouldn’t be a problem, right?

As you can see in the video, the bandwidth stays constrained below 10 MB/s with some short peaks going up to >400 MB/s, and the error below also shows up in the Basler software.

Any further ideas?

A HUB, if anything, is harmful to bandwidth. A HUB can be good for signal quality though. Imagine an extension cord to a power outlet on the wall, and the wall has a rated maximum current of 20 A. The extension cord is rated at 30 A. It doesn’t mean you can consume 20 A per socket of the extension, nor does it mean you can ever use a 30 A device…the sum total of all devices would have to remain below 20 A. Bandwidth on a HUB is similar, except there is overhead and latency introduced, and so bandwidth suffers slightly on a HUB, with no benefit if individual devices exceed the total maximum bandwidth.

Note that your software screenshot has errors of “Internal buffer overflow”. It means you are receiving more data than the hardware can handle. This verifies that you have too many cameras (at least at that resolution and frame rate…you’d have to reduce resolution and frame rate drastically for your system to handle that much data). There really isn’t any fix for this other than to add a root HUB. The tree topology list from before says your add-on card (I think that is the “Bus 04, Port 1” HUB) has only a single root HUB. The HUB you are using is being overrun by at least 50% more traffic than it can handle.

You might be able to find a PCIe based HUB which has independent root HUBs (it’d be more expensive, and many advertisers won’t give you enough information to know if this is one HUB and many ports, or one port per HUB and multiple root HUBs).

Incidentally, whenever you have a root HUB it will trigger a hardware interrupt when it needs to be serviced. If the software running on the CPU gets too much traffic, then you’ll also start dropping data even if you have a lot of root HUBs. The CPU itself has only so much bandwidth available. However, if you have PCIe lanes available to service that data, then it tends to work; PCIe lanes are something similar to root HUBs…as long as each lane is not required to run too much data through it the CPU should be able to keep up (but due to timing and other issues sometimes it cannot).
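A rough way to see how hard the host controllers are interrupting the CPU (assuming, as on most Linux systems, that the controllers show up with “xhci” in their interrupt names) is to sum their counters from /proc/interrupts twice and compare:

```shell
# Sum all per-CPU counters for interrupt lines whose name mentions "xhci".
# Sampling twice, one second apart, gives an interrupts/second estimate.
count_xhci() {
    awk '/xhci/ {for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) s += $i}
         END {print s + 0}' /proc/interrupts
}

before=$(count_xhci)
sleep 1
after=$(count_xhci)
echo "xhci interrupts/second: $((after - before))"
```

A very high rate during capture would suggest the CPU is spending a lot of time servicing the controller rather than the consuming application.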

The thing I still don’t understand is that I’m running trials with only one camera (Dev 7 below)… therefore there’s nothing else that should be consuming much bandwidth.

/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
…|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 10000M
… …|__ Port 3: Dev 7, If 0, Class=Miscellaneous Device, Driver=, 5000M
… …|__ Port 3: Dev 7, If 1, Class=Miscellaneous Device, Driver=, 5000M
… …|__ Port 3: Dev 7, If 2, Class=Miscellaneous Device, Driver=, 5000M

The tree above is when the single camera is connected to the HUB and I have the problem shown in the video attached above. When connecting it directly to the native USB port I can stream the camera image at full resolution at 48 fps (approx. 420 MB/s)…
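That ~420 MB/s figure is plausible for the raw payload. Assuming an 8-bit Bayer format at 1 byte/pixel and a 3840x2160 full resolution (my assumptions based on the camera model name; check the actual pixel format in the Basler software), the arithmetic works out to:

```shell
# Raw payload rate for full-resolution capture; assumes 1 byte/pixel
# (8-bit Bayer) and 3840x2160. USB protocol overhead pushes the
# on-the-wire figure somewhat higher.
width=3840
height=2160
fps=48
bytes_per_pixel=1

bytes_per_sec=$((width * height * fps * bytes_per_pixel))
echo "payload: $((bytes_per_sec / 1000000)) MB/s"
```

That comes to about 398 MB/s of payload, close to the observed ~420 MB/s once overhead is included.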

This sounds correct, but it also depends on the mode. It is possible for the driver to have other issues when looking at what is connected without actually waiting for data transfer, but it seems unlikely unless they use isochronous mode (in which case a failure is pretty much guaranteed). On the cameras you have attached, are they all the same model (it sounds like they are)? If you run just “lsusb” you’ll see a “Bus” and “Dev”, which combined is the “slot”. You can use that to limit a verbose query to just that device. As an example, assuming it is “Bus 001” and “Device 002”, this would be a fully verbose query which generates a log (adjust for the actual Bus and Device):

sudo lsusb -s 001:002 -vvv 2>&1 | tee log_lsusb.txt

Can you post the fully verbose lsusb of the camera you are testing (assumes they are the same model)?

Incidentally, try the same test with the other cameras disconnected just to see what difference is seen with only the one camera.

So, I have the following setup (two identical Basler cameras daA3840-45uc):

/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
…|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 10000M
… …|__ Port 3: Dev 8, If 0, Class=Miscellaneous Device, Driver=, 5000M
… …|__ Port 3: Dev 8, If 1, Class=Miscellaneous Device, Driver=, 5000M
… …|__ Port 3: Dev 8, If 2, Class=Miscellaneous Device, Driver=, 5000M

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=tegra-xusb/4p, 10000M
…|__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
… …|__ Port 2: Dev 4, If 0, Class=Miscellaneous Device, Driver=, 5000M
… …|__ Port 2: Dev 4, If 1, Class=Miscellaneous Device, Driver=, 5000M
… …|__ Port 2: Dev 4, If 2, Class=Miscellaneous Device, Driver=, 5000M

On Bus 04 I have the PCIe hub and on Bus 02 the direct USB connection - logs attached. From what I’ve seen they are exactly the same.

Bus002Dev004.txt (4.9 KB)
Bus004Dev008.txt (4.9 KB)

Neither is isochronous mode, so it is possible for bulk transfer to work so long as total traffic does not exceed the root HUB throughput. They’re all operating at USB3 (USB 3.1 gen. 1). I wish I knew if cameras not being used were actually not sending traffic (sending but ignoring data is different than not sending it…I’m just speculating on the possibility that having the cameras all plugged in is a problem even if you are not displaying anything from them).

Have you tried maxing out clocks? Also, even though you are only using one camera, is there any frame rate difference if the other cameras are not plugged in?
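For reference, the usual way to max out clocks on a Jetson is nvpmodel plus jetson_clocks (a sketch; the number of the max-power profile varies by board, so query the current model first):

```shell
# Query the active power model, then lock everything to maximum.
sudo nvpmodel -q          # show the active power model
sudo nvpmodel -m 0        # 0 is MAXN on the AGX Orin (verify for your board)
sudo jetson_clocks        # pin CPU/GPU/EMC clocks at their maximums
```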

The thing is that the problem happens even if all of the other cameras are disconnected from the Jetson (a single camera connected to the PCIe hub). I’ve gone through the tutorial below, but no improvements so far…
NVIDIA Jetson Orin - Maximizing Performance

Someone from NVIDIA will need to comment, but since a single camera does have enough bandwidth, yet buffers still overrun, it is probably a case of making sure the CPU is maxed out in frequency; beyond that, there might be an issue with the drivers or the program consuming the data. If you’ve maximized performance (and probably you have, since you linked the article), it might mean you need to research whether the same issue occurs when dumping all of the camera data into some NULL sink (something not requiring processing, which basically tests the data path to see if there are overruns; a sync error would differ from overruns).

@dusty_nv any ideas here? Thanks in advance!