Concurrent USB 3.0 Port Access

Hi,

I need to connect three cameras to three USB 3.0 ports. I found this card (VANTEC Quad Chip 4-Port Dedicated 5Gbps USB 3.0 PCIe Host Card, Model UGT-PCE430-4C, from Newegg.com), which includes four independent USB 3.0 controllers. The card works well if I run one camera at a time at maximum throughput. However, if I run two cameras simultaneously in two threads, the throughput of each camera drops to half. I also tried connecting one camera to the original USB 3.0 port on the Jetson TX2, but got the same result. So, does the Jetson TX2 support independent USB ports, or does it access all USB ports via a hub?

Thanks.

If it is using four lanes, then it should be ok even if it is running at gen. 1 speeds. With the cameras plugged in, what do you see for:

lsusb -t

If you run “lspci”, then you’ll see a line for the actual USB controller, and that line will start with a slot. For example, it might look something like “00:12.0”. To have “lspci” show just that slot, in verbose mode, you would enter this command (but adjust for the slot with the USB card):

sudo lspci -s 00:12.0 -vvv

Post the result of both of those.

The Jetson itself isn’t what supports or fails to support independent ports. That’s up to the controller and the card connecting to the controller. On the other hand, there are ports which are partly a function of the carrier board. You can be guaranteed though that the dev board’s USB ports are independent of the PCIe card’s ports…neither will have any effect on the other so far as picking speeds and sharing bandwidth.

Don’t forget to maximize performance while testing:

sudo nvpmodel -m 0
sudo /home/ubuntu/jetson_clocks.sh

The response of “lsusb -t”:

/: Bus 10.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 5000M
/: Bus 09.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
/: Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 5000M
|__ Port 1: Dev 4, If 0, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 4, If 1, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 4, If 2, Class=Miscellaneous Device, Driver=, 5000M
/: Bus 07.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
/: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 5000M
|__ Port 1: Dev 2, If 0, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 2, If 1, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 2, If 2, Class=Miscellaneous Device, Driver=, 5000M
/: Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 5000M
|__ Port 1: Dev 4, If 0, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 4, If 1, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 4, If 2, Class=Miscellaneous Device, Driver=, 5000M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/3p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/4p, 480M
|__ Port 1: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 3: Dev 10, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 4: Dev 9, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M

The result of running “sudo lspci -s 03:00.0 -vvv”:

03:00.0 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller (rev 02) (prog-if 30 [XHCI])
Subsystem: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 128 bytes
Interrupt: pin A routed to IRQ 388
Region 0: Memory at 50100000 (64-bit, non-prefetchable)
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
Vector table: BAR=0 offset=00001000
PBA: BAR=0 offset=00001080
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Device Serial Number 13-00-00-00-92-43-14-08
Capabilities: [150 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: xhci_hcd

Thanks.

The ports from that card are correctly operating at USB3 speeds. There is of course always more to the story than just the port itself, but this basic detail is good to go.
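For anyone who wants to check this without reading the tree by eye, here is a minimal sketch that groups the devices in “lsusb -t” output by bus and reports the negotiated speed. The sample text is a shortened excerpt of the output posted above; the function name and the regular expressions are my own and assume the line layout shown in this thread.

```python
import re

# Shortened excerpt of the "lsusb -t" output posted earlier in this thread.
SAMPLE = """\
/: Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 5000M
|__ Port 1: Dev 4, If 0, Class=Miscellaneous Device, Driver=, 5000M
|__ Port 1: Dev 4, If 1, Class=Miscellaneous Device, Driver=, 5000M
/: Bus 07.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/4p, 480M
|__ Port 3: Dev 10, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
"""

# Assumed line layout: root hub lines contain "Bus NN.Port", device lines start with "|__".
BUS_RE = re.compile(r"Bus (\d+)\.Port \d+: Dev \d+, Class=root_hub, Driver=([^,]*), ([\d.]+)M")
DEV_RE = re.compile(r"\|__ Port \d+: Dev (\d+), If \d+, Class=([^,]*), Driver=[^,]*, ([\d.]+)M")


def devices_by_bus(text):
    """Map bus number -> root hub driver, root hub speed, and attached devices."""
    buses = {}
    current = None
    for line in text.splitlines():
        m = BUS_RE.search(line)
        if m:
            current = int(m.group(1))
            buses[current] = {
                "driver": m.group(2),
                "speed_mbps": float(m.group(3)),
                "devices": {},
            }
            continue
        m = DEV_RE.search(line)
        if m and current is not None:
            # A device with several interfaces (If 0, If 1, ...) is recorded once.
            buses[current]["devices"].setdefault(
                int(m.group(1)), (m.group(2), float(m.group(3)))
            )
    return buses


if __name__ == "__main__":
    for bus, info in sorted(devices_by_bus(SAMPLE).items()):
        for dev, (cls, speed) in info["devices"].items():
            print(f"bus {bus} ({info['driver']}): dev {dev} [{cls}] at {speed:g}M")
```

A camera that shows 5000M on its own root hub has negotiated USB3 signaling, which is all this check can tell you; it says nothing about the PCIe side.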

This part has me wondering since it appears it is running on only a single lane:

LnkSta:	Speed 5GT/s, Width x1,...

The “LnkSta” is actual status. The “LnkCap” is the capability. According to this the card is only capable of single lane, but is operating well at gen. 2 speeds on that single lane. The URL you gave for model “UGT-PCE430-4C” claims it is x4. In that case I would expect “LnkCap” to be x4, not x1. Keep in mind though that it is accurate to say an x1 card is compatible with x1, x2, x4, and x16 slots…but it doesn’t mean x1 can actually use more than one lane. It seems this is what is happening with this card.
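To make the capability versus status comparison repeatable, here is a small sketch that pulls the speed and width out of the “LnkCap” and “LnkSta” lines of verbose lspci text. The two sample lines are copied from the output posted above; the function names are made up for illustration, and the pattern assumes the “Speed …GT/s, Width x…” layout seen in this thread.

```python
import re

# Lines copied from the "sudo lspci -s 03:00.0 -vvv" output posted in this thread.
SAMPLE = """\
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

# Matches only "LnkCap:" and "LnkSta:", not "LnkCtl:", "LnkCtl2:" or "LnkSta2:".
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_links(text):
    """Return {'LnkCap': (speed_gt_s, width), 'LnkSta': (speed_gt_s, width)}."""
    found = {}
    for line in text.splitlines():
        m = LINK_RE.search(line)
        if m and m.group(1) not in found:
            found[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return found


def describe(links):
    """Say whether the current link status matches the advertised capability."""
    cap_speed, cap_width = links["LnkCap"]
    sta_speed, sta_width = links["LnkSta"]
    if (sta_speed, sta_width) == (cap_speed, cap_width):
        return f"running at full capability: {sta_speed:g}GT/s x{sta_width}"
    return (
        f"downgraded: capable of {cap_speed:g}GT/s x{cap_width}, "
        f"running {sta_speed:g}GT/s x{sta_width}"
    )


if __name__ == "__main__":
    print(describe(parse_links(SAMPLE)))
```

If the capability were x4 and the status x1, the second branch would report a downgraded link, which would point at the slot or wiring rather than the card.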

I see this on the NewEgg site about this model (you should check to be certain this is the model you actually got):

Support 5Gbps per chip, with 4 chip to get a total bandwidth of up to 20Gbps

The implication of this statement is that it has four independent controllers (and evidence is that it does have four independent controllers), and that the card uses 4 PCIe lanes. x1 is definitely not x4.

Going back to the model number, PCIe itself reports “uPD720202”. The model I see (“UGT-PCE430-4C”) might actually be the same card, because PCIe reports the chipset, and the chipset is not the same as the model. A number of PCIe USB cards from different manufacturers may be different models yet use the same chipset. What is undeniable is that this card is using only a single lane, not four…even if it fits an x4 card slot. One lane at gen. 2 signals at 5GT/s, and after 8b/10b encoding the actual throughput of that single lane is 4Gb/s. It is correct that the card has four controllers, and that each controller runs at 5Gb/s…it is incorrect to say this is an x4 card using four lanes. With an x1 lane count all four controllers share roughly the bandwidth of a single controller (5GT/s raw, combined total), so the card cannot support multiple USB3 devices running at full speed at the same time.
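The arithmetic behind that can be written out as a rough sketch. It assumes only 8b/10b line encoding (used by PCIe gen. 2 and by 5Gb/s USB 3.0) and ignores packet and protocol overhead, so real numbers will be lower. It also assumes every active port tries to run at its full rate and that the link is shared evenly, which cameras may not actually do. The function names are my own.

```python
def effective_gbps(raw_gt_s, lanes=1, encoding=8 / 10):
    """Bit rate in Gb/s left after line encoding (8b/10b: 8 data bits per 10 line bits)."""
    return raw_gt_s * encoding * lanes


def per_port_share(active_ports, link_gbps, port_gbps):
    """Fraction of its full rate each port gets if the PCIe link is shared evenly."""
    demand = active_ports * port_gbps
    return min(1.0, link_gbps / demand)


if __name__ == "__main__":
    usb3_port = effective_gbps(5.0)  # one 5Gb/s USB 3.0 port after 8b/10b
    for lanes in (1, 4):
        link = effective_gbps(5.0, lanes=lanes)  # PCIe gen. 2 link of this width
        for ports in (1, 2, 3, 4):
            share = per_port_share(ports, link, usb3_port)
            print(f"x{lanes} link ({link:g} Gb/s), {ports} port(s) busy: {share:.0%} each")
```

Under these assumptions an x1 gen. 2 link covers one saturated port, while an x4 link would cover all four.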

One suggestion I have is to plug this into a desktop PC and see what the verbose lspci says on the other system as well. See if they agree. Generally speaking though, the “LnkCap” is accurate and won’t back off on its capabilities just because the actual status is running at some lower mode.

Here is a URL I found to compare PCIe slots of various sizes. Does your card actually have a physical slot size of x4?
https://img.hexus.net/v2/dvdoctor/technobackground/pciexpress/PCIExpressSlot_C.jpg

I tried the card on PC (Windows 7). The related information is:

Renesas Electronics USB 3.0 Host Controller
Renesas Electronics USB 3.0 Host Controller
Renesas Electronics USB 3.0 Host Controller
Renesas Electronics USB 3.0 Host Controller
Renesas Electronics USB 3.0 Root Hub
Renesas Electronics USB 3.0 Root Hub
Renesas Electronics USB 3.0 Root Hub
Renesas Electronics USB 3.0 Root Hub

I tested my Windows video streaming program with three cameras and got the maximum throughput (42 FPS) from each camera. That means the card works with Windows.

On the other hand, I made some progress on Linux after running the commands you suggested:

sudo nvpmodel -m 0
sudo /home/ubuntu/jetson_clocks.sh

Now I get 42 FPS (the maximum) from each camera if I run two cameras. If I run three cameras at the same time, I get 42 FPS (the maximum) from one camera and about 30 FPS from each of the other two. Any suggestions? Thanks.

I find it difficult to extract actual details of PCIe from Windows (Windows hides details). You know the controllers are there, and the TX2 told you this as well. What the TX2 has told you, which Windows won’t give details on, is how many PCIe lanes the device reports as “capability”, and how many PCIe lanes are actually used in current “status”. A bootable Linux DVD or thumb drive which supports the “lspci” command would tell you this.

If it turns out that the card reports differently under Linux with a PC (e.g., it says under a PC that capability is four lanes), then we have a place to start. If it turns out that Linux on a PC also agrees that the card has only one lane, then this too gives needed information. Is there any way you can try this on a Linux PC (the lspci part)?

Setting max performance would partially alleviate the problem under most any circumstance, but it doesn’t get at the root cause. For example, if four PCIe lanes were actually used, then Windows might also get 4x the performance/framerate (well, since you have three cameras, 3x). It isn’t in doubt that the card works…what is in doubt is if it is performing with PCIe x4 lane throughput. PCIe is designed to function equally well…but slower…if fewer lanes are available.

It will take me a while to prepare a Linux PC. I will let you know when I have a result.

By the way, the card manufacturer provides the device driver for Windows, but not for Linux. It uses an existing/general PCIe device driver on Linux. Does that cause the performance issue?

So long as the install has “lspci” you can use a live DVD/thumb drive to see if it is x1 or x4. On the other hand, you will probably need a real install for any kind of flash or package addition (VMs tend to fail).

Look again at the image I linked for comparing x1 connectors, x4 connectors, and so on. Examine the card's edge connector very closely and decide whether it is actually an x1 or something larger. If it is x1, then this guarantees the card you ordered is not the one you received. If it is physically x4, and yet it was wired only for x1, then either there is an electrical failure (defect), or once again the card is not as shown in the specs which claimed 20Gb/s throughput (requiring x4 at gen. 2 speeds). All we’ve seen so far is that this is purely an x1 card running at gen. 2 speeds…it seems to be missing three lanes (but the single visible lane works well).

FYI, a PCIe device driver is independent of any particular end device’s function on the PCIe bus. This driver is always shipped with Windows, and always shipped with Linux. The driver does nothing but talk to PCIe devices without knowing anything about what the device does. Every valid PCIe card will be visible without additional drivers since this is a PCIe driver function. No PCIe card will function without a driver specific to itself which is in addition to PCIe drivers (the pipe and the end device functions are separate drivers). The “capability” of x1 or x4, as shown in lspci, will not depend on the device driver of the end device…PCIe query is at a lower level than the end device driver.
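As an illustration of the link being queryable below the end device driver, here is a hedged sketch that reads the link attributes the Linux PCI core exposes in sysfs. The attribute names (“max_link_width”, “current_link_width”, and the matching speed files) exist on reasonably recent kernels but may be missing on older ones, in which case the sketch returns None for them and “lspci -vvv” remains the way to check. The address “0000:03:00.0” is an assumption based on the slot shown earlier plus a domain of 0000.

```python
from pathlib import Path

# Link attributes exposed by the PCI core on kernels that support them.
ATTRS = ("max_link_speed", "max_link_width", "current_link_speed", "current_link_width")


def read_link_attrs(address, sysfs_root="/sys/bus/pci/devices"):
    """Read PCIe link attributes for one device; missing attributes come back as None."""
    dev = Path(sysfs_root) / address
    result = {}
    for name in ATTRS:
        attr = dev / name
        result[name] = attr.read_text().strip() if attr.is_file() else None
    return result


if __name__ == "__main__":
    # Assumed address: domain 0000 plus the 03:00.0 slot from the lspci output above.
    for name, value in read_link_attrs("0000:03:00.0").items():
        print(f"{name}: {value}")
```

If “max_link_width” reads 1 here, that agrees with the “LnkCap” line regardless of which USB driver is bound to the device.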

FYI, the end device driver seems to be fully functional. You have four independent USB3 5Gb/s controllers. What you don’t have is a big enough PCIe pipe to handle four controllers simultaneously. The single PCIe lane won’t handle more than a single controller and after that competing traffic will starve data.