USB3 xhci problem

Hi,
I’m experiencing a problem with USB3 on my Jetson TX1, in particular with an mPCIe to USB3 adapter.

I installed an mPCIe to USB3 adapter, based on Renesas Technology Corp. uPD720202, and what I have discovered is:

  1. if I boot the system without any USB device attached to the adapter, I can plug in any device after boot and it is detected correctly.
  2. if I boot the system with a USB device attached to the adapter, the USB ports of the adapter do not detect any device (even after disconnecting and reconnecting it).

Additional info:

  • in both cases, the lspci command shows the adapter, and the lsusb command shows the additional Linux Foundation 3.0 and 2.0 root hubs (even though they are listed with the same Vendor/Device ID as the embedded USB3 controller);
  • only the ports of the adapter matter for the test; it makes no difference whether any device is connected to the other system ports.

During my tests I found this section in the syslog during boot in the non-working configuration (2):

[ 9.644066] xhci_hcd 0000:01:00.0: Timeout while waiting for address device command
[ 19.863570] xhci_hcd 0000:01:00.0: Stopped the command ring failed, maybe the host is dead
[ 19.873950] xhci_hcd 0000:01:00.0: Abort command ring failed
[ 19.881840] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 25.094248] xhci_hcd 0000:01:00.0: Timeout while waiting for address device command
[ 25.094260] xhci_hcd 0000:01:00.0: Abort the command ring, but the xHCI is dead.
[ 30.304122] xhci_hcd 0000:01:00.0: Timeout while waiting for a slot
[ 30.304135] xhci_hcd 0000:01:00.0: Abort the command ring, but the xHCI is dead.
[ 30.304164] xHCI xhci_free_dev called with unaddressed device
[ 35.314084] xhci_hcd 0000:01:00.0: Timeout while waiting for a slot
[ 35.314093] xhci_hcd 0000:01:00.0: Abort the command ring, but the xHCI is dead.
[ 35.314116] xHCI xhci_free_dev called with unaddressed device
[ 40.314093] xhci_hcd 0000:01:00.0: Timeout while waiting for a slot
[ 40.314105] xhci_hcd 0000:01:00.0: Abort the command ring, but the xHCI is dead.
[ 40.314128] xHCI xhci_free_dev called with unaddressed device

I’m using L4T 24.2.1 (Linux tegra-ubuntu-elroy 3.10.96-tegra #1 SMP PREEMPT Wed Sep 28 17:51:08 PDT 2016 aarch64 aarch64 aarch64 GNU/Linux).

Thank you in advance for helping me.

I am uncertain which connector you are referring to for “mPCIe” since the JTX1 developer board does not have a mini-PCIe slot…is this on a third-party carrier board?

If the relevant USB goes through PCIe, then there may be some differences in hot plug versus going directly to the Tegra XHCI. Order and results of enumeration may be an issue, as well as the nature of the USB devices…are these just keyboard/mouse type devices?

Hi linuxdev,
thank you for the reply.

You’re right, I didn’t mention it: I’m evaluating a third-party carrier board http://www.connecttech.com/sub/Products/ASG002.asp?l1=GPU&l2=ASG002 which has two mini-PCIe slots (for mounting reasons I’m using the one whose function can be switched to mSATA, but I tried both slots with the same result).

About the devices: I first ran into the problem with a camera, but I then did all the tests with a mouse and keyboard.

Can you suggest some tests to debug the problem?

Thank you

I can’t give you a specific thing to test, so here’s a big long description of how to look at the issue (sorry, it’s kind of an “experiment with it a while and see if something shows up” situation). One thing I am going to ask is does this mini-PCIe USB card itself have a requirements listing, such as drivers? If the right drivers aren’t there then this whole explanation is just kind of silly :P You mentioned a chip set and model, but I have not researched to see what its requirements are under Linux.

You mentioned the mPCIe slot itself lists as populated via lspci. Some PCIe cards have more advanced error reporting than others…knowing the PCIe sees a device is good, now try a verbose listing to see if there is any advanced error reporting (AER is how it would be abbreviated)…use sudo and the “-vvv” very verbose option. You can run without verbose at first, get the slot number ID (it’ll look something like “00:00.0” for the slot…yours will vary but I’ll use that as an example), then run verbose restricted to just that slot:

lspci
# find the slot...
sudo lspci -vvv -s 00:00.0 | tee the_pcie_log.txt

You won’t find AER in less expensive cards, and sometimes not at all on mPCIe to save space, but see what it says. That’s the PCIe side…after that it is all USB. Assuming you use a few attempts to plug/unplug a USB device…in the case of AER being available…you might see some error counts go up…or you might see the pointer to first error remain NULL…you could rule out PCIe as the problem in that case.
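
To make the AER check above quicker to read, the verbose listing can be reduced to just the status flags that are set. This is only a sketch: `aer_set_flags` is a hypothetical helper name, and it assumes the usual `lspci -vvv` layout in which each register line (`DevSta:`, `UESta:`, `CESta:`) is followed by flags ending in `+` (set) or `-` (clear).

```shell
# Hypothetical helper: read `lspci -vvv` output on stdin and print, for the
# DevSta/UESta/CESta lines only, the flags that are marked "+" (set).
# Mask and severity lines (UEMsk, UESvrt, CEMsk) are deliberately skipped.
aer_set_flags() {
    awk '$1 ~ /^(DevSta|UESta|CESta):$/ {
        out = ""
        for (i = 2; i <= NF; i++)
            if (substr($i, length($i)) == "+")   # flag ends in "+"
                out = out " " $i
        if (out != "") print $1 out
    }'
}

# Typical use on the real hardware (slot address is an example):
#   sudo lspci -vvv -s 00:00.0 | aer_set_flags
```

Not every `+` is an error (for example, `AuxPwr+` in DevSta only reports auxiliary power), so the output still needs to be read with the register meanings in mind.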

Before mentioning USB testing understand that there is a “hot plug” mechanism for drivers of some hardware classes, but not for others. Assuming PCIe functions as intended and that drivers are present, I suspect this hot plug behavior is indirectly part of the problem you are seeing. If a device can be connected to a running Jetson and drivers know to load (or unload) due to a connect/disconnect event broadcast, then you have working hot plug and success or failure only depends on whether you have a driver for the hardware (hot plug might announce new hardware, but if no driver exists, then the driver can’t load). The PCIe is not hot plug for a Jetson (L4T in general, not just a specific carrier board), but USB is hot plug. The USB root HUB itself has to be visible through PCIe without any PCIe hot plug support…should PCIe lose track of the USB root HUB after initializing, then the USB root HUB will be lost (PCIe only notices and broadcasts what’s on the bus when it initializes…later hot plug events are invisible and without a change broadcast mechanism…failed USB hot plug behavior could really be PCIe losing the HUB).

So first do as you have and see if lspci shows the root HUB from the mini-PCIe slot. Use dmesg and lspci to see that not only does PCIe see a device, but that a driver was found specific to that chip set. Then use lsusb (including “lsusb -t”) with dmesg and see if plug/unplug events show up for USB devices. Check first with common USB devices not needing special drivers, e.g., keyboard or mouse. Especially important: Under what conditions does “lsusb -t” show one of these root HUBs, and under which conditions does it not show such a HUB? Consider using “sudo dmesg --follow” on a terminal as you try things.
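
One way to compare which root HUBs are visible in each boot condition is to save only the root hub lines of `lsusb -t` and diff them. The sketch below assumes the usual usbutils tree format, where each root hub line contains `Class=root_hub` and a `Driver=` field; `root_hubs` is a hypothetical helper name, and the driver name shown for the on-board controller will depend on the kernel.

```shell
# Hypothetical helper: keep only the root hub lines of `lsusb -t` output
# (read from stdin) and strip the leading "/:" tree marker.
# The Driver= field on each line shows which host controller driver owns
# the hub, which helps tell the add-on card's hubs from the built-in ones.
root_hubs() {
    grep 'Class=root_hub' | sed 's|^/: *||'
}

# Typical use on the real hardware:
#   lsusb -t | root_hubs > hubs_working.txt    # after booting with nothing attached
#   lsusb -t | root_hubs > hubs_failing.txt    # after booting with a device attached
#   diff hubs_working.txt hubs_failing.txt
```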

Hi linuxdev,
thank you for the very valuable information and hints! I didn’t know about AER, but after your message I looked at the PCIe status registers in more depth.

Just to exclude any problem with USB device drivers, I have used only a very simple mouse and/or keyboard in every test.

  1. The adapter I'm talking about is this one https://www.startech.com/uk/Cards-Adapters/USB-3.0/Cards/2-Port-Mini-PCI-Express-USB-3-Adapter-Card-with-Bracket-Kit~MPEXUSB3S22B and I didn't install any specific driver; I just plugged it into the mPCIe slot of the carrier board. Can you suggest any TX1-compatible product (maybe with an extended temperature range) that we can buy?
  2. lsusb shows the USB HUBs in both cases, but dmesg does not show anything in the non-working case, even when I plug/unplug a device
  3. I compared the output of lspci -vvv:
    • when the adapter is working (boot without any USB device attached), the AER registers do not show any error;
    • when the adapter is not working (boot with a USB device attached), AER reports a CmpltTO (Completion Timeout) Uncorrectable Error

    Here is the section of the lspci output in the non-working configuration that differs from the working case (the underlined flags are the ones that change between the two cases)

    ...
    DevSta:	<u>CorrErr+</u> <u>UncorrErr+</u> FatalErr- UnsuppReq- AuxPwr+ TransPend-
    ...
    UESta:	DLP- SDES- TLP- FCP- <u>CmpltTO+</u> CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
    CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- <u>NonFatalErr+</u>
    CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
    AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    
  4. I compared the output of lsusb -vv:
    • without any USB device inserted, there is no difference between the outputs in working/not-working cases;
    • after attaching a USB device, I noticed a difference in the Hub Port Status entry (besides the device not being listed)
    ...
    Hub Descriptor:
      bLength               9
      bDescriptorType      41
      nNbrPorts             2
      wHubCharacteristic 0x0009
        Per-port power switching
        Per-port overcurrent protection
        TT think time 8 FS bits
      bPwrOn2PwrGood       10 * 2 milli seconds
      bHubContrCurrent      0 milli Ampere
      DeviceRemovable    0x00
      PortPwrCtrlMask    0xff
     Hub Port Status:
       Port 1: 0000.0503 highspeed power enable connect // in the non-working case this changes to "Port 1: 0000.0101 power connect"
       Port 2: 0000.0100 power
    Device Status:     0x0001
      Self Powered
    ...
    
  5. I was already keeping an eye on the kernel log with dmesg --follow, and what I found is that in the non-working case lsusb -vv prints some output, and when it reaches the USB 2.0 HUB where I attached a device it stalls for a second and then continues; at the same time dmesg shows
    xhci_hcd 0000:01:00.0: Timeout while waiting for a slot
    hub 1-0:1.0: couldn't allocate port 1 usb_device
    
  6. We bought this mPCIe USB3 adapter, but we are looking to buy another one, maybe with an extended temperature range / industrial grade; can you suggest any compatible product?
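
The two Hub Port Status values in item 4 can be decoded bit by bit. In the `lsusb -vv` output the part after the dot is the hub's wPortStatus word, and for a USB 2.0 hub the bits are: 0x001 connect, 0x002 enable, 0x004 suspend, 0x008 overcurrent, 0x010 reset, 0x100 power, 0x200 lowspeed, 0x400 highspeed. The following is a minimal sketch (`decode_port_status` is a hypothetical helper name) that turns the hex value into those names:

```shell
# Hypothetical helper: decode the wPortStatus half of an lsusb
# "Hub Port Status" value (the four hex digits after the dot) using the
# USB 2.0 hub bit layout. Names are printed from the lowest bit upwards.
decode_port_status() {
    status=$((0x$1))
    out=""
    for pair in 0x001:connect 0x002:enable 0x004:suspend 0x008:overcurrent \
                0x010:reset 0x100:power 0x200:lowspeed 0x400:highspeed; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $((status & $mask)) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
}

decode_port_status 0503   # working case:     connect enable power highspeed
decode_port_status 0101   # non-working case: connect power
```

Read this way, the non-working case shows a port that is powered and sees a connection but is never enabled, which fits the "couldn't allocate port 1 usb_device" message in item 5: the connection is noticed, but enumeration does not complete.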

Thank you for the valuable support!

I have no experience with mini-PCIe (or even full-sized PCIe) slot USB cards, so I have no advice on one to try. Perhaps someone reading this can give details of a known working card (even if the card is for the wrong slot one could look at the chip set and get a good idea of other cards using the same chip set with different connector).

Do keep in mind that a Jetson (and perhaps your carrier board) with nothing connected to anything USB still has two root HUBs showing up. If you are absolutely certain that you see a root HUB from the mini-PCIe unit via lsusb in any form (and because USB can work in the right circumstances I believe this must be so), then it implies a driver sees the HUB (which would imply PCIe sees it and that a USB driver took ownership of the HUB). There is actually an optional D+/D- pair associated with a mini-PCIe slot, and so a card plugged in there could in reality be using the built in Tegra USB D+/D- and complicate things, but I doubt this in your case (you couldn’t get USB3 via that optional D+/D- because it is USB2-only, and PCIe would not need to function for this to work).

From what I see it appears that the initial PCIe setup must be complete before the USB chip set of your add-on card can function correctly (this is expected), and that any attempt to use that USB chip set prior to PCIe being enumerated puts the USB chip set in an unrecoverable state (this is not expected). Without a method to re-enumerate the PCIe bus (hotplug PCIe is optional and not implemented on a Jetson) I don’t know how to get around this chip set being in an invalid state other than rebooting the whole computer (if PCIe hot plug were implemented then I think USB would behave correctly over the mini-PCIe bus for this chip set).

My only advice would be try other mini-PCIe USB add-on cards…you won’t know how they work until you see if they can handle the situation that this particular card fails with. There have also been some conversations regarding hot plug of non-root PCIe end points whereby a patch can allow delayed startup of non-root PCIe nodes…this could work for you but I don’t know which forum thread had the patch for non-root PCIe delayed enumeration.

We have ordered a different mPCIe USB3 adapter; I’ll give it a try and post updates here.

Thank you linuxdev!

I tried a different mini-PCIe USB3 card, but it has the same problem.

Can you help me with the “delayed PCIe enumeration” patch, so I can check whether it solves the problem?
How can I formally request support from NVIDIA?

Thank you

This thread may be of use. Rescan might be considered a manual version of hotplug (whether or not this helps I don’t know…give it a try with the patch in the thread):
https://devtalk.nvidia.com/default/topic/997845/force-rescan-of-pcie-bus-/?offset=6
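
For reference, mainline Linux exposes a generic remove/rescan interface in sysfs (`/sys/bus/pci/devices/<address>/remove` and `/sys/bus/pci/rescan`). Whether that interface is present and does anything useful on the L4T 3.10.96 kernel, with or without the patch from the thread above, is an assumption that has not been tested here. A minimal sketch, using the `0000:01:00.0` address from the xhci_hcd log earlier in this thread (`pci_remove_and_rescan` is a hypothetical helper name):

```shell
# Hypothetical helper: drop one PCI device from the kernel's view and then
# ask the kernel to rescan the bus, as a manual stand-in for PCIe hot plug.
#   $1 = PCI address, e.g. 0000:01:00.0
#   $2 = sysfs PCI root (optional, defaults to /sys/bus/pci)
# Must run as root on real hardware.
pci_remove_and_rescan() {
    addr=$1
    root=${2:-/sys/bus/pci}
    dev="$root/devices/$addr"
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi
    echo 1 > "$dev/remove"     # unbind the driver and remove the device
    sleep 1                    # arbitrary pause before rescanning
    echo 1 > "$root/rescan"    # re-enumerate the bus
}

# Typical use on the real hardware, while watching "dmesg --follow":
#   sudo sh -c '. ./pci_rescan.sh; pci_remove_and_rescan 0000:01:00.0'
```

If the controller comes back after the rescan and the attached device enumerates, that would support the idea that the chip only needs to be re-initialised after PCIe is up; if the rescan does not find the card again, only a reboot will restore it.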