USB 3.0 PCIe card not working

Hello,

We have multiple PCIe cards that we want to connect to AGX Orin

  • PEXUSB3S44V revision 6X2B
  • PEXUSB3S44V revision 7X2B
  • PX-UC-86261

However, only the first one works. The cards were tested on different machines and systems and work just fine. The problem appears only on Orin.

We tried a couple of solutions found on the forum:

  • pcie_aspm=off in /boot/extlinux/extlinux.conf - no visible effect.
  • pci=nommconf in /boot/extlinux/extlinux.conf - no visible effect.
  • pex_perst=0 in /boot/extlinux/extlinux.conf - no visible effect.
  • pcie_nomsi=off in /boot/extlinux/extlinux.conf - seems to remove some errors from dmesg, but that's it.
  • limiting PCIe speed - sudo fdtput -t i /boot/dtb/kernel_tegra234-p3701-0000-p3737-0000.dtb /pcie@141a0000 nvidia,max-speed 1 - no visible effect.
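
Before ruling these options out, it may be worth confirming that they actually reached the running kernel. The following is a minimal sketch, assuming the options were added to the APPEND line of /boot/extlinux/extlinux.conf and the board was rebooted afterwards. The helper names has_param and report_params are hypothetical and are not part of any L4T tool.

```shell
#!/bin/sh
# has_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# report_params CMDLINE PARAM...
# Prints one "<param> present" or "<param> missing" line per parameter.
report_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        if has_param "$cmdline" "$p"; then
            echo "$p present"
        else
            echo "$p missing"
        fi
    done
}

# On the Jetson, check the command line of the running kernel.
if [ -r /proc/cmdline ]; then
    report_params "$(cat /proc/cmdline)" pcie_aspm=off pci=nommconf
fi
```

A parameter reported as missing would suggest the edited extlinux.conf entry is not the one being booted. Similarly, assuming the dtb file edited with fdtput is the one the bootloader really loads, the speed limit could be read back with fdtget -t i on the same file, node, and property.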

Any help with resolving this issue will be appreciated.

I attach outputs from dmesg -vvv and lspci -vvv:
lspci.txt (76.6 KB)
dmesg.txt (104.4 KB)

Sorry for the late response.
Is this still an issue that needs support? Are there any results you can share?

Yes, we still have not resolved this issue.

Is the connection issue on NV devkit or custom board?

We tested the connection using both the Orin devkit and an Avermedia D315 carrier board with similar results: the PCIe card does not work on any Orin. I attach the results of lspci -vvv and dmesg on the devkit.
lspc-devkit.txt (57.5 KB)
dmesg-devkit.txt (84.8 KB)

Hi,

The PCIe controller already detects the card you are using.

The issue is no longer in the PCIe driver. Please check with the card vendor for any driver requirements they have.

I just want to add some detail…

This says that PCI is working ok since the first error report is NULL (it’s a linked list):
AERCap: First Error Pointer: 00

It also says a driver loaded:
Kernel driver in use: xhci_hcd

From the PCI end all is good. The rest of the error is on dmesg, but I couldn’t tell you what the issue is. Does this particular PCIe card have a firmware update ability?
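
The two checks above (bound driver and AER first error pointer) can be pulled out of a long lspci -vvv dump for every device at once. This is a minimal sketch, assuming the usual lspci -vvv text layout in which each device record starts at column 0 with its slot address. The function name summarize_lspci is hypothetical.

```shell
#!/bin/sh
# summarize_lspci: reads `lspci -vvv` text on stdin and prints one line per
# device: slot address, bound kernel driver, AER First Error Pointer.
# Devices without a driver or without an AER capability get placeholders.
summarize_lspci() {
    awk '
        # A device record starts at column 0 with its slot address,
        # e.g. "0005:01:00.0 PCI bridge: ...".
        /^[0-9a-fA-F:]+\.[0-9a-fA-F] / {
            if (slot != "") print slot, driver, fep
            slot = $1; driver = "no-driver"; fep = "no-aer"
        }
        /Kernel driver in use:/ { driver = $NF }
        /First Error Pointer:/ {
            # Keep only the hex value that follows the label.
            v = $0
            sub(/.*First Error Pointer: */, "", v)
            sub(/[^0-9a-fA-F].*/, "", v)
            fep = v
        }
        END { if (slot != "") print slot, driver, fep }
    '
}

# On the Jetson:
#   sudo lspci -vvv | summarize_lspci
```

Running lspci with sudo matters here, because without root privileges the capability details (including AER) are usually hidden.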

Yes, we tried it, but to no avail. I attach dmesg log from when card with updated firmware was inserted.
dmesg-updated-firmaware.txt (122.6 KB)

I can’t see the exact issue, but I have some observations (which might or might not be related)…

The failure occurs in a kernel driver while handling a hardware IRQ, apparently for a GPIO pin. Generally speaking, GPIO modes are set via device tree, although they can be changed in other ways too. If this has a custom carrier board, and not the stock developer kit, then the device tree likely needs to change. If this is not a dev kit, then the fix might be as simple as using the correct device tree.

Software you are using might also require a change for your specific hardware (it would be a firmware change; the device tree is firmware that is passed to the driver as it loads, and in this case it is GPIO hardware being dedicated to this PCIe device).

Is this an actual dev kit? Does the PCIe card have any software/firmware associated with it (firmware not aware of the specific Jetson device could be wrong and in need of edit)? Was device tree or firmware ever altered?

We use both the devkit and a custom carrier board, specifically the Avermedia D315. The latter has its own BSP available. All logs provided come from the Orin on the D315; however, we get the same behavior on the dev kit.

When using the dev kit are you certain you are using the device tree specific to the dev kit, and not the Avermedia carrier board? Also, is all software involved purely user space? What software was installed specifically for this USB hardware? Is the software added compiled from source code, or is it binary format? Is there any chance that this software was built for different 64-bit ARM, e.g., an RPi?

Yes, we have more than one Orin, so to compare results I can just take a card from one board to another.

The card is supposed to work out of the box, no specific software was installed, apart from failed attempt to upgrade firmware.

Describe this card…is it SD? Something custom? I ask because the AGX Orin dev kit has eMMC memory, and any boot to SD would require significant changes to the boot firmware and initrd. Also, on the dev kit, can you post the output of:
lsblk -f

And also from:
df -h -T -t ext4

Finally, what is your L4T release (this is what you call Ubuntu after NVIDIA drivers are added)? See:
head -n 1 /etc/nv_tegra_release

Related, what do you see from:
cat /etc/nv_boot_control.conf

There is a gray area I’m interested in because there is both eMMC memory and QSPI memory (the QSPI is on the module itself). A purely SD card model would use the QSPI for boot content, and for the equivalent of a BIOS (there is no actual BIOS, this is why Jetsons cannot self-flash). An eMMC model would put that content in signed partitions. There may be some crossover involved, and when booted, it might be nice to have the o/s itself say what is there.

On an SD card model that has no eMMC available, one can flash the QSPI separately from the SD card. A “normal” flash of an eMMC model dev kit (including AGX Orin) would simultaneously flash rootfs and a lot of other partitions.

It’s this card: 4 Port PCIe USB 3.0 Card w/ 4 Channels (StarTech.com). I don’t think there is anything special about it; the only interesting thing is that it has separate USB channels, so it can run 4 USB 3.0 ports at about full speed.
Here are the outputs.
lsblk -f:

root@orin-02:/home/orin# lsblk -f
NAME         FSTYPE LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINT
loop0        vfat   L4T-README 1234-ABCD                                           
mmcblk0                                                                            
├─mmcblk0p1  ext4              c58f73c2-2a6d-4299-a470-904948ddf5d3   30.2G    42% /
├─mmcblk0p2                                                                        
├─mmcblk0p3                                                                        
├─mmcblk0p4                                                                        
├─mmcblk0p5                                                                        
├─mmcblk0p6                                                                        
├─mmcblk0p7                                                                        
├─mmcblk0p8                                                                        
├─mmcblk0p9                                                                        
├─mmcblk0p10 vfat              DAE6-5799                                           
├─mmcblk0p11                                                                       
├─mmcblk0p12                                                                       
├─mmcblk0p13                                                                       
├─mmcblk0p14                                                                       
└─mmcblk0p15                                                                       
zram0                                                                              [SWAP]
zram1                                                                              [SWAP]
zram2                                                                              [SWAP]
zram3                                                                              [SWAP]
zram4                                                                              [SWAP]
zram5                                                                              [SWAP]
zram6                                                                              [SWAP]
zram7                                                                              [SWAP]
zram8                                                                              [SWAP]
zram9                                                                              [SWAP]
zram10                                                                             [SWAP]
zram11                                                                             [SWAP]
nvme0n1                                                                            
└─nvme0n1p1  ext4              b0b0fc42-e673-4e22-bd5d-21eab17cdb6b    1.5T    12% /opt/2TB

df -h -T -t ext4:

root@orin-02:/home/orin# df -h -T -t ext4
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/mmcblk0p1 ext4   57G   24G   31G  44% /
/dev/nvme0n1p1 ext4  1.8T  222G  1.5T  13% /opt/2TB

head -n 1 /etc/nv_tegra_release:

root@orin-02:/home/orin# head -n 1 /etc/nv_tegra_release
# R35 (release), REVISION: 4.1, GCID: 33958178, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug  1 19:57:35 UTC 2023

cat /etc/nv_boot_control.conf:

root@orin-02:/home/orin# cat /etc/nv_boot_control.conf
TNSPEC 3701-500-0000-K.0-1-1-jetson-agx-orin-devkit-
COMPATIBLE_SPEC 3701-300-0000--1--jetson-agx-orin-devkit-
TEGRA_LEGACY_UPDATE false
TEGRA_BOOT_STORAGE mmcblk0
TEGRA_EMMC_ONLY false
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0

I see ext4 eMMC as the rootfs. Originally I thought maybe “card” referred to an SD card with the o/s, so you can ignore my questions on that. You have a very standard eMMC install and any extra storage on /opt won’t have any effect on this. Device tree won’t matter.

I highly suspect that a desktop PC would have the driver for this, but a Jetson might not. Jetsons are embedded devices that don’t include all of the driver modules that you would find on a PC (you’d fill up the storage with drivers for things you don’t have). The PCIe itself is functioning correctly. It confuses me though that I see this if it isn’t working:
Kernel driver in use: xhci_hcd

What that tells me is that you do have both PCI and USB drivers. If you boot this normally, or with the pcie_aspm=off, and then monitor “dmesg --follow”, what do you see when you plug in some low power USB device? This would be ideal if you were to plug in a mouse or keyboard and see what shows up on “dmesg --follow”. Try all USB ports of that card.
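
To make plug-in events easier to spot, the kernel log can be narrowed to USB-related lines while the test runs. This is a minimal sketch; the function name usb_lines and the choice of the keywords "usb" and "xhci" are assumptions, so a message that uses neither word would be filtered out.

```shell
#!/bin/sh
# usb_lines: keeps only kernel log lines that mention USB or xHCI
# (case-insensitive); everything else on stdin is dropped.
usb_lines() {
    grep -iE 'usb|xhci'
}

# On the Jetson (interactive; stop with Ctrl-C), then plug the device
# into each port of the card in turn:
#   sudo dmesg --follow | usb_lines
#
# The USB topology, including any root hubs registered by the card's
# xhci_hcd instances, can be listed with:
#   lsusb -t
```

If lsusb -t shows root hubs for the card but no log line appears on plug-in, that would point at the card's ports or power rather than at a missing driver.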

I did what you asked. Both with and without pcie_aspm=off, literally nothing new showed up in the dmesg log when I plugged a mouse into any of the 4 ports.

In the original “sudo lspci -vvv”, was this log taken before trying to plug in any USB devices? I could see the possibility that there would be no errors listed if no data transfer had been attempted. It is very very odd that you would see nothing at all in dmesg from a plug-in event when you have the xhci_hcd driver loaded.

I did however think of one other possibility: Mice and keyboards are USB 1.0 or 1.1, and not USB 3.x. Technically, the xhci driver is for superspeed (USB 3) host mode. Try “dmesg --follow” again, but try looking for any logging which is added due to plug-in of USB 3 devices. I imagine you have a USB 3 device you were trying to use this with originally.

Some background as to why I’m talking about this latter USB 3 and xhci_hcd: As USB evolved, the older standards were supported by newer USB root HUBs. USB got to the point where a USB 2 controller supported both USB 1.1 and USB 1.0. When USB 3 came along this changed, and instead of a single chip supporting all modes, there is a separate USB 3 and USB legacy controller. The logic detects which is plugged in, and routes the data lanes differently to either the USB 3 controller or the USB legacy (2.0 and earlier) controller. I suppose it is possible your wiring or device tree might not route correctly to reach a mouse or keyboard, and thus would never log. If this is the case though, then we can still see logging from superspeed devices.

If USB 3 does show anything upon plug-in, then we can figure out why USB legacy detection is not present. A dev kit would normally not see this issue, and the normal cause is a device tree setting; other third party carrier boards tend to need different wiring and a different device tree. Do USB 3 devices show an event on plugin?

After testing for plugin events, also run “sudo lspci -vvv” again and see if that PCIe card still has an AER first pointer of “00” (it should be this if there is no error in the PHY).
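
The before/after comparison can be scripted so that both snapshots are taken the same way. This is a minimal sketch; the file names and the helper snapshots_differ are hypothetical.

```shell
#!/bin/sh
# snapshots_differ FILE_A FILE_B
# Prints "unchanged" if the two files are byte-identical, "changed" otherwise.
snapshots_differ() {
    if cmp -s "$1" "$2"; then
        echo "unchanged"
    else
        echo "changed"
    fi
}

# On the Jetson:
#   sudo lspci -vvv > lspci-before.txt
#   (plug in and unplug the USB device)
#   sudo lspci -vvv > lspci-after.txt
#   snapshots_differ lspci-before.txt lspci-after.txt
#   diff -u lspci-before.txt lspci-after.txt   # shows what changed, if anything
```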

Sorry for the late response.

After connecting and disconnecting a USB 3.0 device (in this case a Balluff camera), literally nothing new showed up in dmesg. The output of lspci -vvv is exactly the same before and after (checked with diff; this page even refused to upload both files). The first error pointer is 00, as you expected.
lspci-after-events.txt (57.2 KB)
dmesg-devkit-usb3-connect-disconnect.txt (83.8 KB)

No log at all from a plug-in is unusual. Is there anything you can describe for this one device?

0005:01:00.0 PCI bridge: Pericom Semiconductor PI7C9X2G608GP PCIe2 6-Port/8-Lane Packet Switch (prog-if 00 [Normal decode])

Maybe @WayneWWW or @kayccc can comment on what this error means for PCI on the Jetson (it might have nothing specific to do with Jetsons):

[ 1089.241536] pci 0005:01:00.0: invalid large VPD tag 7f at offset 0

(the device I asked about is the device slot tied to that message)
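
To see which PCI devices the VPD message refers to, and where they sit relative to the USB controllers, the addresses can be extracted from the kernel log. This is a minimal sketch, assuming the message format shown above. The function name vpd_devices is hypothetical.

```shell
#!/bin/sh
# vpd_devices: reads kernel log text on stdin and prints the unique PCI
# addresses that appear in "invalid ... VPD tag" messages.
vpd_devices() {
    sed -n 's/.*pci \([0-9a-fA-F:.]*\): invalid .* VPD tag.*/\1/p' | sort -u
}

# On the Jetson:
#   sudo dmesg | vpd_devices
# Then inspect one reported address, and view the bus as a tree to see
# what sits behind the switch:
#   sudo lspci -s 0005:01:00.0 -vvv
#   lspci -tv
```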

There is this spec I found online https://www.mouser.com/datasheet/2/115/PI7C9X2G608GP-product-brief-1140641.pdf
Apparently, this bridge is used to split PCIe into 4 independent USB controllers, but other than that I don’t know anything.