We tested the connection using both the Orin devkit and an Avermedia D315 carrier board with similar results: the PCIe card does not work on any Orin. I attach the results of lspci -vvv and dmesg on the devkit. lspc-devkit.txt (57.5 KB) dmesg-devkit.txt (84.8 KB)
This says that PCI is working ok since the first error report is NULL (it’s a linked list): AERCap: First Error Pointer: 00
It also says a driver loaded: Kernel driver in use: xhci_hcd
From the PCI end all is good. The rest of the error is on dmesg, but I couldn’t tell you what the issue is. Does this particular PCIe card have a firmware update ability?
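For anyone repeating this check, here is a minimal Python sketch that pulls the two fields quoted above out of saved "lspci -vvv" text. It assumes each device block starts with an unindented header line followed by indented detail lines; the function name and the placeholder header in the sample are made up for illustration, and only the two detail lines come from this thread.

```python
import re

def summarize_lspci(text):
    """Map each device header line to its driver and AER first error pointer."""
    devices = {}
    current = None
    for line in text.splitlines():
        # Assumption: an unindented, non-empty line starts a new device block.
        if line and not line[0].isspace():
            current = line.strip()
            devices[current] = {"driver": None, "first_error_pointer": None}
            continue
        if current is None:
            continue
        match = re.search(r"Kernel driver in use:\s*(\S+)", line)
        if match:
            devices[current]["driver"] = match.group(1)
        match = re.search(r"First Error Pointer:\s*([0-9a-fA-F]+)", line)
        if match:
            # lspci prints this field in hexadecimal.
            devices[current]["first_error_pointer"] = int(match.group(1), 16)
    return devices

# The header is a placeholder, not real lspci output; the two detail
# lines are the ones quoted in this thread.
SAMPLE = (
    "placeholder-device-header\n"
    "\tAERCap: First Error Pointer: 00\n"
    "\tKernel driver in use: xhci_hcd\n"
)

if __name__ == "__main__":
    for header, info in summarize_lspci(SAMPLE).items():
        print(header, info)
```

To use it on a real capture, read the saved file into a string and pass it to summarize_lspci in place of SAMPLE.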
I can’t see the exact issue, but I have some observations (which might or might not be related)…
The failure occurs in a kernel driver while handling a hardware IRQ, apparently for a GPIO pin. Generally speaking, GPIO modes are set via device tree, although they can be changed in other ways too. If this has a custom carrier board, and not the stock developer kit, then the device tree likely needs to change. If this is not a dev kit, then the fix might be as simple as using the correct device tree.
Software you are using might also require a change for your specific hardware (it would be a firmware change; the device tree is firmware that is passed to the driver as it loads, and in this case it is GPIO hardware being dedicated to this PCIe device).
Is this an actual dev kit? Does the PCIe card have any software/firmware associated with it (firmware not aware of the specific Jetson device could be wrong and in need of edit)? Was device tree or firmware ever altered?
We use both the devkit and a custom carrier board, specifically the Avermedia D315. The latter has its own BSP available. All logs provided come from the Orin on the D315; however, we get the same behavior on the dev kit.
When using the dev kit are you certain you are using the device tree specific to the dev kit, and not the Avermedia carrier board? Also, is all software involved purely user space? What software was installed specifically for this USB hardware? Is the software added compiled from source code, or is it binary format? Is there any chance that this software was built for different 64-bit ARM, e.g., an RPi?
Describe this card…is it SD? Something custom? I ask because the AGX Orin dev kit has eMMC memory, and any boot to SD would require significant changes to the boot firmware and initrd. Also, on the dev kit, can you post the output of: lsblk -f
And also from: df -h -T -t ext4
Finally, what is your L4T release (this is what you call Ubuntu after NVIDIA drivers are added)? See: head -n 1 /etc/nv_tegra_release
Related, what do you see from: cat /etc/nv_boot_control.conf
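If it is easier to gather all four outputs in one go, something like the following Python sketch could work. It only wraps the commands already listed above; the function name and the output layout are my own choices, and a command that cannot be started is reported instead of aborting the run.

```python
import subprocess

# The four commands requested above, in the order they were asked for.
COMMANDS = [
    ["lsblk", "-f"],
    ["df", "-h", "-T", "-t", "ext4"],
    ["head", "-n", "1", "/etc/nv_tegra_release"],
    ["cat", "/etc/nv_boot_control.conf"],
]

def collect(commands=COMMANDS):
    """Run each command and return one text report with all of the output."""
    sections = []
    for argv in commands:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, check=False)
            # Keep stderr as well, since a missing file is useful information.
            body = proc.stdout + proc.stderr
        except OSError as exc:
            # Raised when the executable itself cannot be started.
            body = f"(could not run: {exc})\n"
        sections.append("$ " + " ".join(argv) + "\n" + body)
    return "\n".join(sections)
```

Running print(collect()) on the Jetson and pasting the result into a reply would cover all of the requests at once.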
There is a gray area I’m interested in because there is both eMMC memory and QSPI memory (the QSPI is on the module itself). A purely SD card model would use the QSPI for boot content, and for the equivalent of a BIOS (there is no actual BIOS, this is why Jetsons cannot self-flash). An eMMC model would put that content in signed partitions. There may be some crossover involved, and when booted, it might be nice to have the o/s itself say what is there.
On an SD card model that has no eMMC available, one can flash the QSPI separately from the SD card. A “normal” flash of an eMMC model dev kit (including AGX Orin) would simultaneously flash rootfs and a lot of other partitions.
I see ext4 eMMC as the rootfs. Originally I thought maybe “card” referred to an SD card with the o/s, so you can ignore my questions on that. You have a very standard eMMC install and any extra storage on /opt won’t have any effect on this. Device tree won’t matter.
I highly suspect that a desktop PC would have the driver for this, but a Jetson might not. Jetsons are embedded devices that don’t include all of the driver modules you would find on a PC (you’d fill up the storage with drivers for things you don’t have). The PCIe itself is functioning correctly. It confuses me, though, that I see this if it isn’t working: Kernel driver in use: xhci_hcd
What that tells me is that you do have both PCI and USB drivers. If you boot this normally, or with the pcie_aspm=off, and then monitor “dmesg --follow”, what do you see when you plug in some low power USB device? This would be ideal if you were to plug in a mouse or keyboard and see what shows up on “dmesg --follow”. Try all USB ports of that card.
In the original “sudo lspci -vvv”, was this log taken before trying to plug in any USB devices? I could see the possibility that there would be no errors listed if no data transfer had been attempted. It is very very odd that you would see nothing at all in dmesg from a plug-in event when you have the xhci_hcd driver loaded.
I did however think of one other possibility: Mice and keyboards are USB 1.0 or 1.1, and not USB 3.x. Technically, the xhci driver is for superspeed (USB 3) host mode. Try “dmesg --follow” again, but try looking for any logging which is added due to plug-in of USB 3 devices. I imagine you have a USB 3 device you were trying to use this with originally.
Some background as to why I’m talking about this latter USB 3 and xhci_hcd: As USB evolved, the older standards were supported by newer USB root HUBs. USB got to the point where a USB 2 controller supported both USB 1.1 and USB 1.0. When USB 3 came along this changed, and instead of a single chip supporting all modes, there is a separate USB 3 and USB legacy controller. The logic detects which is plugged in, and routes the data lanes differently to either the USB 3 controller or the USB legacy (2.0 and earlier) controller. I suppose it is possible your wiring or device tree might not route correctly to reach a mouse or keyboard, and thus would never log. If this is the case though, then we can still see logging from superspeed devices.
If USB 3 does show anything upon plug-in, then we can figure out why USB legacy detection is not present. A dev kit would normally not see this issue, and the normal cause is a device tree setting; other third party carrier boards tend to need different wiring and a different device tree. Do USB 3 devices show an event on plugin?
After testing for plugin events, also run “sudo lspci -vvv” again and see if that PCIe card still has an AER first pointer of “00” (it should be this if there is no error in the PHY).
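To make plug-in events easier to spot in a busy log, the “dmesg --follow” stream can be filtered down to USB-related lines. This is only a rough Python sketch: the pattern (any line mentioning "usb" or "xhci") is my guess at what is relevant, the function names are made up, and on some systems dmesg needs to be run with sudo.

```python
import re
import subprocess

# Assumption: lines of interest mention "usb" or "xhci" somewhere.
USB_PATTERN = re.compile(r"usb|xhci", re.IGNORECASE)

def usb_related(lines):
    """Yield only the lines that mention usb or xhci, without the newline."""
    for line in lines:
        if USB_PATTERN.search(line):
            yield line.rstrip("\n")

def follow_dmesg():
    """Stream `dmesg --follow` and print matching lines until Ctrl-C."""
    proc = subprocess.Popen(["dmesg", "--follow"],
                            stdout=subprocess.PIPE, text=True)
    try:
        for line in usb_related(proc.stdout):
            print(line, flush=True)
    except KeyboardInterrupt:
        pass
    finally:
        proc.terminate()
```

Call follow_dmesg(), then plug and unplug devices on each port of the card. If nothing is printed at all, that matches the "no event logged" case discussed above.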
After connecting and disconnecting a USB 3.0 device (in this case a Balluff camera), literally nothing new showed up in dmesg. The output of lspci -vvv is exactly the same before and after (checked with diff; this page even refused to upload both files). The first error pointer is 00, as you expected. lspci-after-events.txt (57.2 KB) dmesg-devkit-usb3-connect-disconnect.txt (83.8 KB)
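The before/after comparison can also be done with Python's difflib if diff is not at hand. This is a small sketch, not the exact check that was run here; the function name is made up, and it expects the two captures as already-loaded strings.

```python
import difflib

def lspci_changes(before, after):
    """Return only the lines that differ between two lspci captures."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return [line for line in diff if line.startswith(("- ", "+ "))]

if __name__ == "__main__":
    # The two lines quoted earlier in this thread, used as a stand-in capture.
    capture = "Kernel driver in use: xhci_hcd\nAERCap: First Error Pointer: 00"
    print(lspci_changes(capture, capture))  # identical captures give []
```

An empty list means the two captures are identical, which is what was reported here.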