PCIe read failures after enumeration on AGX Xavier

We have a Jetson AGX Xavier on a 3U VPX (Wolf 12TP) connected to an FPGA on another 3U VPX card. We have instantiated a Xilinx XDMA endpoint on the FPGA, as well as an example from the FPGA vendor. The device will enumerate on the bus, and lspci reports we are getting a gen3x4 connection. However, when we attempt to read any address for any BAR, we receive back all f’s. We have sent the binary of the vendor example to the FPGA manufacturer. They can get this to work if they use a different SBC as the host platform. We currently have JetPack 4.4.

We have tried to enable PCIe AER. We don’t see anything in the system log (dmesg). However, if we use PCIe configuration accesses to unmask the UEMsk and CEMsk registers, we do see what we think is an Unsupported Request error, and looking at the first four words of the TLP, we believe we see a valid TLP access attempt. Since this build does work using a different SBC, we do believe the firmware is good and there is something on the Xavier side that is causing the issue.
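For anyone wanting to repeat the decode, here is a minimal Python sketch of how the AER Uncorrectable Error Status bits and the four Header Log DWORDs can be interpreted for a memory request. The bit positions follow the PCIe Base Specification field layout. The function names and the register values in the usage note are hypothetical illustrations, not values captured from our system.

```python
# Sketch: decode AER Uncorrectable Error Status and a logged memory-request TLP header.
# Function names and sample values are illustrative assumptions, not captured data.

# Uncorrectable Error Status bit positions (PCIe Base Specification, AER capability).
UE_STATUS_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode_ue_status(status):
    """Return the names of all error bits set in the UE Status register value."""
    return [name for bit, name in sorted(UE_STATUS_BITS.items()) if status & (1 << bit)]


def decode_mem_tlp(dw0, dw1, dw2, dw3):
    """Decode the four Header Log DWORDs, assuming they hold a memory request."""
    fmt = (dw0 >> 29) & 0x7        # 000/001: 3DW/4DW no data, 010/011: 3DW/4DW with data
    tlp_type = (dw0 >> 24) & 0x1F  # 00000 is a memory request
    if tlp_type != 0 or fmt > 3:
        return {"kind": "not a memory request", "fmt": fmt, "type": tlp_type}

    four_dw = bool(fmt & 0x1)      # 4DW header carries a 64-bit address
    has_data = bool(fmt & 0x2)     # with data means a write
    requester_id = (dw1 >> 16) & 0xFFFF
    if four_dw:
        address = (dw2 << 32) | (dw3 & 0xFFFFFFFC)
    else:
        address = dw2 & 0xFFFFFFFC  # dw3 is not part of a 3DW header

    return {
        "kind": "MWr" if has_data else "MRd",
        "addr_bits": 64 if four_dw else 32,
        "length_dw": dw0 & 0x3FF,  # a value of 0 encodes 1024 DWORDs
        "requester": "%02x:%02x.%x" % (requester_id >> 8,
                                       (requester_id >> 3) & 0x1F,
                                       requester_id & 0x7),
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
        "address": address,
    }
```

As a usage example with made-up values, `decode_ue_status(0x00100000)` reports only bit 20 (Unsupported Request), and `decode_mem_tlp(0x20000001, 0x0000000F, 0x0000001F, 0x40000000)` decodes as a one-DWORD 64-bit memory read. Comparing the decoded address against the BAR address shown by lspci is one way to check whether the request actually falls inside the range the endpoint claims.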

There is also a red LED on the Xavier board that comes on during boot. We are told that this LED indicates either an over-temperature or a PCIe failure condition. We have ruled out over-temperature, but we have not determined what the PCIe failure could be. We are told the LED is driven by the FATAL_ERR# signal of the PLX 8718 switch. It comes on about 18 seconds after power is applied. Yet, even though there is an error, the PCIe endpoint enumerates and supposedly trains to gen3x4. If we remove the FPGA from the chassis and power on the AGX by itself, the LED does not come on.

Are there any insights into how we can determine what could be causing this error condition for the LED? We are assuming at this point this is related to our inability to access the BAR regions.


We have later releases, 4.6.3 and 5.1.1. You may consider upgrading and giving one of them a try. Since the device enumerates successfully at boot, it seems like it may require an additional driver. Other users would need to check this and share their experience.

Are there any limitations with the BARs? I’ve seen mention in other posts of a 30 or 32MB limit. We are not anywhere near this, but we did have to enable 64-bit addressing. When we had the BARs configured as 32-bit, the OS incorrectly assigned an address wider than 32 bits, and lspci showed the BAR as virtual. After switching to 64-bit BARs, we get a valid address and we can enable them. When we unmask the AER register bits and do a memory access, the AER log shows what looks to be a valid TLP request to the device.
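To double-check what the kernel actually assigned, the per-device `resource` file in sysfs lists one `start end flags` line per region. Below is a minimal Python sketch that parses that text and flags regions that are 64-bit or sit above 4 GB. The flag constants mirror the kernel's `IORESOURCE_*` values. The function name and the sample line in the usage note are hypothetical, not output from our board.

```python
# Sketch: parse the text of /sys/bus/pci/devices/<BDF>/resource.
# parse_pci_resources is a hypothetical helper name, not an existing tool.

# Flag values as defined in the Linux kernel's include/linux/ioport.h.
IORESOURCE_IO = 0x00000100
IORESOURCE_MEM = 0x00000200
IORESOURCE_PREFETCH = 0x00002000
IORESOURCE_MEM_64 = 0x00100000


def parse_pci_resources(text):
    """Return a list of dicts, one per populated region in the resource file."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not "start end flags"
        start, end, flags = (int(field, 16) for field in fields)
        if flags == 0:
            continue  # unused slot (all zeros)
        regions.append({
            "index": index,  # lines 0-5 correspond to BAR0-BAR5
            "start": start,
            "size": end - start + 1,
            "is_mem": bool(flags & IORESOURCE_MEM),
            "is_io": bool(flags & IORESOURCE_IO),
            "prefetchable": bool(flags & IORESOURCE_PREFETCH),
            "is_64bit": bool(flags & IORESOURCE_MEM_64),
            "above_4g": end > 0xFFFFFFFF,
        })
    return regions
```

In use, the text would come from something like `open("/sys/bus/pci/devices/<BDF>/resource").read()`, where `<BDF>` is the endpoint's bus/device/function as shown by lspci. A region reported with `above_4g` true but `is_64bit` false would match the situation we saw with the 32-bit BARs.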

This post looks similar to the one here.
