UEFI ASSERT fail with PCIe link connected on custom baseboard, works with Xavier

Hello
I have a problem with Orin on our custom baseboard. The baseboard connects an FPGA with PCIe to the Jetson. The same system works fine with AGX Xavier (JetPack 4.6.1).
With Orin, the UEFI fails with an ASSERT.

ASSERT [PciHostBridgeDxe] /home/mandre/dev/nvidia-uefi/edk2/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c(879): (Translation & Alignment) == 0

Please find attached the full log (re-compiled UEFI in Debug mode).
The FPGA device exposes 2 BARs (2GB and 128B, both 32-bit non-prefetchable). As soon as I erase the FPGA (meaning no PCIe endpoint is present), the system boots fine.
I have JetPack 5.0.1_DP. While I had applied several changes to the device-tree, I have reverted them all to confirm that the problem is independent of those.

Are there any known (new) limitations with Jetson Orin (vs. Xavier)?
Thank you! Marc
uefi_failure.log (77.0 KB)

Is the Jetson still running as a PCIe RP?

I have not changed it to EP. Here is my full config for the flash.sh (note that I run it in 40W mode):

source "${LDK_DIR}/p3701.conf.common";
BPFDTB_FILE=tegra234-bpmp-3701-0000-as-3701-0004-3737-0000.dtb;
DTB_FILE=tegra234-p3701-0000-as-p3701-0004-p3737-0000.dtb;
TBCDTB_FILE=tegra234-p3701-0000-as-p3701-0004-p3737-0000.dtb;
EMMC_BCT=tegra234-p3701-0000-p3737-0000-TE990M-sdram.dts;
EMMC_CFG=flash_t234_qspi_sdmmc.xml;
WB0SDRAM_BCT="tegra234-p3701-0000-p3737-0000-TE990M-wb0sdram.dts";
ODMDATA="gbe-uphy-config-0,hsstp-lane-map-3,nvhs-uphy-config-0,hsio-uphy-config-0";

Also, I have

cvb_eeprom_read_size = <0x0>;

as I don’t have an EEPROM on our baseboard (according to previous posts).

Thanks, Marc

Hi,

Just to clarify: if you use the same device tree configuration but without any PCIe device (the FPGA) connected, this issue does not happen?

Correct.

Which PCIe controller is this on?

It is on PEX5

Could you clarify which one you are using?


We use PCIe x8 (C5)

As mentioned the same system works with the Jetson AGX Xavier.

No need to keep saying that the system works fine with AGX Xavier. That may not be important.

Is it possible to move your test over to the Orin devkit and reproduce the issue?

Hi,

I checked the UEFI source, and there is a limitation at the point where you hit the assert. Please check whether it applies to your case, since the failing check is (Translation & Alignment) == 0.

              Translation = GetTranslationByResourceType (RootBridge, Index);
              if ((Translation & Alignment) != 0) {
                DEBUG ((
                  DEBUG_ERROR,
                  "[%a:%d] Translation %lx is not aligned to %lx!\n",
                  __FUNCTION__,
                  DEBUG_LINE_NUMBER,
                  Translation,
                  Alignment
                  ));
                ASSERT ((Translation & Alignment) == 0);
                //
                // This may be caused by too large alignment or too small
                // Translation; pick the 1st possibility and return out of resource,
                // which can also go thru the same process for out of resource
                // outside the loop.
                //

I could isolate the problem. The current configuration of Orin doesn’t support BARs larger than 128MB.
Your ranges are defined as follows:

ranges = <0x81000000 0x00 0x3a100000 0x00 0x3a100000 0x0 0x00100000    /* downstream I/O (1MB) */
          0x82000000 0x00 0x40000000 0x2b 0x28000000 0x0 0x08000000    /* non-prefetchable memory (128MB) */
          0xc3000000 0x27 0x40000000 0x27 0x40000000 0x3 0xe8000000>;  /* prefetchable memory (16000MB) */
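
A minimal sketch of why the default window accepts a 128MB BAR but not a 2GB one. It assumes (this is an interpretation of the UEFI excerpt, not something confirmed in this thread) that Translation is the difference between the PCI bus address and the CPU address of the window, and that Alignment is the BAR size minus one. The function name translation_is_aligned is hypothetical.

```python
def translation_is_aligned(pci_addr: int, cpu_addr: int, bar_size: int) -> bool:
    """Mimic the (Translation & Alignment) == 0 test for one window.

    Assumptions: Translation = PCI address - CPU address (64-bit wrap-around),
    Alignment = bar_size - 1, and bar_size is a power of two.
    """
    translation = (pci_addr - cpu_addr) & 0xFFFFFFFFFFFFFFFF
    alignment = bar_size - 1
    return (translation & alignment) == 0


# Non-prefetchable window taken from the default Orin ranges above.
PCI_ADDR = 0x00_40000000
CPU_ADDR = 0x2B_28000000

print(translation_is_aligned(PCI_ADDR, CPU_ADDR, 128 << 20))  # 128MB BAR -> True
print(translation_is_aligned(PCI_ADDR, CPU_ADDR, 2 << 30))    # 2GB BAR   -> False
```

Under these assumptions the result matches the observation that only BARs up to 128MB get past the assert with the default ranges.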

For now I have applied the address translations as used in Xavier:

ranges = <0x81000000 0x0 0x3a100000 0x0 0x3a100000 0x0 0x00100000     /* downstream I/O (1MB) */
          0xc3000000 0x1c 0x00000000 0x1c 0x00000000 0x3 0x40000000   /* prefetchable memory (13GB) */
          0x82000000 0x0 0x40000000 0x1f 0x40000000 0x0 0xC0000000>;  /* non-prefetchable memory (3GB) */

I can now boot and the device enumerates. I could not test the functionality yet, as I am debugging other issues.
Can you help me confirm that the address translation from Xavier is ok or suggest different translations?
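
For readers less familiar with the ranges layout, here is a small decoder sketch. It assumes the usual PCI device tree binding: seven cells per entry (one flags cell, two PCI address cells, two CPU address cells, two size cells), with the address space code in bits 25:24 of the flags cell and the prefetchable flag in bit 30. The helper name decode_range is made up for illustration.

```python
SPACE_CODES = {0b00: "config", 0b01: "I/O", 0b10: "32-bit memory", 0b11: "64-bit memory"}


def decode_range(cells):
    """Decode one 7-cell PCI `ranges` entry into a dict."""
    flags, pci_hi, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = cells
    return {
        "space": SPACE_CODES[(flags >> 24) & 0x3],
        "prefetchable": bool(flags & (1 << 30)),
        "pci_addr": (pci_hi << 32) | pci_lo,
        "cpu_addr": (cpu_hi << 32) | cpu_lo,
        "size_mb": ((size_hi << 32) | size_lo) >> 20,
    }


# The Xavier-style entries quoted above.
entries = [
    (0x81000000, 0x0, 0x3A100000, 0x0, 0x3A100000, 0x0, 0x00100000),
    (0xC3000000, 0x1C, 0x00000000, 0x1C, 0x00000000, 0x3, 0x40000000),
    (0x82000000, 0x0, 0x40000000, 0x1F, 0x40000000, 0x0, 0xC0000000),
]
for entry in entries:
    d = decode_range(entry)
    print(f'{d["space"]:14} prefetchable={d["prefetchable"]!s:5} '
          f'pci={d["pci_addr"]:#x} cpu={d["cpu_addr"]:#x} size={d["size_mb"]}MB')
```

The decoded sizes (1MB, 13312MB, 3072MB) agree with the comments in the ranges property.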

Wayne,
It would be great if you could help on the correct address translation for the PCIe devices. I don’t have the full overview of the address space.
With my modifications as above, the system can boot and I can enumerate the PCIe device. As soon as I try to communicate with the PCIe device (write transaction), the kernel locks up with:

ERROR: RAS Uncorrectable Error in SCC, base=0xe019000:
ERROR: Status = 0xe400090d
ERROR: SERR = Illegal address (software fault): 0xd
ERROR: IERR = Address Range Error: 0x9
ERROR: MISC0 = 0x6000000
ERROR: MISC1 = 0xc4a91
ERROR: MISC2 = 0x0
ERROR: MISC3 = 0x0
ERROR: ADDR = 0x8000001f80006040
ERROR: **************************************
EERRR:: sddi__iisaatc_eeenntrreurred 11

Thank you
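
A quick sanity check of the ADDR value in the RAS log, as a sketch only. It assumes the register follows the Arm RAS ERR<n>ADDR layout, where the low 56 bits carry the physical address and the top bits are flags; whether this SCC error record uses that layout is an assumption. The helper names are hypothetical.

```python
PADDR_MASK = (1 << 56) - 1  # assumed: bits [55:0] hold the physical address


def fault_address(addr_reg: int) -> int:
    """Strip the assumed flag bits from the logged ADDR value."""
    return addr_reg & PADDR_MASK


def in_window(addr: int, base: int, size: int) -> bool:
    """True if addr lies inside [base, base + size)."""
    return base <= addr < base + size


addr = fault_address(0x8000001F80006040)

# Non-prefetchable window borrowed from Xavier: CPU base 0x1f 0x40000000, size 0xC0000000.
NP_BASE, NP_SIZE = 0x1F_40000000, 0xC0000000

print(hex(addr), in_window(addr, NP_BASE, NP_SIZE))  # 0x1f80006040 True
```

If the assumption holds, the faulting access falls inside the CPU window taken from the Xavier ranges, which fits the "Illegal address" and "Address Range Error" fields in the log.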

When I (temporarily) reduce the BAR size on the FPGA and revert the ranges definition in the device tree, the PCIe transaction works fine. Thus, the address translation needs to be fixed.

Can you please help me on this? I don’t have the full overview of the address space for the Orin.
Thank you, Marc

I have switched to this mapping:

ranges = <0x81000000 0x0 0x3a100000 0x0 0x3a100000 0x0 0x00100000     /* downstream I/O (1MB) */
          0x82000000 0x0 0x40000000 0x2a 0x40000000 0x0 0xC0000000    /* non-prefetchable memory (3GB) */
          0xc3000000 0x27 0x40000000 0x27 0x00000000 0x3 0x40000000>; /* prefetchable memory (13GB) */

I can now talk to the card, and basic transactions don’t seem to cause problems. When I then start my DMA transfers, I get CBB errors (either immediately or after a few seconds of full operation).

[ 81.412013] pcieport 0005:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 81.412323] pcieport 0005:00:00.0: device [10de:229a] error status/mask=00000041/0000e000
[ 81.412569] pcieport 0005:00:00.0: [ 0] RxErr (First)
[ 81.412775] pcieport 0005:00:00.0: [ 6] BadTLP
[ 81.565022] CPU:0, Error:CBB-EN@0x13a00000,irq=21
[ 81.565155] **************************************
[ 81.565289] * For more Internal Decode Help
[ 81.565411] * http://nv/cbberr
[ 81.565507] * NVIDIA userID is required to access
[ 81.565635] **************************************
[ 81.565772] CPU:0, Error:CBB-EN, Errmon:2
[ 81.565894] Error Code : TIMEOUT_ERR
[ 81.566007] Overflow : Multiple TIMEOUT_ERR
[ 81.566139] First logged Err Code : TIMEOUT_ERR
[ 81.566277] MASTER_ID : CCPLEX
[ 81.566372] Address : 0x3a000814
[ 81.566471] Cache : 0x0 – Device Non-Bufferable
[ 81.566605] Protection : 0x2 – Unprivileged, Non-Secure, Data Access
[ 81.566792] Access_Type : Read
[ 81.566793] Fabric : CBB
[ 81.566957] Slave_Id : 0x16
[ 81.567166] Burst_length : 0x0
[ 81.567660] Burst_type : 0x1
[ 81.568157] Beat_size : 0x2
[ 81.568621] VQC : 0x0
[ 81.569042] GRPSEC : 0x7e
[ 81.569506] FALCONSEC : 0x0
[ 81.569967] CBB_SN_PCIE_C5_SLV_TIMEOUT_STATUS : 0x1
[ 81.570728] **************************************
[ 81.575136] WARNING: CPU: 0 PID: 184 at /home/mandre/nvidia/nvidia_sdk/JetPack_5.0.1_DP_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/sources/kernel/nvidia/drivers/platform/tegra/cbb/tegra23x_cbb.c:541 tegra234_cbb_error_isr+0x150/0x1c8
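
As a reading aid for the "error status/mask=00000041/0000e000" line, here is a sketch that decodes the AER Correctable Error Status bits. The bit positions follow the PCIe specification; the short names for bits 0 and 6 are the ones printed in the log above, and the remaining abbreviations are illustrative.

```python
# Bit positions of the AER Correctable Error Status register.
AER_COR_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}


def decode_cor_status(value: int):
    """Return (bit, name) pairs for every known bit set in value."""
    return [(bit, name) for bit, name in sorted(AER_COR_BITS.items())
            if value & (1 << bit)]


print("reported:", decode_cor_status(0x00000041))  # RxErr, BadTLP
print("masked:  ", decode_cor_status(0x0000E000))  # bits 13, 14, 15
```

Both reported bits are link-level errors that the hardware corrected, which is consistent with the "severity=Corrected, type=Physical Layer" line.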

I have now realized that I am getting the same stability issues with the reduced BAR size as well.
I am separating the topics and have started a separate topic for the stability issues / CBB errors.

On this topic, I would appreciate it if you could suggest or confirm the adjustment of the address translation to work with larger BARs (2GB).

We can’t use Xavier’s ranges for Orin as is.
Instead, Orin’s ranges can be adjusted to have higher apertures for non-prefetchable BARs.
Please use the following adjusted ranges property and see if it works for your FPGA-based endpoint device.

BTW, it is rare to see devices with huge 32-bit non-prefetchable BARs. Is there any specific reason why this particular device needs such a huge 32-bit NP BAR? Can’t the same BAR be exposed as a prefetchable BAR?

ranges = <0x81000000 0x00 0x3a100000 0x00 0x3a100000 0x0 0x00100000   /* downstream I/O (1MB) */
		  0x82000000 0x00 0x40000000 0x2a 0x70000000 0x0 0xC0000000   /* non-prefetchable memory (3072MB) */
		  0xc3000000 0x27 0x40000000 0x27 0x40000000 0x3 0x30000000>; /* prefetchable memory (13056MB) */

Vidyas,
Thank you. The 32-bit BAR is larger than really needed, but we need more than 128MB. I am aware of other devices mapping a memory area this large. As this is access to the on-chip bus, we can’t use prefetchable memory.
I will try your ranges when I am back from my vacation.
Marc

Vidyas,
Your configuration still fails with the alignment error.
I experimented a bit, and it seems that the CPU address for the non-prefetchable memory must be aligned to 0x40000000.
I now see that my previous ranges had a problem (I missed that it should start at 0x27 40000000). As I don’t need much prefetchable memory, I have further reduced the size of the prefetchable memory to make everything aligned:

ranges = <0x81000000 0x00 0x3a100000 0x00 0x3a100000 0x0 0x00100000    /* downstream I/O (1MB) */
          0x82000000 0x00 0x40000000 0x2a 0x40000000 0x0 0xC0000000    /* non-prefetchable memory (3072MB) */
          0xc3000000 0x27 0x40000000 0x27 0x40000000 0x3 0x00000000>;  /* prefetchable memory (12288MB) */
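
A small sketch to check that the CPU-side windows of this mapping do not overlap. The helper names are hypothetical, and the "default span" is only an assumption: it is taken to be the region covered by the default Orin ranges for this controller, whose real aperture limits are not stated in this thread.

```python
def cpu_window(cpu_addr: int, size: int):
    """Return the half-open CPU address interval [start, end) of a window."""
    return (cpu_addr, cpu_addr + size)


def overlaps(a, b) -> bool:
    """True if two half-open intervals share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


non_prefetch = cpu_window(0x2A_40000000, 0x0_C0000000)
prefetch = cpu_window(0x27_40000000, 0x3_00000000)

# Assumed usable span: everything covered by the default ranges
# (prefetchable at 0x27 0x40000000 up to the end of the 128MB window).
default_span = (0x27_40000000, 0x2B_28000000 + 0x08000000)

print("overlap:", overlaps(non_prefetch, prefetch))
print("contiguous:", prefetch[1] == non_prefetch[0])
print("inside default span:",
      default_span[0] <= prefetch[0] and non_prefetch[1] <= default_span[1])
```

With these values the prefetchable window ends exactly where the non-prefetchable window begins, and both stay within the span used by the default ranges.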

I’m wondering where this requirement comes from.
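
One possible explanation, offered as a sketch rather than a confirmed answer: if the UEFI check requires the offset between CPU address and PCI address to be a multiple of the BAR size (assuming Alignment is the BAR size minus one), then the largest BAR a window can host is the lowest set bit of that offset. The function name max_aligned_bar is hypothetical; the addresses are the non-prefetchable windows quoted in this thread.

```python
def max_aligned_bar(pci_addr: int, cpu_addr: int):
    """Largest power-of-two BAR size whose alignment mask divides the offset.

    Returns None when there is no translation (any alignment would pass).
    """
    offset = abs(cpu_addr - pci_addr)
    if offset == 0:
        return None
    return offset & -offset  # isolate the lowest set bit


PCI_ADDR = 0x40000000
candidates = {
    "default Orin (0x2b 0x28000000)": 0x2B_28000000,
    "Xavier-style (0x1f 0x40000000)": 0x1F_40000000,
    "suggested    (0x2a 0x70000000)": 0x2A_70000000,
    "adjusted     (0x2a 0x40000000)": 0x2A_40000000,
}
for label, cpu_addr in candidates.items():
    limit_mb = max_aligned_bar(PCI_ADDR, cpu_addr) >> 20
    verdict = "ok" if limit_mb >= 2048 else "too small"
    print(f"{label}: up to {limit_mb}MB, 2GB BAR {verdict}")
```

Under this assumption the limits come out as 128MB, 4096MB, 256MB and 8192MB respectively, which lines up with which mappings hit the assert in this thread.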

FWIW, your modification looks fine to me. Do you mean to say that it doesn’t work even with that? What exactly is the alignment error you are observing? Could you please paste the log?