PCIe memory bus prioritization over GPU

Hi,
I have an AGX Orin 32GB with a PCIe Gen3 x8 device (an FPGA capable of ~8GB/s) connected to PCIe controller 5. This device streams data via external DMA into the Orin’s RAM at a configurable constant rate of up to 4GB/s. This works fine until the GPU is in use and starts requiring EMC bandwidth as well. At that point, the EMC is only ~50% loaded according to tegrastats. However, with GPU reads/writes present, the FPGA writes are continually delayed and our buffers in the FPGA overrun.
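For anyone reasoning about the overrun: while writes are stalled the FPGA buffer keeps filling at the stream rate, so the longest stall it can absorb is roughly the buffer size divided by the rate. A minimal sketch of that arithmetic follows. The 1 MiB buffer size is a hypothetical placeholder (the actual FPGA buffer depth is not stated in this thread), and GB is assumed to mean 1e9 bytes.

```python
def max_stall_seconds(buffer_bytes: int, stream_rate_bytes_per_s: float) -> float:
    """Longest write stall a buffer can absorb before it overruns.

    Assumes the buffer starts empty and fills at a constant rate while
    no data drains to host memory.
    """
    if stream_rate_bytes_per_s <= 0:
        raise ValueError("stream rate must be positive")
    return buffer_bytes / stream_rate_bytes_per_s


# Hypothetical buffer size; replace with the real FPGA buffer depth.
BUFFER_BYTES = 1 << 20  # 1 MiB

for rate_gb in (1, 2, 3, 4):  # stream rates up to the 4GB/s mentioned above
    stall_us = max_stall_seconds(BUFFER_BYTES, rate_gb * 1e9) * 1e6
    print(f"{rate_gb} GB/s: absorbs up to {stall_us:.1f} us of stalled writes")
```

This only bounds a single worst-case stall. It says nothing about how long the memory controller actually delays the writes, which is the open question in this thread.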

We have already forced the EMC clock to be fixed at 3200MHz, but that does not resolve the issue. What I believe we need is one of:

  • A way to prioritize the memory bus traffic from this PCIe device to always be above that of the GPU
  • A way to limit the maximum latency of write transactions from this PCIe device so that they cannot be starved by the GPU
  • A way to reserve isochronous EMC bandwidth for this device

I see that all 3 of these may be possible by looking at some older kernel drivers as well as some older forum posts:

However, it appears that there is no Linux driver for LA or PTSA configuration on Orin, and these registers are not documented in the Orin TRM. Could you please provide assistance? Thanks.

What does tegrastats report when this issue happens?

Sure, here is an instance of tegrastats when attempting to stream 3GB/s from the FPGA while performing 1 million point FFTs using cuFFT in a loop, which reproduces the issue:
RAM 7649/30603MB (lfb 4516*4MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@1728,0%@1728,50%@1804,0%@1804] EMC_FREQ 53%@3199 GR3D_FREQ 97%@[611,611] VIC_FREQ 729 APE 174 CV0@-256C CPU@54.75C Tboard@43C SOC2@51.187C Tdiode@45C SOC0@50.75C CV1@-256C GPU@51.81C tj@54.75C SOC1@51.406C CV2@-256C VDD_GPU_SOC 16113mW/15274mW VDD_CPU_CV 711mW/866mW VIN_SYS_5V0 12656/12248mW NC 0mW/0mW VDDQ_VDD2_1V8A0 5617mW/5346mW NC 0mW/0mW
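To correlate overruns with memory and GPU load over time, the relevant fields can be pulled out of each tegrastats line. Below is a minimal sketch that assumes the line format shown above (`EMC_FREQ <pct>%@<MHz>` and `GR3D_FREQ <pct>%@[<MHz>,...]`). The format differs between L4T releases, so the regular expressions may need adjusting, and the function name is my own.

```python
import re

# Patterns match the tegrastats format shown in this thread only.
EMC_RE = re.compile(r"EMC_FREQ (\d+)%@(\d+)")
GR3D_RE = re.compile(r"GR3D_FREQ (\d+)%@\[([\d,]+)\]")


def parse_tegrastats(line: str) -> dict:
    """Extract EMC and GPU (GR3D) load and clock from one tegrastats line.

    Fields that are not found are returned as None.
    """
    emc = EMC_RE.search(line)
    gr3d = GR3D_RE.search(line)
    return {
        "emc_pct": int(emc.group(1)) if emc else None,
        "emc_mhz": int(emc.group(2)) if emc else None,
        "gr3d_pct": int(gr3d.group(1)) if gr3d else None,
        "gr3d_mhz": [int(x) for x in gr3d.group(2).split(",")] if gr3d else None,
    }


# Excerpt of the line posted above.
sample = "EMC_FREQ 53%@3199 GR3D_FREQ 97%@[611,611] VIC_FREQ 729"
print(parse_tegrastats(sample))
```

Logging these values alongside the FPGA's overrun counter would show whether overruns track GPU load rather than total EMC utilization.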

Hi, Is there any update from NVIDIA on this? Thanks.

Hi

Sorry for the late reply.

I just confirmed with our PCIe team. There is no way for the PCIe driver to do this.

PCIe is not an isochronous client, so there are no options to reduce latency or prioritize memory traffic from a PCIe device.

Hi,
Even if it cannot be an isochronous client, can the MC subsystem PTSA or LA registers be modified to increase the arbitration priority of this PCIe client as was done on previous Tegra generations? Alternatively, can we lower the priority of the GPU?

I see this header file in the kernel source with register offsets in the MC peripheral for PTSA and LA: linux-tegra-5.10/nvidia/include/linux/platform/tegra/mc-regs-t23x.h at oe4t-patches-l4t-35.5.0 in OE4T/linux-tegra-5.10 on GitHub. Therefore this functionality looks like it may still exist. However, there are 33 registers related to PCIE5 with device prefixes such as PCIEBX, PCIE5XA, PCIE5X, PCIE5_0, PCIE5_1, and PCIE5B_0, so I’m not sure which ones apply to reads and writes initiated from PCIe.
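For reference, a listing like this can be produced by grouping the header's `#define` names by the device token they contain. This is a rough sketch that assumes the header uses plain `#define NAME VALUE` lines. The macro names in the sample text are made-up placeholders, not real register names from mc-regs-t23x.h.

```python
import re
from collections import defaultdict

DEFINE_RE = re.compile(r"^\s*#define\s+(\w+)\s+(\S+)")

# Device tokens mentioned above. PCIE5XA is listed before PCIE5X because
# the shorter token is a substring of the longer one.
TOKENS = ("PCIE5B_0", "PCIE5_0", "PCIE5_1", "PCIE5XA", "PCIE5X", "PCIEBX")


def group_pcie5_defines(header_text: str) -> dict:
    """Group #define names by the first PCIe device token they contain."""
    groups = defaultdict(list)
    for line in header_text.splitlines():
        m = DEFINE_RE.match(line)
        if not m:
            continue
        name = m.group(1)
        for token in TOKENS:
            if token in name:
                groups[token].append(name)
                break
    return dict(groups)


# Placeholder macro names for illustration only.
sample_header = """
#define MC_PLACEHOLDER_A_PCIE5XA 0x0
#define MC_PLACEHOLDER_B_PCIE5X  0x4
#define MC_PLACEHOLDER_C_PCIE5_0 0x8
#define MC_PLACEHOLDER_D_OTHER   0xc
"""
for token, names in group_pcie5_defines(sample_header).items():
    print(token, names)
```

To run it against the real header, read the file with `open(path).read()` and pass the text to `group_pcie5_defines`. The grouping does not tell you which registers govern PCIe-initiated reads versus writes; that still needs confirmation from NVIDIA.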

Thanks.

bump so this doesn’t auto close