I’ve got a 4-port High Speed NIC (4x25G) adapter installed in the PCIe slot of my NVIDIA AGX Orin dev kit. At aggregate speeds at or below 50G things work great. When I attempt to go above that, throughput simply plateaus at 50G.
The PCIe slot is reported by lspci -vvv to be 16GT/s with x8 width. This tells me that the achievable bandwidth is 128Gbps (ideal). However, I’m only able to hit slightly less than 50Gbps when I expect around 100Gbps in my test case.
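For reference, the negotiated link can also be read straight out of sysfs (using the NIC function 0005:01:00.4 that shows up in the tegrastats output below; adjust the address for your system):

cat /sys/bus/pci/devices/0005:01:00.4/current_link_speed   # should report 16 GT/s
cat /sys/bus/pci/devices/0005:01:00.4/current_link_width   # should report 8

# Rough ideal math: 16 GT/s x 8 lanes = 128 Gbps raw; after 128b/130b encoding
# that is ~126 Gbps, before any TLP/DLLP/DMA overhead is subtracted.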
I’ve checked the ODMDATA and that configuration looks correct:
This is what I see after about 1 minute of iperf3 running where I’m hitting the 50Gbps plateau:
03-18-2026 20:29:39 RAM 1581/30611MB (lfb 13x4MB) CPU [51%@2201,52%@2201,46%@2201,42%@2201,57%@2201,58%@2201,57%@2201,52%@2201,42%@2201,52%@2201,52%@2201,50%@2201] EMC_FREQ 6%@3199 GR3D_FREQ 0%@[1300,1300] NVENC0 off NVDEC0 off NVJPG0 off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@55.281C cxgb4_0005:01:00.4@51C soc2@51.062C soc0@51.156C gpu@48.875C tj@55.281C soc1@49.656C VDD_GPU_SOC 5896mW/5896mW VDD_CPU_CV 5501mW/5501mW VIN_SYS_5V0 6593mW/6565mW
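In case it helps to reproduce: the snapshot above is plain tegrastats output, and pinning the clocks first takes DVFS throttling out of the picture (this assumes a stock JetPack install; the interval is just an example):

sudo nvpmodel -m 0        # MAXN power mode
sudo jetson_clocks        # lock clocks to max
tegrastats --interval 1000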
That 50Gbps is the aggregate of the 4 connections I’m running. If I run one iperf3 connection, I see ~23Gbps. If I run two connections, I see ~23/23Gbps (~46Gbps total). If I run three connections, I see ~15/15/15Gbps (~45Gbps total). If I run four connections, I see ~11/11/11/11Gbps (~44Gbps total). These are not exact numbers, but roughly what I’m seeing.
Each connection stream uses a different port on the adapter, and each port can run at up to 25Gbps line rate. The 4 ports at 25Gbps would give me a max of 100Gbps, which is under the PCIe rate of 128Gbps (ideal at 16GT/s x8 lanes).
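Concretely, the streams are spread across the ports roughly like this (the 192.168.1x.x addresses are placeholders for the per-port interfaces; the remote side runs one iperf3 server bound to each address):

# remote side, one server per port address
iperf3 -s -B 192.168.10.2 &
iperf3 -s -B 192.168.11.2 &
iperf3 -s -B 192.168.12.2 &
iperf3 -s -B 192.168.13.2 &

# Orin side, one client per port
iperf3 -c 192.168.10.2 -t 60 &
iperf3 -c 192.168.11.2 -t 60 &
iperf3 -c 192.168.12.2 -t 60 &
iperf3 -c 192.168.13.2 -t 60 &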
FWIW, I see similar behavior with a couple of other tests outside of iperf3. I’ve observed it when employing iWARP RDMA offload, using fio to perform writes via NFS over RDMA across the 4 connections, and I’ve observed it with ib_write_bw. I simply cannot get above 50Gbps.
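The ib_write_bw runs look roughly like this (the device name cxgb4_0 is a placeholder, ibv_devices lists the real one; -R uses rdma_cm connection setup, which the iWARP path needs):

# server side
ib_write_bw -d cxgb4_0 -R --report_gbits

# client side, 30-second run
ib_write_bw -d cxgb4_0 -R --report_gbits -D 30 <server_ip>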
I get the performance I expect when I run one or two connections. When I hit three or four, things simply plateau.
Is it possible UPHY1 is really configured for x4 even though x8 is being reported by lspci -vvv? Is there anything I can check, like a register configuration, to verify this is not the issue?
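For what it’s worth, the lspci fields I’ve been comparing are LnkCap (what the device advertises) versus LnkSta (what was actually negotiated); checking the root port upstream of the slot as well should catch a mismatch (0005:00:00.0 is a guess at the root port address, it may differ):

# endpoint (any function of the adapter shows the same link)
sudo lspci -s 0005:01:00.4 -vvv | grep -E 'LnkCap|LnkSta'

# Orin root port upstream of the slot
sudo lspci -s 0005:00:00.0 -vvv | grep -E 'LnkCap|LnkSta'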
I have the AGX Orin as the PCIe host and an FPGA as a PCIe endpoint with a DMA subsystem. The Orin reports PCIe 4.0 (16GT/s) and x8 width. Testing tops out at ~2GB/s of bandwidth (~16Gb/s), which is far below the speeds you are seeing, but there could be something else incorrect in this setup.
If I take the FPGA endpoint to a standard desktop PC, the same testing shows a 10x decrease in DMA transfer latency and an 8-10x increase in bandwidth. To me, that indicates the FPGA design is correct and the Orin is doing something weird.
FWIW, my configuration consists of two Orin dev kits, each with a high-speed 4-port NIC installed in the PCIe slot. When I try to perform iWARP RDMA comms between them (one as an NFS server and the other as a client running fio to max things out across all 4 ports), I hit that maximum of 50Gbps. If I move the server to an x86 platform with a high-speed 4-port NIC installed in it, the Orin dev kit client can hit around 88Gbps.
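For reference, the client-side load looks roughly like this (mount points, export path, addresses, and sizes are illustrative; one NFS-over-RDMA mount per NIC port):

# one mount per port
sudo mount -t nfs -o proto=rdma,port=20049 192.168.10.2:/export /mnt/port0
sudo mount -t nfs -o proto=rdma,port=20049 192.168.11.2:/export /mnt/port1
sudo mount -t nfs -o proto=rdma,port=20049 192.168.12.2:/export /mnt/port2
sudo mount -t nfs -o proto=rdma,port=20049 192.168.13.2:/export /mnt/port3

# fio spreads the jobs round-robin over the colon-separated directories
fio --name=rdma-write --directory=/mnt/port0:/mnt/port1:/mnt/port2:/mnt/port3 \
    --rw=write --bs=1M --ioengine=libaio --iodepth=16 --direct=1 \
    --numjobs=4 --size=10G --time_based --runtime=60 --group_reporting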
I haven’t quite figured out where the bottleneck is with the Orin running as a server. I’m still in the process of tracking that down.
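One thing that seems worth checking on the server side is whether the NIC interrupts and the NET_RX softirq work all land on a handful of cores; a rough way to watch that during a run (mpstat comes from the sysstat package, and cxgb4 matches the adapter’s driver name):

grep -i cxgb4 /proc/interrupts    # per-core interrupt counts for the NIC queues
grep NET_RX /proc/softirqs        # per-core receive softirq counts
mpstat -P ALL 1                   # %irq / %soft per core while the test runs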