PCIe EP/RP speedtest for virtual network and DMA

Hello,

I have a working PCIe link between my AGX Orin (EP) and the Xavier NX M.2 port (RP). lspci enumeration, memory-mapped data transfer, and virtual Ethernet-over-PCIe all work properly.

The link is set to Speed 8GT/s, Width x4.

When I run a speed test over this PCIe link, I get 3.33 Gbit/s.

I am curious about two things:

  • Does this virtual network speed match NVIDIA’s own measurements? In other words, given the overhead of the TCP/IP stack, is my result reasonable?
    • If not, what factors could improve this speed? We eventually want to use virtualized PCIe networking in our application.
  • How can I run a DMA speed test, or otherwise reach the maximum physical transfer rate, to compare against the theoretical maximum?

EP is AGX Orin, RP is Xavier NX. Both run JetPack 5.1.2.

As a follow-up question to this:

Does NVIDIA provide a version of virtual Ethernet-over-PCIe that makes use of DMA?

From various forum posts, it seems that DMA is necessary to reach maximum PCIe throughput, and that without it the CPU is a major bottleneck, because PCIe hardware IRQs on the AGX endpoint are all handled by a single CPU core.

Hello, could anyone help with this topic, especially the DMA speed test?

I followed this topic, but I can’t get the channel, size, etc. entries to show up under /sys/kernel/debug/pcie@141a0000/ on the EP side. They don’t appear.

To try to fix this, I added these options to tegra_defconfig:

CONFIG_PCIE_TEGRA=y
CONFIG_PCIE_RP_DMA_TEST=y
CONFIG_PCIE_TEGRA_DW=y
CONFIG_PCIE_TEGRA_HOST=y
CONFIG_PCIE_TEGRA_DW_DMA_TEST=y

Then I modified the source by defining CONFIG_PCIE_TEGRA_DW_DMA_TEST in these files:
nvidia/drivers/pci/host/pcie-tegra-dw.c
nvidia/drivers/pci/dwc/pcie-tegra.c

Then I recompiled the kernel and reflashed both the AGX Orin EP and the Xavier NX RP.

After booting both EP and RP, lspci enumeration, memory-mapped data transfer with busybox, and virtual Ethernet-over-PCIe still work properly.

On EP, under /sys/kernel/debug/pcie@141a0000/, I now see the following:

apply_flr
apply_pme_turnoff
apply_sbr
apply_speed_change
aspm_state_cnt
dma_size
ep_rid
flr_rid
perf_test
sanity_test
target_speed

However, when I run the DMA speed test with cat perf_test, I get the following errors in dmesg:

[  326.096457] tegra194-pcie 14100000.pcie: edma_submit_direct_tx: DD WR CH: 0 TO
[  326.103966] tegra194-pcie 14100000.pcie: perf_test: DD WR, SZ: 267386880 B CH: 0 failed

[  326.496852] irq 65: nobody cared (try booting with the "irqpoll" option)
[  326.503794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE     5.10.120-tegra #1
[  326.503799] Hardware name: Unknown Jetson AGX Orin Developer Kit/Jetson AGX Orin Developer Kit, BIOS 4.1-33958178 08/01/2023
[  326.503806] Call trace:
[  326.503837]  dump_backtrace+0x0/0x1d0
[  326.503845]  show_stack+0x30/0x40
[  326.503870]  dump_stack+0xd8/0x138
[  326.503876]  __report_bad_irq+0x58/0xe4
[  326.503893]  note_interrupt+0x2dc/0x3a0
[  326.503913]  handle_irq_event_percpu+0x90/0xa0
[  326.503922]  handle_irq_event+0x50/0xf0
[  326.503932]  handle_fasteoi_irq+0xc0/0x170
[  326.503940]  generic_handle_irq+0x40/0x60
[  326.503947]  __handle_domain_irq+0x70/0xd0
[  326.503953]  gic_handle_irq+0x68/0x134
[  326.503957]  el1_irq+0xd0/0x180
[  326.503973]  __bitmap_and+0x1c/0x80
[  326.503983]  rebalance_domains+0x298/0x3a0
[  326.503989]  run_rebalance_domains+0x54/0x80
[  326.503993]  __do_softirq+0x140/0x3e8
[  326.504006]  irq_exit+0xc0/0xe0
[  326.504012]  __handle_domain_irq+0x74/0xd0
[  326.504016]  gic_handle_irq+0x68/0x134
[  326.504021]  el1_irq+0xd0/0x180
[  326.504038]  tick_nohz_idle_exit+0x6c/0xc0
[  326.504042]  do_idle+0x188/0x270
[  326.504047]  cpu_startup_entry+0x30/0x70
[  326.504059]  rest_init+0xdc/0xe8
[  326.504075]  arch_call_rest_init+0x18/0x20
[  326.504080]  start_kernel+0x500/0x538
[  326.504085] handlers:
[  326.506438] [<0000000080b2cb9f>] tegra_pcie_rp_irq_handler threaded [<0000000072a6e636>] tegra_pcie_rp_irq_thread
[  326.517023] [<00000000daf0649b>] pcie_pme_irq
[  326.521517] [<00000000d940327e>] aer_irq threaded [<00000000445996a2>] aer_isr
[  326.528966] Disabling IRQ #65

Subsequent runs of cat perf_test yield this in dmesg:

[ 1629.245230] tegra194-pcie 14100000.pcie: perf_test: DD WR, CH: 0 SZ: 267386880 B, time: 4795902405 ns
[ 1634.252759] tegra194-pcie 14100000.pcie: edma_submit_direct_rx: DD RD CH: 0 TO
[ 1634.260248] tegra194-pcie 14100000.pcie: perf_test: DD RD, SZ: 267386880 B CH: 0 failed
[ 1713.797575] tegra194-pcie 14100000.pcie: perf_test: DD WR, CH: 0 SZ: 267386880 B, time: 4819465563 ns
[ 1718.989272] tegra194-pcie 14100000.pcie: edma_submit_direct_rx: DD RD CH: 0 TO
[ 1718.996768] tegra194-pcie 14100000.pcie: perf_test: DD RD, SZ: 267386880 B CH: 0 failed
[ 1768.293684] tegra194-pcie 14100000.pcie: perf_test: DD WR, CH: 0 SZ: 267386880 B, time: 4769931990 ns
[ 1773.517746] tegra194-pcie 14100000.pcie: edma_submit_direct_rx: DD RD CH: 0 TO
[ 1773.525247] tegra194-pcie 14100000.pcie: perf_test: DD RD, SZ: 267386880 B CH: 0 failed

Hi @kernel_sanders

Did you also apply the patch 0001-gathered-all-dma-performance-test-patches.patch?

Aha, I missed that step in your older thread.
I’ll give that a shot; hopefully it was the only missing piece.

I have applied the DMA performance test patch, but building the kernel with nvbuild.sh now fails.

The cause appears to be that the label fail_set_bar is ‘defined but not used’ in pci-epf-nv-test.c once the patch is applied.

Here is the relevant section of pci-epf-nv-test.c (the forum won’t let me upload the full .c file). I’m fairly confident the patch is being applied properly:

...
...
	if (ret) {
		dev_err(fdev, "pci_epc_set_bar() failed: %d\n", ret);
		//goto fail_unmap_ram_virt;
		goto fail_set_bar;
		return ret;
	}
#endif

#if (LINUX_VERSION_CODE > KERNEL_VERSION(4, 15, 0))
	epf->nb.notifier_call = pci_epf_nv_test_notifier;
	pci_epc_register_notifier(epc, &epf->nb);
#endif

	return 0;

#if (LINUX_VERSION_CODE <= KERNEL_VERSION(4, 15, 0))
fail_unmap_ram_virt:
	vunmap(epfnv->bar0_ram_map);
#endif
//fail_unmap_ram_iova:
//	iommu_unmap(domain, epfnv->bar0_iova, PAGE_SIZE);
//fail_free_iova:
//	iommu_dma_free_iova(cdev, epfnv->bar0_iova, BAR0_SIZE);
//fail_free_pages:
//	__free_pages(epfnv->bar0_ram_page, 1);
//fail:
fail_set_bar:
	dma_free_coherent(cdev, BAR0_SIZE, epfnv->bar0_ram_map,
			  epfnv->bar0_iova);
	return ret;
}
...

hermes_wu has kindly been helping me so far; I would be very grateful if @WayneWWW or someone else from NVIDIA could weigh in.

Here is the build error:

$ ./nvbuild.sh -o $KERNEL_OUT

Building kernel-5.10 sources
make: Entering directory '/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10'
make[1]: Entering directory '/home/user/jetson_kernel/kernel_out'
  GEN     Makefile
#
# No change to .config
#
make[1]: Leaving directory '/home/user/jetson_kernel/kernel_out'
make: Leaving directory '/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10'
make[1]: Entering directory '/home/user/jetson_kernel/kernel_out'
  GEN     Makefile
make[1]: Leaving directory '/home/user/jetson_kernel/kernel_out'
  CALL    /home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10/scripts/atomic/check-atomics.sh
  CALL    /home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10/scripts/checksyscalls.sh
...
...
...
/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/nvidia/drivers/pci/endpoint/functions/pci-epf-nv-test.c: In function ‘pci_epf_nv_test_bind’:
/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/nvidia/drivers/pci/endpoint/functions/pci-epf-nv-test.c:203:1: error: label ‘fail_set_bar’ defined but not used [-Werror=unused-label]
  203 | fail_set_bar:
      | ^~~~~~~~~~~~
cc1: all warnings being treated as errors
make[5]: *** [/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10/scripts/Makefile.build:281: drivers/pci/endpoint/functions/pci-epf-nv-test.o] Error 1
make[5]: *** Waiting for unfinished jobs....
...
...
make[4]: *** [/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10/scripts/Makefile.build:498: drivers/pci/endpoint/functions] Error 2
make[4]: *** Waiting for unfinished jobs....
...
make[3]: *** [/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10/scripts/Makefile.build:498: drivers/pci/endpoint] Error 2
make[3]: *** Waiting for unfinished jobs....
...
...
make[2]: *** [/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10/scripts/Makefile.build:498: drivers/pci] Error 2
make[2]: *** Waiting for unfinished jobs....
...
...
make[1]: *** [/home/user/jetson_kernel/Linux_for_Tegra/source/public/kernel/kernel-5.10/Makefile:1854: drivers] Error 2
make: *** [Makefile:213: __sub-make] Error 2

Whoops. I didn’t notice that goto fail_set_bar sits inside a block that is only compiled for older Linux kernel versions. After commenting out the fail_set_bar label, the kernel compiles fine.

With the patch applied, I rebuilt the kernel and flashed both EP and RP, but I get the same error as in this comment.

I suspect I’m applying the patch incorrectly: on the EP, only /sys/kernel/debug/pcie@14100000 now appears, instead of the expected /sys/kernel/debug/pcie@141a0000/ (which was present before any patch was applied).

Regarding this hunk of the patch to tegra194-soc/tegra194-soc-pcie.dtsi:

+		nvidia,dma-poll;

I realized the patch doesn’t make it clear where the nvidia,dma-poll; property must be added to enable C5 DMA on the RP/EP.

So, to proceed: does anyone know whether this property should be added to the device tree node pcie@141a0000/, pcie_ep@141a0000/, or both?

RP is Xavier NX’s M.2 slot, and EP is AGX Orin’s x16 slot.