Enabling Orin Dev Kit PCIe EP mode

Hi,

Using mmap and memcpy from main(), writing from the EP and reading from the PC RP gives unexpected results: the received data rate and the reported send rate are very different.

EP->RP, 16KB 1000 times, 16MB overall
EP reports 0.0005s, 16/0.0005=32GB/s≈256Gb/s
RP reports 2.2s, 16/2.2=7.4MB/s=60Mb/s
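A minimal sketch of how the EP-side timing might be taken, assuming the test times its memcpy loop with clock_gettime(CLOCK_MONOTONIC) (the thread's own timing code is not shown). If the EP buffer is backed by local Orin DRAM, a loop like this only measures a local memory copy; the data crosses the link later, when the RP reads it, so the EP figure may say little about PCIe throughput. time_copies, mb_per_s and mbit_per_s are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC samples. */
static double elapsed_s(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical helper: copy `chunk` bytes `iters` times and return the
 * elapsed time. It measures how long the CPU takes to perform the copies,
 * not when a remote reader actually sees the data. */
double time_copies(void *dst, const void *src, size_t chunk, int iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        memcpy(dst, src, chunk);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_s(&t0, &t1);
}

/* Convert a byte count and a duration to decimal MB/s (1 MB = 1e6 bytes). */
double mb_per_s(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}

/* Same rate in decimal Mb/s (8 bits per byte). */
double mbit_per_s(size_t bytes, double seconds)
{
    return mb_per_s(bytes, seconds) * 8.0;
}
```

With the EP numbers above, mbit_per_s(16000000, 0.0005) comes out at about 256000 Mb/s, i.e. 256 Gb/s.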

When the data is observed on the RP, it arrives in blocks with jitter of up to 0.5s, which suggests that most of the data is dropped or overwritten. On the EP, 32GB/s is higher than the theoretical maximum for the x1 link reported by lspci (LnkSta). There is also a 16384-byte limit on the mmap size: if 32768 is used, the Orin segfaults and restarts. How can this limit be increased?
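Regarding the 16384-byte limit, one possibility (not confirmed in this thread) is that 16KB is simply the size of the physical buffer behind the mapping, and that touching addresses past its end, which are not backed by valid memory, brings the system down. A minimal sketch, assuming the EP buffer is mapped from user space through /dev/mem: it page-aligns the request and refuses any length larger than a known region size. map_phys_region and page_offset are hypothetical helpers, and the physical address and region size are inputs that would have to come from the EP driver, not values from this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Offset of `phys` within its page (page_size must be a power of two). */
size_t page_offset(uint64_t phys, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size_t)(phys & (uint64_t)(page_size - 1));
}

/* Hypothetical helper: map `len` bytes at physical address `phys`, refusing
 * any request that would run past `region_size` bytes (the size of the
 * buffer actually reserved by the EP driver). Returns NULL on failure.
 * *map_base and *map_len receive the values to pass to munmap() later. */
void *map_phys_region(uint64_t phys, size_t len, size_t region_size,
                      void **map_base, size_t *map_len)
{
    if (len == 0 || len > region_size)
        return NULL;                  /* never map past the real buffer */

    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t off = page_offset(phys, pg);
    size_t total = off + len;         /* mmap offsets must be page-aligned */

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                      (off_t)(phys - off));
    close(fd);                        /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return NULL;

    *map_base = base;
    *map_len = total;
    return (char *)base + off;
}
```

Requests for 32768 bytes against a 16384-byte region are rejected before /dev/mem is even opened, rather than faulting at the first access.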

RP->EP
EP reports 0.0005s, 16/0.0005=32GB/s≈256Gb/s
RP reports 0.7s, 16/0.7=22MB/s=182Mb/s
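One plausible reason for the gap between the two directions on the RP side: CPU loads from a BAR are non-posted PCIe reads that each wait for a completion, while CPU stores are posted writes, so memcpy out of an uncached BAR mapping is typically much slower than memcpy into it. This is also why bulk transfers are normally done with DMA. A minimal sketch of mapping the Orin BAR from user space on the PC through the sysfs resourceN file, or resourceN_wc, which exists only for prefetchable BARs and allows write-combined stores. bar_resource_path and map_bar are hypothetical helpers, and the BDF 0000:01:00.0 and BAR index are placeholders; the real BDF would come from lspci.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Build the sysfs path of a BAR resource file, e.g.
 * /sys/bus/pci/devices/0000:01:00.0/resource0 (or resource0_wc).
 * Returns 0 on success, -1 on a bad BDF or a too-small buffer. */
int bar_resource_path(char *buf, size_t n, const char *bdf, int bar, int wc)
{
    assert(buf != NULL && n > 0);
    if (bdf == NULL || strlen(bdf) == 0)
        return -1;
    int r = snprintf(buf, n, "/sys/bus/pci/devices/%s/resource%d%s",
                     bdf, bar, wc ? "_wc" : "");
    return (r < 0 || (size_t)r >= n) ? -1 : 0;
}

/* Map the first `len` bytes of a BAR into this process; NULL on failure.
 * CPU loads through this mapping become PCIe memory reads to the EP,
 * CPU stores become posted memory writes. */
void *map_bar(const char *bdf, int bar, int wc, size_t len)
{
    char path[128];
    if (bar_resource_path(path, sizeof path, bdf, bar, wc) != 0)
        return NULL;

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}
```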

When the data is observed on the RP, 182Mb/s is lower than with tvnet, and again the EP's 32GB/s exceeds the theoretical maximum. Since similarly divergent results are reported in "Performance issues of data transmission speed in PCIe EP mode", is mmap a viable option for transferring data over PCIe EP from user space to the physical address space (RAM) mapped for the PCIe EP? Are these results due to temporary instability of the PCIe driver, or will this approach not be performant even once that is fixed? If the latter, is writing a custom kernel driver ("Custom Endpoint Function Driver") the only option left, or are there other shortcuts for using DMA from user space?
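To put the EP figure in perspective, the raw per-lane bandwidth after line encoding can be computed from the link generation (the thread does not say which generation the x1 link trained at, so all are covered). This sketch ignores TLP/DLLP protocol overhead, so real throughput is lower still. pcie_lane_mb_per_s and pcie_link_mb_per_s are hypothetical names.

```c
#include <assert.h>

/* Raw data rate of one lane in MB/s (1 MB = 1e6 bytes) after line encoding,
 * ignoring TLP/DLLP overhead. Returns 0.0 for unknown generations. */
double pcie_lane_mb_per_s(int gen)
{
    double gt_per_s, num, den;   /* transfer rate and encoding ratio */
    switch (gen) {
    case 1: gt_per_s = 2.5e9;  num = 8;   den = 10;  break;  /* 8b/10b */
    case 2: gt_per_s = 5.0e9;  num = 8;   den = 10;  break;  /* 8b/10b */
    case 3: gt_per_s = 8.0e9;  num = 128; den = 130; break;  /* 128b/130b */
    case 4: gt_per_s = 16.0e9; num = 128; den = 130; break;  /* 128b/130b */
    case 5: gt_per_s = 32.0e9; num = 128; den = 130; break;  /* 128b/130b */
    default: return 0.0;
    }
    return gt_per_s * num / den / 8.0 / 1e6;  /* bits -> bytes -> MB */
}

/* Raw link data rate in MB/s for a given generation and lane count. */
double pcie_link_mb_per_s(int gen, int width)
{
    assert(width > 0);
    return pcie_lane_mb_per_s(gen) * (double)width;
}
```

If the link is Gen4, pcie_link_mb_per_s(4, 1) is about 1969 MB/s, so even before protocol overhead an x1 link is more than an order of magnitude below 32GB/s; even Gen4 x16 only reaches about 31.5GB/s.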

Thanks.


Hi,

Based on "GPCDMA memory to memory low performance", it looks like user-space DMA would not allow speeds greater than about 111MB/s.

Back to the PCIe/DMA test kernel driver above. EP-side usage is covered in "The bandwidth of of virtual ethernet over PCIe between two xaviers is low - #19 by WayneWWW", but the PC RP side (also Ubuntu 20.04) needs to be built. Some posts imply that it is buildable, but the instructions are somewhat unclear, if not conflicting.

Building kernel/nvidia/drivers/misc/tegra-pcie-ep-mem.c is implied in "Xavier AGX PCIe End-Point : access to dma_alloc_coherent return in CUDA kernel" and "How another CPU communicate with Xavier through PCIE? (Solved) - #8 by guo.tang".
Building kernel/nvidia/drivers/pci/endpoint/functions/pci-epf-nv-test.c is implied in "AGX Endpoint PCIe DMA speed - #5 by jack_lan", but it is not clear what needs to be modified.

When building tegra-pcie-dma-test.c with kernel_src/nvbuild.sh after changing tegra_defconfig to include CONFIG_TEGRA_PCIE_DMA_TEST=y, only tegra-pcie-dma-test.o is created, not a .ko, and tegra-pcie-ep-mem.o is not created at all. In kernel/nvidia/drivers/misc/Makefile, obj-$(CONFIG_TEGRA_PCIE_EP_MEM) is wrapped in an ifdef CONFIG_ARCH_TEGRA_19x_SOC.

If the ifdef is commented out, the build breaks as follows:

make[1]: Leaving directory '/home/me/kernel_src/kernel_out'
  CALL    /home/me/kernel_src/kernel/kernel-5.10/scripts/atomic/check-atomics.sh
  CALL    /home/me/kernel_src/kernel/kernel-5.10/scripts/checksyscalls.sh
  CHK     include/generated/compile.h
  CC      drivers/misc/tegra-pcie-ep-mem.o
/home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-ep-mem.c: In function ‘init_debugfs’:
/home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-ep-mem.c:731:4: error: void value not ignored as it ought to be
  731 |  d = debugfs_create_x64("src", 0644, ep->debugfs,
/home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-ep-mem.c:736:4: error: void value not ignored as it ought to be
  736 |  d = debugfs_create_x64("dst", 0644, ep->debugfs,
/home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-ep-mem.c:741:4: error: void value not ignored as it ought to be
  741 |  d = debugfs_create_x32("size", 0644, ep->debugfs,
/home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-ep-mem.c:746:4: error: void value not ignored as it ought to be
  746 |  d = debugfs_create_x8("channel", 0644, ep->debugfs,
make[3]: *** [/home/me/kernel_src/kernel/kernel-5.10/scripts/Makefile.build:281: drivers/misc/tegra-pcie-ep-mem.o] Error 1
make[2]: *** [/home/me/kernel_src/kernel/kernel-5.10/scripts/Makefile.build:498: drivers/misc] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [/home/me/kernel_src/kernel/kernel-5.10/Makefile:1854: drivers] Error 2
make: *** [Makefile:213: __sub-make] Error 2

If kernel/nvidia/drivers/misc/Makefile is changed to

obj-m +=tegra-pcie-dma-test.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

and invoked standalone, it fails with the following errors:

CPATH=/home/me/kernel_src/kernel/nvidia/include make
make -C /lib/modules/5.4.0-126-generic/build M=/home/me/me/kernel_src/kernel/nvidia/drivers/misc modules
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/usr/src/linux-headers-5.4.0-126-generic'
  CC [M]  /home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-dma-test.o
  Building modules, stage 2.
  MODPOST 1 modules
ERROR: "tegra_pcie_edma_initialize" [/home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-dma-test.ko] undefined!
ERROR: "tegra_pcie_edma_submit_xfer" [/home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-dma-test.ko] undefined!
ERROR: "tegra_pcie_edma_deinit" [/home/me/kernel_src/kernel/nvidia/drivers/misc/tegra-pcie-dma-test.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
make[1]: *** [Makefile:1675: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-126-generic'
make: *** [Makefile:3: all] Error 2

If Makefile is changed to

obj-m +=tegra-pcie-ep-mem.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

and invoked standalone, the .ko is created. Once it is loaded with insmod, modinfo lists it, but lsmod shows it as unused, and nothing new appears under /sys/kernel/debug.
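One thing worth checking on the PC: a PCI driver module loads without error even when no device matches its ID table, and its probe routine (where a driver like this would presumably create its debugfs entries) only runs once a matching device is enumerated. A minimal sketch that lists PCI functions with NVIDIA's vendor ID 0x10de from sysfs, similar to lspci -d 10de:, so the Orin EP's device ID can be compared with the driver's pci_device_id table (not shown in this thread). parse_hex_id and list_pci_vendor are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs hex ID such as "0x10de\n". Returns 0 on success. */
int parse_hex_id(const char *s, unsigned *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    unsigned long v = strtoul(s, &end, 16);   /* accepts a "0x" prefix */
    if (end == s || v > 0xffffUL)
        return -1;
    *out = (unsigned)v;
    return 0;
}

/* Read one hex ID attribute (e.g. <dir>/<dev>/vendor). 0 on success. */
static int read_id(const char *dir, const char *dev, const char *attr,
                   unsigned *out)
{
    char path[512], buf[32];
    snprintf(path, sizeof path, "%s/%s/%s", dir, dev, attr);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok ? parse_hex_id(buf, out) : -1;
}

/* Print every PCI function under `dir` (normally /sys/bus/pci/devices)
 * whose vendor ID equals `vendor`, as "<BDF> vvvv:dddd". Returns the
 * number printed, or -1 if `dir` cannot be opened. */
int list_pci_vendor(const char *dir, unsigned vendor)
{
    DIR *d = opendir(dir);
    if (!d)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        unsigned ven, dev;
        if (e->d_name[0] == '.')              /* skip "." and ".." */
            continue;
        if (read_id(dir, e->d_name, "vendor", &ven) != 0 || ven != vendor)
            continue;
        if (read_id(dir, e->d_name, "device", &dev) != 0)
            continue;
        printf("%s %04x:%04x\n", e->d_name, ven, dev);
        count++;
    }
    closedir(d);
    return count;
}
```

Calling list_pci_vendor("/sys/bus/pci/devices", 0x10de) on the PC would print the Orin EP (if enumerated) alongside any local NVIDIA GPU.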

Not sure which of these modules needs to be built or how to change the Makefiles properly; could you please provide some guidance?

Thanks.

Hi,

Is there possibly a dependency on the PC RP OS or CUDA version (Ubuntu 20.04 and CUDA 11.4 here)? A somewhat related project, related only in that it exposes GPU kernel driver DMA to another device and to a user-space app, seems to be compatible with earlier Ubuntu releases but not with newer ones (see "GPUDirect RDMA on NVIDIA Jetson AGX Xavier driver build issue"). After following the prerequisite steps, including nvidia-dkms- and "Building on an x86 Linux PC, to Run on That PC", one system reports missing dependencies:

./build-for-pc-native.sh
./nvidia-ko-to-module-symvers "/lib/modules/5.15.0-57-generic/updates/dkms/nvidia.ko" "Module.symvers"
make -C "/lib/modules/5.15.0-57-generic/build" "M=$PWD" "modules"
make[1]: Entering directory '/usr/src/linux-headers-5.15.0-57-generic'
  MODPOST /home/me/jetson-rdma-picoevb/kernel-module/Module.symvers
ERROR: modpost: "nvidia_p2p_get_pages" [/home/me/jetson-rdma-picoevb/kernel-module/picoevb-rdma.ko] undefined!
ERROR: modpost: "nvidia_p2p_dma_map_pages" [/home/me/jetson-rdma-picoevb/kernel-module/picoevb-rdma.ko] undefined!
ERROR: modpost: "nvidia_p2p_dma_unmap_pages" [/home/me/jetson-rdma-picoevb/kernel-module/picoevb-rdma.ko] undefined!
ERROR: modpost: "nvidia_p2p_put_pages" [/home/me/jetson-rdma-picoevb/kernel-module/picoevb-rdma.ko] undefined!
ERROR: modpost: "nvidia_p2p_free_page_table" [/home/me/jetson-rdma-picoevb/kernel-module/picoevb-rdma.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:133: /home/me/jetson-rdma-picoevb/kernel-module/Module.symvers] Error 1
make[2]: *** Deleting file '/home/me/jetson-rdma-picoevb/kernel-module/Module.symvers'
make[1]: *** [Makefile:1817: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.15.0-57-generic'
make: *** [Makefile:18: modules] Error 2

On another system, modpost fails to parse the kernel symbol file:

./build-for-pc-native.sh
./nvidia-ko-to-module-symvers "/lib/modules/5.4.0-126-generic/kernel/drivers/video/nvidia.ko" "Module.symvers"
make -C "/lib/modules/5.4.0-126-generic/build" "M=$PWD" "modules"
make[1]: Entering directory '/usr/src/linux-headers-5.4.0-126-generic'
  CC [M]  /home/me/jetson-rdma-picoevb/kernel-module/picoevb-rdma.o
  Building modules, stage 2.
  MODPOST 1 modules
FATAL: parse error in symbol dump file
make[2]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
make[1]: *** [Makefile:1675: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-126-generic'
make: *** [Makefile:18: modules] Error 2
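Both failures involve the generated Module.symvers: in the first, modpost finds no entries for the nvidia_p2p_* symbols; in the second, it cannot parse the file at all. Module.symvers is tab-separated with the CRC in the first column and the symbol name in the second, but the number and order of the remaining columns have changed between kernel versions, so a file written in one layout may not parse on another kernel. A minimal diagnostic sketch, assuming only that layout: it reports the minimum and maximum column counts and whether a given symbol is present. symvers_has_symbol and count_fields are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count tab-separated fields in one line (trailing newline ignored). */
int count_fields(const char *line)
{
    assert(line != NULL);
    if (line[0] == '\0' || line[0] == '\n')
        return 0;
    int n = 1;
    for (const char *p = line; *p && *p != '\n'; p++)
        if (*p == '\t')
            n++;
    return n;
}

/* Scan a Module.symvers stream. Returns 1 if `sym` appears as the second
 * column (<CRC>\t<symbol>\t...), else 0. *min_fields / *max_fields get the
 * smallest and largest column counts seen (-1 if the file is empty). */
int symvers_has_symbol(FILE *f, const char *sym, int *min_fields,
                       int *max_fields)
{
    char line[1024];
    int found = 0;
    size_t len = strlen(sym);
    *min_fields = -1;
    *max_fields = -1;
    while (fgets(line, sizeof line, f)) {
        int n = count_fields(line);
        if (n == 0)
            continue;
        if (*min_fields < 0 || n < *min_fields)
            *min_fields = n;
        if (n > *max_fields)
            *max_fields = n;
        const char *tab = strchr(line, '\t');   /* end of the CRC column */
        if (tab && strncmp(tab + 1, sym, len) == 0) {
            char next = tab[1 + len];           /* must end the column */
            if (next == '\t' || next == '\n' || next == '\0')
                found = 1;
        }
    }
    return found;
}
```

Running it over the file produced by nvidia-ko-to-module-symvers and over the kernel's own Module.symvers (if present under /lib/modules/<version>/build) would show whether the nvidia_p2p_* symbols were extracted at all and whether the column counts of the two files agree.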

Cannot get either tegra-pcie-ep-mem.c (for the remote Orin) or picoevb-rdma.c (for the local GPU) to build and expose anything under /dev/. Any guidance on either one?

Thanks.