AGX Orin Devkit PCIE bandwidth test

Hi Team,

We are trying to check the performance (bandwidth, latency) of PCIe C5 with DMA read and write tests between two AGX Orin Devkits (L4T 35.3.1), configuring one Jetson AGX Orin as Root Complex and the other as Endpoint.

Is there any way to perform a test to analyze bandwidth, latency, and other performance metrics of PCIe port C5 on Jetson Orin with L4T 35.3.1?

We tried the method from the link below,
https://www.kernel.org/doc/html/v5.4/PCI/endpoint/pci-test-howto.html

with the pci_epf_test function driver, but we could only achieve the speeds listed below:
Read : 12-15 MB/s
Write : 2-4 MB/s
Copy : 4-5 MB/s

               WRITE => Size: 102400 bytes       DMA: YES        Time: 0.040632888 seconds      Rate: 2461 KB/s
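
For reference, output in that format comes from the kernel's pci_endpoint_test framework via the pcitest tool; a typical invocation that produces this kind of output looks like the lines below (the flags shown are one possible combination, not necessarily the exact command used; -d requests DMA transfers and -s sets the transfer size):

pcitest -w -s 102400 -d
pcitest -r -s 102400 -d
pcitest -c -s 102400 -d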

This bandwidth is very low for PCIe Gen4. Could you suggest a way to improve the bandwidth?

Hi,
Please try the method below and check the throughput:
The bandwidth of of virtual ethernet over PCIe between two xaviers is low - #19 by WayneWWW

Hi,

We have a DMA test driver in the kernel source, but it is not enabled by default. To enable it:

  1. Enable it by adding the lines below to the defconfig file (arch/arm64/configs/defconfig):

CONFIG_PCIE_EPF_DMA_TEST=y
CONFIG_TEGRA_PCIE_DMA_TEST=y

• The first one enables the PCIe endpoint function EDMA test framework for validation on the EP side.
• The second one enables the PCIe DMA test driver on the Root Port for validation on the RP side.
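
After adding these options, rebuild the kernel and deploy the new Image on both boards. A minimal sketch of that step, assuming the generic L4T out-of-tree kernel build flow (<kernel_src>, <kernel_out>, and <toolchain_prefix> are placeholders for your setup):

cd <kernel_src>/kernel/kernel-5.10
echo "CONFIG_PCIE_EPF_DMA_TEST=y" >> arch/arm64/configs/defconfig
echo "CONFIG_TEGRA_PCIE_DMA_TEST=y" >> arch/arm64/configs/defconfig
make ARCH=arm64 O=<kernel_out> defconfig
make ARCH=arm64 O=<kernel_out> CROSS_COMPILE=<toolchain_prefix> -j"$(nproc)" Image
# Copy <kernel_out>/arch/arm64/boot/Image to /boot/Image on both Orins and reboot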

  2. Enable the Jetson Orin EP side (this part is the same as in the official documentation):

cd /sys/kernel/config/pci_ep/
mkdir functions/tegra_pcie_dma_epf/func1
echo 0x10DE > functions/tegra_pcie_dma_epf/func1/vendorid
echo 0x229a > functions/tegra_pcie_dma_epf/func1/deviceid
echo 16 > functions/tegra_pcie_dma_epf/func1/msi_interrupts
ln -s functions/tegra_pcie_dma_epf/func1 controllers/141a0000.pcie_ep/
echo 1 > controllers/141a0000.pcie_ep/start
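
As a sanity check (this step is my addition, not part of the original instructions): once the EP function has started and the RP side is up with the link trained, the device should enumerate on the Root Complex with the vendor/device IDs programmed above. On the RP Orin:

echo 1 > /sys/bus/pci/rescan
lspci -nn | grep -i 10de:229a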

  3. Before performing the EDMA test

Framework:

Once the drivers are enabled and the platform is booted, each driver creates its own debugfs directory.

For the EP, the directory is: /sys/kernel/debug/<controller_addr>.pcie_ep_epf_dma_test/
Example: the PCIe C6 EP controller can be found at /sys/kernel/debug/141c0000.pcie_ep_epf_dma_test/

For the RP, the directory is: /sys/kernel/debug/<domain>:01:00.0_pcie_dma_test/
Example: the PCIe C5 RP controller can be found at /sys/kernel/debug/0005:01:00.0_pcie_dma_test/
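
A quick way to confirm the debugfs nodes are present (example for the C5 EP controller at 141a0000, matching the EP setup above; adjust the address for other controllers):

ls /sys/kernel/debug/141a0000.pcie_ep_epf_dma_test/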

Configuration:

The configurable parameters are exposed as files in this directory.

edma_ch → configures which EDMA channels are enabled and their modes. Bit definition:
• [0:3] - set the mode of the RD/WR channels: 0 = Sync, 1 = Async
• [4:7] - enable the RD/WR channels: 0 = Disable, 1 = Enable
• Bit 31 - enables Remote EDMA mode
• Bit 30 - triggers ABORT use-case validation
• So a value of 0xF1 means all channels are enabled for WR, with channel 0 in async mode and the remaining channels in sync mode (a decoded example follows these parameter descriptions).
• Note: during testing, if an async channel is selected first and then a sync channel, there is a high chance that bandwidth is shared between those channels.
• For meaningful bandwidth calculations, ensure that all channels are enabled in the same mode (Sync or Async).

nents → the number of descriptors to be populated in each DMA submission (tegra_pcie_edma_submit_xfer API call).
• When more than one DMA channel is enabled, the nents are split across those channels.
  Example: if nents = 2 and edma_ch = 0x3, each DMA channel gets one nent.
• Note: nents * dma_size cannot exceed 127 MB.

dma_size → used to specify size in bytes to be transferred in each SW transaction.

stress_count → the number of SW transactions to be scheduled in one execution.
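
To make the bit fields and the size limit concrete, here is a small decoded example and a sanity check (values are illustrative and match the test run below):

# edma_ch = 0x11 -> enable bits [4:7] = 0x1 (channel 0 only),
#                   mode bits   [0:3] = 0x1 (channel 0 async)
# edma_ch = 0xF1 -> enable bits = 0xF (all four channels),
#                   mode bits   = 0x1 (channel 0 async, channels 1-3 sync)
# Check nents * dma_size against the 127 MB limit before starting:
NENTS=4
DMA_SIZE=16777216                     # 16 MB per SW transaction
LIMIT=$((127 * 1024 * 1024))
[ $((NENTS * DMA_SIZE)) -le "$LIMIT" ] && echo "within 127 MB limit" || echo "exceeds 127 MB limit"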

  4. Start the DMA performance test

Configure the settings below for a 16 MB DMA size and a single channel in async mode, for 1000 iterations of 4 nents each:

echo 16777216 > dma_size
echo 4 > nents
echo 1000 > stress_count
echo 0x11 > edma_ch
cat edmalib_test

If the test is done correctly, dmesg will have the bandwidth information.
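
For example, the throughput lines can be pulled straight out of the kernel log (the strings below match the driver messages shown in the results further down):

dmesg | grep -E "edma_final_complete|Perf is"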


Dear @WayneWWW

Thanks for the quick support.
We were able to perform the bandwidth test.
```
root@linux:/sys/kernel/debug/141a0000.pcie_ep_epf_dma_test# echo 0x11 > edma_ch
root@linux:/sys/kernel/debug/141a0000.pcie_ep_epf_dma_test#
root@linux:/sys/kernel/debug/141a0000.pcie_ep_epf_dma_test# cat edmalib_test
[21172.990376] pcie_dma_epf tegra_pcie_dma_epf.0: edma_ch changed from 0xff != 0x11, deinit
[21172.990545] pcie_dma_epf tegra_pcie_dma_epf.0: edmalib_common_test: re-init edma lib prev_ch(ff) != current chans(11)
[21172.990750] tegra194-pcie 141a0000.pcie_ep: tegra_pcie_edma_initialize: success
root@linux:/sys/kernel/debug/141a0000.pcie_ep_epf_dma_test# [21173.582823] pcie_dma_epf tegra_pcie_dma_epf.0: edmalib_common_test: EDMA LIB WR started for 1 chans, size 32777216 Bytes, iterations: 1000 of descriptors 4
[21173.583959] pcie_dma_epf tegra_pcie_dma_epf.0: edmalib_common_test: EDMA LIB submit done
[21191.646757] pcie_dma_epf tegra_pcie_dma_epf.0: edma_final_complete: WR-local-Async complete for chan 0 with status 0. Total desc 4000 of Sz 32777216 Bytes done in time 18064179202 nsec. Perf is 58063 Mbps
[21191.646763] pcie_dma_epf tegra_pcie_dma_epf.0: edma_final_complete: All Async channels. Cumulative Perf 58063 Mbps, time 18064179266 nsec

root@linux:/sys/kernel/debug/141a0000.pcie_ep_epf_dma_test#
root@linux:/sys/kernel/debug/141a0000.pcie_ep_epf_dma_test# echo 0x10 > edma_ch
root@linux:/sys/kernel/debug/141a0000.pcie_ep_epf_dma_test# cat edmalib_test
[21295.406432] pcie_dma_epf tegra_pcie_dma_epf.0: edma_ch changed from 0x11 != 0x10, deinit
[21295.406530] pcie_dma_epf tegra_pcie_dma_epf.0: edmalib_common_test: re-init edma lib prev_ch(11) != current chans(10)
[21295.406726] tegra194-pcie 141a0000.pcie_ep: tegra_pcie_edma_initialize: success
[21295.998941] pcie_dma_epf tegra_pcie_dma_epf.0: edmalib_common_test: EDMA LIB WR started for 1 chans, size 32777216 Bytes, iterations: 1000 of descriptors 4
root@linux:/sys/kernel/debug/141a0000.pcie_ep_epf_dma_test# [21314.084493] pcie_dma_epf tegra_pcie_dma_epf.0: edmalib_common_test: EDMA LIB WR-local-SYNC done for 1000 iter on channel 0. Total Size 1048870912000 bytes, time 18085798689 nsec. Perf is 57994 Mbps
[21314.084499] pcie_dma_epf tegra_pcie_dma_epf.0: edmalib_common_test: EDMA LIB submit done
```
May I know the reason for the 127 MB limitation on the DMA size?

Best Regards,
Saideepak.

Was this test done with 8 lanes of PCIe 4.0?

If so, it is roughly getting half the maximum bandwidth.
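
(Rough context numbers, my own calculation rather than anything from the posts above: a Gen4 x8 link runs at 8 x 16 GT/s with 128b/130b encoding, i.e. roughly 8 x 16 x 128/130 ≈ 126 Gb/s raw, while Gen3 x8 gives about 63 Gb/s. The ~58 Gb/s reported earlier is therefore roughly half of Gen4 x8, but close to line rate for Gen3 x8, before TLP/DLLP protocol overhead.)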

We have an implementation using an 8-lane PCIe 3.0 link to an FPGA, and we are also getting significantly less than the theoretical maximum.

Are there other bottlenecks in the Orin? What is the speed of the dataplane backbone?

Hi,
I am using 8 lanes with PCIe 3.0, downgraded, between the AGX Orins.
