Hi,
I'm using a Jetson Orin NX Developer Kit and a PC to test PCIe DMA transfers with the pci_endpoint_test.c driver provided by the kernel.
The Orin NX board is configured following the "Connecting and Configuring the Devices" section of this link:
https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/Communications/PcieEndpointMode.html?highlight=pcie
On the PC side I modified and built the pci_endpoint_test.c module and inserted it into the kernel, then built pcitest (tools/pci/pcitest.c) to get the test utility.
/dev/pci-endpoint-test.0 can be found.
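For reference, the host (RC) side steps I follow are roughly these (a sketch assuming a 5.15 kernel source tree; the one-line out-of-tree Makefile is my own addition):

# PC (RC) side: build pci_endpoint_test.c as an out-of-tree module
# (Makefile contains the single line: obj-m := pci_endpoint_test.o)
make -C /lib/modules/$(uname -r)/build M=$PWD modules
sudo insmod pci_endpoint_test.ko
ls /dev/pci-endpoint-test.0   # the node appears once the endpoint is probed

# build the userspace tool from the kernel source tree
make -C tools/pci             # produces pcitest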
But when I try to run pcitest, the task always blocks, and dmesg shows:
[ 846.921118] INFO: task pcitest:3418 blocked for more than 120 seconds.
[ 846.921135] Tainted: G OE 5.15.0-89-generic #99~20.04.1-Ubuntu
[ 846.921141] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 846.921143] task:pcitest state:D stack: 0 pid: 3418 ppid: 3311 flags:0x00000000
[ 846.921155] Call Trace:
[ 846.921159] <TASK>
[ 846.921165] __schedule+0x2cd/0x890
[ 846.921179] ? usleep_range_state+0x90/0x90
[ 846.921188] schedule+0x69/0x110
[ 846.921194] schedule_timeout+0x206/0x2d0
[ 846.921203] ? putname+0x57/0x70
[ 846.921209] ? usleep_range_state+0x90/0x90
[ 846.921216] __wait_for_common+0xb0/0x160
[ 846.921224] wait_for_completion+0x24/0x30
[ 846.921232] pci_endpoint_test_ioctl+0x92f/0xcd7 [pci_endpoint_test]
[ 846.921242] __x64_sys_ioctl+0x92/0xd0
[ 846.921249] do_syscall_64+0x59/0xc0
[ 846.921256] entry_SYSCALL_64_after_hwframe+0x62/0xcc
[ 846.921262] RIP: 0033:0x7f28999ac3ab
[ 846.921267] RSP: 002b:00007ffd229622d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 846.921274] RAX: ffffffffffffffda RBX: 000056398a36e004 RCX: 00007f28999ac3ab
[ 846.921277] RDX: 00007ffd229622e0 RSI: 0000000040085005 RDI: 0000000000000003
[ 846.921281] RBP: 000056398ac872a0 R08: 0000000000000000 R09: 0000000000000000
[ 846.921284] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffea
[ 846.921287] R13: 0000000000000003 R14: 00007ffd229622e0 R15: 0000000000000001
[ 846.921292] </TASK>
lspci -v shows:
0000:05:00.0 RAM memory: NVIDIA Corporation Device 0001
Flags: bus master, fast devsel, latency 0, IRQ 168
Memory at 85e00000 (32-bit, non-prefetchable) [size=64K]
Memory at 6001e00000 (64-bit, prefetchable) [size=128K]
Memory at 85e10000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Secondary PCI Express
Capabilities: [168] Physical Layer 16.0 GT/s <?>
Capabilities: [18c] Lane Margining at the Receiver <?>
Capabilities: [1a4] Latency Tolerance Reporting
Capabilities: [1ac] L1 PM Substates
Capabilities: [1bc] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2bc] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [2f4] Data Link Feature <?>
Capabilities: [300] Precision Time Measurement
Capabilities: [30c] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
Capabilities: [374] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
Kernel driver in use: pci-endpoint-test
Kernel modules: pci_endpoint_test
I need some help with this.
Update:
I think it may be caused by an IRQ problem.
I tried pci_epf_test instead of pci_epf_nv_test on the board, and added

echo 16 > functions/pci_epf_test/func1/msi_interrupts
echo 8 > functions/pci_epf_test/func1/msix_interrupts

before binding the controller.
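For context, the full configfs sequence on the endpoint is roughly the following (a sketch based on Documentation/PCI/endpoint/pci-test-howto.rst; the vendor/device IDs are example values, and the controller name 14160000.pcie_ep is taken from my dmesg):

cd /sys/kernel/config/pci_ep
mkdir functions/pci_epf_test/func1
echo 0x10de > functions/pci_epf_test/func1/vendorid   # example ID
echo 0x0001 > functions/pci_epf_test/func1/deviceid   # example ID
echo 16 > functions/pci_epf_test/func1/msi_interrupts
echo 8 > functions/pci_epf_test/func1/msix_interrupts
ln -s functions/pci_epf_test/func1 controllers/14160000.pcie_ep/
echo 1 > controllers/14160000.pcie_ep/start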
After that, lspci shows:
0000:05:00.0 Unassigned class [ff00]: NVIDIA Corporation Device 0001
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at 85e00000 (64-bit, non-prefetchable) [size=1M]
Memory at 6001e00000 (64-bit, prefetchable) [size=128K]
Memory at 85f00000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Secondary PCI Express
Capabilities: [168] Physical Layer 16.0 GT/s <?>
Capabilities: [18c] Lane Margining at the Receiver <?>
Capabilities: [1a4] Latency Tolerance Reporting
Capabilities: [1ac] L1 PM Substates
Capabilities: [1bc] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2bc] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [2f4] Data Link Feature <?>
Capabilities: [300] Precision Time Measurement
Capabilities: [30c] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
Capabilities: [374] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
Kernel driver in use: pci-endpoint-test
Kernel modules: pci_endpoint_test
When I run pcitest.sh, it still fails on MSI and MSI-X:
> BAR tests
>
> BAR0: OKAY
> BAR1: NOT OKAY
> BAR2: NOT OKAY
> BAR3: NOT OKAY
> BAR4: NOT OKAY
> BAR5: NOT OKAY
>
> Interrupt tests
>
> SET IRQ TYPE TO LEGACY: OKAY
> LEGACY IRQ: OKAY
> SET IRQ TYPE TO MSI: OKAY
> MSI1: OKAY
> MSI2: NOT OKAY
> MSI3: NOT OKAY
> MSI4: NOT OKAY
> MSI5: NOT OKAY
> MSI6: NOT OKAY
> MSI7: NOT OKAY
> MSI8: NOT OKAY
> MSI9: NOT OKAY
> MSI10: NOT OKAY
> MSI11: NOT OKAY
> MSI12: NOT OKAY
> MSI13: NOT OKAY
> MSI14: NOT OKAY
> MSI15: NOT OKAY
> MSI16: NOT OKAY
> MSI17: NOT OKAY
> MSI18: NOT OKAY
> MSI19: NOT OKAY
> MSI20: NOT OKAY
> MSI21: NOT OKAY
> MSI22: NOT OKAY
> MSI23: NOT OKAY
> MSI24: NOT OKAY
> MSI25: NOT OKAY
> MSI26: NOT OKAY
> MSI27: NOT OKAY
> MSI28: NOT OKAY
> MSI29: NOT OKAY
> MSI30: NOT OKAY
> MSI31: NOT OKAY
> MSI32: NOT OKAY
>
> SET IRQ TYPE TO MSI-X: OKAY
> MSI-X1: NOT OKAY
> MSI-X2: NOT OKAY
> MSI-X3: NOT OKAY
> MSI-X4: NOT OKAY
> MSI-X5: NOT OKAY
> MSI-X6: NOT OKAY
> MSI-X7: NOT OKAY
> MSI-X8: NOT OKAY
> MSI-X9: NOT OKAY
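To narrow this down, individual vectors can be raised one at a time with pcitest from tools/pci/pcitest.c, which may help isolate the failing path; the vector numbers here are arbitrary examples:

pcitest -i 1   # set IRQ type to MSI
pcitest -m 2   # raise MSI vector 2
pcitest -i 2   # set IRQ type to MSI-X
pcitest -x 1   # raise MSI-X vector 1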
Hi,
Just a clarification: are you testing this between an Orin and an x86 host PC?

Yes, the PC runs Ubuntu 20.04 with kernel 5.15.0.

Is it possible to test with an Orin AGX? We don't have a PC validated for this function.

Sorry, we don't have an AGX kit.
Update: pcitest now runs successfully on the RC,
but I've hit a new problem.
The endpoint side uses the pci_epf_test driver and the RC side uses the pci_endpoint_test driver for the test.
dmesg on the endpoint shows the following:
WRITE => Size: 102400 bytes DMA: YES Time: 0.040605253 seconds Rate: 2462 KB/s
[62485.367378] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x2140020000, fsynr=0x190011, cbfrsynra=0x404, cb=0
[62485.379905] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0xffee0000, fsynr=0x4a0003, cbfrsynra=0x404, cb=0
[62485.392351] mc-err: Too many MC errors; throttling prints
[62485.397981]
WRITE => Size: 102400 bytes DMA: YES Time: 0.030608343 seconds Rate: 3267 KB/s
[62559.361767] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x2140020000, fsynr=0x190011, cbfrsynra=0x404, cb=0
[62559.374296] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0xffee0000, fsynr=0x4a0003, cbfrsynra=0x404, cb=0
[62559.386747] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[62559.402389]
WRITE => Size: 102400 bytes DMA: YES Time: 0.040632888 seconds Rate: 2461 KB/s
[64981.928914] tegra194-pcie 14160000.pcie_ep: LTSSM state: 0xd8 timeout: -110
[65103.166103] tegra194-pcie 14160000.pcie_ep: LTSSM state: 0xc8 timeout: -110
[65444.326018]
WRITE => Size: 102400 bytes DMA: NO Time: 0.000830611 seconds Rate: 120393 KB/s
[65490.296965]
WRITE => Size: 102400 bytes DMA: NO Time: 0.000830739 seconds Rate: 120374 KB/s
~~~~~~~~~~~~~~~~~~~~~
The DMA transfers fail, and the transfer speed seems far too low.
I tried to use the tegra_pci_dma_test driver on the endpoint device, but the tegra-pcie-ep-mem and tegra-pci-dma-test drivers fail to run on the PC (RC) side, and the pci_endpoint_test driver didn't work with them.
My questions are: what should I do to improve PCIe performance, and how should I test it?
Should I rewrite the pci_endpoint_test driver, referring to tegra-pcie-ep-mem.c, to test DMA speed?
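As a side note, the 100 KB transfers above are too small to give a meaningful rate; assuming the 5.15 pcitest, a larger transfer through the endpoint's DMA engine can be requested with the -d flag (the sizes here are arbitrary examples):

pcitest -w -d -s 1048576   # 1 MiB write using the endpoint DMA engine
pcitest -r -d -s 1048576   # 1 MiB read using the endpoint DMA engine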