PCIe test problem

Hi,
I'm using a Jetson Orin NX Developer Kit and a PC to test PCIe DMA transfers with the pci_endpoint_test.c driver provided by the kernel.

The Orin NX board is configured following the "Connecting and Configuring the Devices" section of this page:
https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/Communications/PcieEndpointMode.html?highlight=pcie
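
For reference, the endpoint-side configuration in that section is the usual pci_ep configfs sequence, roughly like this (a sketch from memory rather than a verbatim copy of the doc; 14160000.pcie_ep is the endpoint controller instance that appears in my dmesg further down, and the 0x10de/0x0001 IDs match the lspci output below):

cd /sys/kernel/config/pci_ep/
mkdir functions/pci_epf_nv_test/func1
echo 0x10de > functions/pci_epf_nv_test/func1/vendorid
echo 0x0001 > functions/pci_epf_nv_test/func1/deviceid
ln -s functions/pci_epf_nv_test/func1 controllers/14160000.pcie_ep/
echo 1 > controllers/14160000.pcie_ep/start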

On the PC side I modified and built the pci_endpoint_test.c module and inserted it into the kernel,
then built the pcitest utility so I could run the tests.
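
Concretely, the host-side steps boil down to something like this (a sketch; the new_id write is just an alternative to patching the ID table inside pci_endpoint_test.c, and the 10de 0001 IDs come from the lspci output below):

# on the x86 host (RC), from the kernel source tree
cd tools/pci && make                      # builds the pcitest utility
sudo insmod ./pci_endpoint_test.ko        # module built from drivers/misc/pci_endpoint_test.c
echo "10de 0001" | sudo tee /sys/bus/pci/drivers/pci-endpoint-test/new_id
ls /dev/pci-endpoint-test.0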

/dev/pci-endpoint-test.0 can be found,
but when I try to run pcitest the task is always blocked, and dmesg shows:

 [  846.921118] INFO: task pcitest:3418 blocked for more than 120 seconds.
[  846.921135]       Tainted: G           OE     5.15.0-89-generic #99~20.04.1-Ubuntu
[  846.921141] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.921143] task:pcitest         state:D stack:    0 pid: 3418 ppid:  3311 flags:0x00000000
[  846.921155] Call Trace:
[  846.921159]  <TASK>
[  846.921165]  __schedule+0x2cd/0x890
[  846.921179]  ? usleep_range_state+0x90/0x90
[  846.921188]  schedule+0x69/0x110
[  846.921194]  schedule_timeout+0x206/0x2d0
[  846.921203]  ? putname+0x57/0x70
[  846.921209]  ? usleep_range_state+0x90/0x90
[  846.921216]  __wait_for_common+0xb0/0x160
[  846.921224]  wait_for_completion+0x24/0x30
[  846.921232]  pci_endpoint_test_ioctl+0x92f/0xcd7 [pci_endpoint_test]
[  846.921242]  __x64_sys_ioctl+0x92/0xd0
[  846.921249]  do_syscall_64+0x59/0xc0
[  846.921256]  entry_SYSCALL_64_after_hwframe+0x62/0xcc
[  846.921262] RIP: 0033:0x7f28999ac3ab
[  846.921267] RSP: 002b:00007ffd229622d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  846.921274] RAX: ffffffffffffffda RBX: 000056398a36e004 RCX: 00007f28999ac3ab
[  846.921277] RDX: 00007ffd229622e0 RSI: 0000000040085005 RDI: 0000000000000003
[  846.921281] RBP: 000056398ac872a0 R08: 0000000000000000 R09: 0000000000000000
[  846.921284] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffea
[  846.921287] R13: 0000000000000003 R14: 00007ffd229622e0 R15: 0000000000000001
[  846.921292]  </TASK>

lspci shows:

0000:05:00.0 RAM memory: NVIDIA Corporation Device 0001
	Flags: bus master, fast devsel, latency 0, IRQ 168
	Memory at 85e00000 (32-bit, non-prefetchable) [size=64K]
	Memory at 6001e00000 (64-bit, prefetchable) [size=128K]
	Memory at 85e10000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Secondary PCI Express
	Capabilities: [168] Physical Layer 16.0 GT/s <?>
	Capabilities: [18c] Lane Margining at the Receiver <?>
	Capabilities: [1a4] Latency Tolerance Reporting
	Capabilities: [1ac] L1 PM Substates
	Capabilities: [1bc] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
	Capabilities: [2bc] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
	Capabilities: [2f4] Data Link Feature <?>
	Capabilities: [300] Precision Time Measurement
	Capabilities: [30c] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
	Capabilities: [374] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
	Kernel driver in use: pci-endpoint-test
	Kernel modules: pci_endpoint_test

I need some help.

Update:
I think it may be caused by an IRQ problem.
I tried pci_epf_test instead of pci_epf_nv_test on the board, and added

echo 16 > functions/pci_epf_test/func1/msi_interrupts

echo 8 > functions/pci_epf_test/func1/msix_interrupts

before binding the controller.
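
For reference, the bind and start steps that follow are roughly (a sketch; 14160000.pcie_ep is the endpoint controller instance from the dmesg later in this thread, use whatever shows up under controllers/ on your board):

ln -s functions/pci_epf_test/func1 controllers/14160000.pcie_ep/
echo 1 > controllers/14160000.pcie_ep/start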

Then lspci shows:

0000:05:00.0 Unassigned class [ff00]: NVIDIA Corporation Device 0001
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at 85e00000 (64-bit, non-prefetchable) [size=1M]
	Memory at 6001e00000 (64-bit, prefetchable) [size=128K]
	Memory at 85f00000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Secondary PCI Express
	Capabilities: [168] Physical Layer 16.0 GT/s <?>
	Capabilities: [18c] Lane Margining at the Receiver <?>
	Capabilities: [1a4] Latency Tolerance Reporting
	Capabilities: [1ac] L1 PM Substates
	Capabilities: [1bc] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
	Capabilities: [2bc] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
	Capabilities: [2f4] Data Link Feature <?>
	Capabilities: [300] Precision Time Measurement
	Capabilities: [30c] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
	Capabilities: [374] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
	Kernel driver in use: pci-endpoint-test
	Kernel modules: pci_endpoint_test

When I run pcitest.sh, it still fails on the MSI and MSI-X tests:

> BAR tests
> 
> BAR0:		OKAY
> BAR1:		NOT OKAY
> BAR2:		NOT OKAY
> BAR3:		NOT OKAY
> BAR4:		NOT OKAY
> BAR5:		NOT OKAY
> 
> Interrupt tests
> 
> SET IRQ TYPE TO LEGACY:		OKAY
> LEGACY IRQ:	OKAY
> SET IRQ TYPE TO MSI:		OKAY
> MSI1:		OKAY
> MSI2:		NOT OKAY
> MSI3:		NOT OKAY
> MSI4:		NOT OKAY
> MSI5:		NOT OKAY
> MSI6:		NOT OKAY
> MSI7:		NOT OKAY
> MSI8:		NOT OKAY
> MSI9:		NOT OKAY
> MSI10:		NOT OKAY
> MSI11:		NOT OKAY
> MSI12:		NOT OKAY
> MSI13:		NOT OKAY
> MSI14:		NOT OKAY
> MSI15:		NOT OKAY
> MSI16:		NOT OKAY
> MSI17:		NOT OKAY
> MSI18:		NOT OKAY
> MSI19:		NOT OKAY
> MSI20:		NOT OKAY
> MSI21:		NOT OKAY
> MSI22:		NOT OKAY
> MSI23:		NOT OKAY
> MSI24:		NOT OKAY
> MSI25:		NOT OKAY
> MSI26:		NOT OKAY
> MSI27:		NOT OKAY
> MSI28:		NOT OKAY
> MSI29:		NOT OKAY
> MSI30:		NOT OKAY
> MSI31:		NOT OKAY
> MSI32:		NOT OKAY
> 
> SET IRQ TYPE TO MSI-X:		OKAY
> MSI-X1:		NOT OKAY
> MSI-X2:		NOT OKAY
> MSI-X3:		NOT OKAY
> MSI-X4:		NOT OKAY
> MSI-X5:		NOT OKAY
> MSI-X6:		NOT OKAY
> MSI-X7:		NOT OKAY
> MSI-X8:		NOT OKAY
> MSI-X9:		NOT OKAY
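
To narrow down which vectors fail, the interrupt tests can also be fired one at a time instead of through pcitest.sh; a sketch (option letters as in tools/pci/pcitest.c on 5.15, where -i takes 0 = legacy, 1 = MSI, 2 = MSI-X — double-check with pcitest -h on your tree):

# MSI: set the IRQ type, then request individual vectors
sudo ./pcitest -i 1
sudo ./pcitest -m 1
sudo ./pcitest -m 2
# MSI-X
sudo ./pcitest -i 2
sudo ./pcitest -x 1
# see which vectors actually got registered on the host
grep pci-endpoint-test /proc/interrupts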

Hi,

Just a clarification: are you testing this between an Orin and another x86 host PC?

Yes, the PC runs Ubuntu 20.04 with kernel 5.15.0.

Is it possible to test with an Orin AGX? We don't have a PC validated for this function.

Sorry, we don't have an AGX kit.

Update: pcitest now runs successfully on the RC,
but I've hit a new problem.
The endpoint side uses the pci_epf_test driver and the RC side uses the pci_endpoint_test driver for the test.
dmesg on the endpoint shows the following:

               WRITE => Size: 102400 bytes       DMA: YES        Time: 0.040605253 seconds      Rate: 2462 KB/s
[62485.367378] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x2140020000, fsynr=0x190011, cbfrsynra=0x404, cb=0
[62485.379905] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0xffee0000, fsynr=0x4a0003, cbfrsynra=0x404, cb=0
[62485.392351] mc-err: Too many MC errors; throttling prints
[62485.397981] 
               WRITE => Size: 102400 bytes       DMA: YES        Time: 0.030608343 seconds      Rate: 3267 KB/s
[62559.361767] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x2140020000, fsynr=0x190011, cbfrsynra=0x404, cb=0
[62559.374296] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0xffee0000, fsynr=0x4a0003, cbfrsynra=0x404, cb=0
[62559.386747] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[62559.402389] 
               WRITE => Size: 102400 bytes       DMA: YES        Time: 0.040632888 seconds      Rate: 2461 KB/s
[64981.928914] tegra194-pcie 14160000.pcie_ep: LTSSM state: 0xd8 timeout: -110
[65103.166103] tegra194-pcie 14160000.pcie_ep: LTSSM state: 0xc8 timeout: -110
[65444.326018] 
               WRITE => Size: 102400 bytes       DMA: NO         Time: 0.000830611 seconds      Rate: 120393 KB/s
[65490.296965] 
               WRITE => Size: 102400 bytes       DMA: NO         Time: 0.000830739 seconds      Rate: 120374 KB/s
~~~~~~~~~~~~~~~~~~~~~

The DMA transfers eventually fail, and the transfer speed seems far too low.

I tried using the tegra_pci_dma_test driver on the endpoint device, but the tegra-pcie-ep-mem and tegra-pci-dma-test drivers fail to run on the PC (RC) side, and they don't work together with the pci_endpoint_test driver.

The question is: what should I do to improve the PCIe performance, and how should I test it?
Should I rewrite the pci_endpoint_test driver, referring to tegra-pcie-ep-mem.c, to test the DMA speed?
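
Short of writing a new driver, one way to at least measure it is to sweep the transfer size with pcitest and compare DMA against CPU copies (a sketch; -w selects the write test, -d asks pci_epf_test to use its DMA engine, -s is the size in bytes; the measured rate is printed in the endpoint's dmesg as in the lines above):

for sz in 4096 65536 1048576 4194304; do
    sudo ./pcitest -w -d -s $sz   # write test using DMA
    sudo ./pcitest -w -s $sz      # same size without DMA, for comparison
done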

Can anyone help?

anyone?

We have a similar scenario as well,
where we need to communicate with another x86 SBC from an Orin NX over PCIe lanes.

In the thread below, they say we need to write our own custom drivers on the x86 PC side to make the PCIe communication work.

Let me know if you have a solution for this, and kindly share it with us.

Thanks.
