PCIe test problem

Hi,
I'm using a Jetson Orin NX Developer Kit and a PC to test PCIe DMA transfers with the pci_endpoint_test.c driver provided by the kernel.

The Orin NX board was configured following this link:
https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/Communications/PcieEndpointMode.html?highlight=pcie

(the "Connecting and Configuring the Devices" section)

On the PC side I modified and built the pci_endpoint_test.c module and inserted it into the kernel, then built pci_epf_test.c so that the pcitest tool works.
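Roughly, the host-side steps look like this (a sketch, assuming the NVIDIA IDs 0x10de:0x0001 have been added to the driver's pci_device_id table and the module is built out of tree against the running 5.15 kernel; paths are just examples):

# directory with pci_endpoint_test.c and a Makefile containing: obj-m := pci_endpoint_test.o
make -C /lib/modules/$(uname -r)/build M=$PWD modules
sudo insmod pci_endpoint_test.ko

# build the pcitest utility and pcitest.sh from the kernel source tree
make -C tools/pci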
/dev/pci-endpoint-test.0 can be found, but when I try to run pcitest the task is always blocked and dmesg shows:

 [  846.921118] INFO: task pcitest:3418 blocked for more than 120 seconds.
[  846.921135]       Tainted: G           OE     5.15.0-89-generic #99~20.04.1-Ubuntu
[  846.921141] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.921143] task:pcitest         state:D stack:    0 pid: 3418 ppid:  3311 flags:0x00000000
[  846.921155] Call Trace:
[  846.921159]  <TASK>
[  846.921165]  __schedule+0x2cd/0x890
[  846.921179]  ? usleep_range_state+0x90/0x90
[  846.921188]  schedule+0x69/0x110
[  846.921194]  schedule_timeout+0x206/0x2d0
[  846.921203]  ? putname+0x57/0x70
[  846.921209]  ? usleep_range_state+0x90/0x90
[  846.921216]  __wait_for_common+0xb0/0x160
[  846.921224]  wait_for_completion+0x24/0x30
[  846.921232]  pci_endpoint_test_ioctl+0x92f/0xcd7 [pci_endpoint_test]
[  846.921242]  __x64_sys_ioctl+0x92/0xd0
[  846.921249]  do_syscall_64+0x59/0xc0
[  846.921256]  entry_SYSCALL_64_after_hwframe+0x62/0xcc
[  846.921262] RIP: 0033:0x7f28999ac3ab
[  846.921267] RSP: 002b:00007ffd229622d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  846.921274] RAX: ffffffffffffffda RBX: 000056398a36e004 RCX: 00007f28999ac3ab
[  846.921277] RDX: 00007ffd229622e0 RSI: 0000000040085005 RDI: 0000000000000003
[  846.921281] RBP: 000056398ac872a0 R08: 0000000000000000 R09: 0000000000000000
[  846.921284] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffea
[  846.921287] R13: 0000000000000003 R14: 00007ffd229622e0 R15: 0000000000000001
[  846.921292]  </TASK>

lspci shows:

0000:05:00.0 RAM memory: NVIDIA Corporation Device 0001
	Flags: bus master, fast devsel, latency 0, IRQ 168
	Memory at 85e00000 (32-bit, non-prefetchable) [size=64K]
	Memory at 6001e00000 (64-bit, prefetchable) [size=128K]
	Memory at 85e10000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Secondary PCI Express
	Capabilities: [168] Physical Layer 16.0 GT/s <?>
	Capabilities: [18c] Lane Margining at the Receiver <?>
	Capabilities: [1a4] Latency Tolerance Reporting
	Capabilities: [1ac] L1 PM Substates
	Capabilities: [1bc] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
	Capabilities: [2bc] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
	Capabilities: [2f4] Data Link Feature <?>
	Capabilities: [300] Precision Time Measurement
	Capabilities: [30c] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
	Capabilities: [374] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
	Kernel driver in use: pci-endpoint-test
	Kernel modules: pci_endpoint_test

I need some help.

Update:
I think it may be caused by an IRQ problem.
I tried pci_epf_test instead of pci_epf_nv_test on the board and added

echo 16 > functions/pci_epf_test/func1/msi_interrupts

echo 8 > functions/pci_epf_test/func1/msix_interrupts

before binding the controller.
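For completeness, the full configfs sequence on the endpoint looks roughly like this (a sketch: the controller name 14160000.pcie_ep comes from my dmesg and the vendor/device IDs are the ones lspci reports, so adjust as needed):

cd /sys/kernel/config/pci_ep/
mkdir functions/pci_epf_test/func1
echo 0x10de > functions/pci_epf_test/func1/vendorid
echo 0x0001 > functions/pci_epf_test/func1/deviceid
echo 16 > functions/pci_epf_test/func1/msi_interrupts
echo 8 > functions/pci_epf_test/func1/msix_interrupts
ln -s functions/pci_epf_test/func1 controllers/14160000.pcie_ep/
echo 1 > controllers/14160000.pcie_ep/start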
After that, lspci shows:

0000:05:00.0 Unassigned class [ff00]: NVIDIA Corporation Device 0001
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at 85e00000 (64-bit, non-prefetchable) [size=1M]
	Memory at 6001e00000 (64-bit, prefetchable) [size=128K]
	Memory at 85f00000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Secondary PCI Express
	Capabilities: [168] Physical Layer 16.0 GT/s <?>
	Capabilities: [18c] Lane Margining at the Receiver <?>
	Capabilities: [1a4] Latency Tolerance Reporting
	Capabilities: [1ac] L1 PM Substates
	Capabilities: [1bc] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
	Capabilities: [2bc] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
	Capabilities: [2f4] Data Link Feature <?>
	Capabilities: [300] Precision Time Measurement
	Capabilities: [30c] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
	Capabilities: [374] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
	Kernel driver in use: pci-endpoint-test
	Kernel modules: pci_endpoint_test

When I run pcitest.sh it still fails on MSI and MSI-X (single-vector pcitest commands are shown after the output):

> BAR tests
> 
> BAR0:		OKAY
> BAR1:		NOT OKAY
> BAR2:		NOT OKAY
> BAR3:		NOT OKAY
> BAR4:		NOT OKAY
> BAR5:		NOT OKAY
> 
> Interrupt tests
> 
> SET IRQ TYPE TO LEGACY:		OKAY
> LEGACY IRQ:	OKAY
> SET IRQ TYPE TO MSI:		OKAY
> MSI1:		OKAY
> MSI2:		NOT OKAY
> MSI3:		NOT OKAY
> MSI4:		NOT OKAY
> MSI5:		NOT OKAY
> MSI6:		NOT OKAY
> MSI7:		NOT OKAY
> MSI8:		NOT OKAY
> MSI9:		NOT OKAY
> MSI10:		NOT OKAY
> MSI11:		NOT OKAY
> MSI12:		NOT OKAY
> MSI13:		NOT OKAY
> MSI14:		NOT OKAY
> MSI15:		NOT OKAY
> MSI16:		NOT OKAY
> MSI17:		NOT OKAY
> MSI18:		NOT OKAY
> MSI19:		NOT OKAY
> MSI20:		NOT OKAY
> MSI21:		NOT OKAY
> MSI22:		NOT OKAY
> MSI23:		NOT OKAY
> MSI24:		NOT OKAY
> MSI25:		NOT OKAY
> MSI26:		NOT OKAY
> MSI27:		NOT OKAY
> MSI28:		NOT OKAY
> MSI29:		NOT OKAY
> MSI30:		NOT OKAY
> MSI31:		NOT OKAY
> MSI32:		NOT OKAY
> 
> SET IRQ TYPE TO MSI-X:		OKAY
> MSI-X1:		NOT OKAY
> MSI-X2:		NOT OKAY
> MSI-X3:		NOT OKAY
> MSI-X4:		NOT OKAY
> MSI-X5:		NOT OKAY
> MSI-X6:		NOT OKAY
> MSI-X7:		NOT OKAY
> MSI-X8:		NOT OKAY
> MSI-X9:		NOT OKAY
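
To narrow this down, individual vectors can also be fired directly with pcitest (a sketch using the default device node; -i selects the IRQ type, 0 = legacy, 1 = MSI, 2 = MSI-X):

pcitest -D /dev/pci-endpoint-test.0 -i 1     # switch the test device to MSI
pcitest -D /dev/pci-endpoint-test.0 -m 2     # ask the endpoint to raise MSI vector 2
pcitest -D /dev/pci-endpoint-test.0 -i 2     # switch to MSI-X
pcitest -D /dev/pci-endpoint-test.0 -x 1     # ask the endpoint to raise MSI-X vector 1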

Hi,

Just a clarification: are you testing this between an Orin and another x86 host PC?

Yes, the PC runs Ubuntu 20.04 with kernel 5.15.0.

Is it possible to test with an Orin AGX? We don't have a PC validated for this function.

Sorry, we don't have an AGX kit.

Update: pcitest now runs successfully on the RC,
but I've hit a new problem.
The endpoint side uses the pci_epf_test driver and the RC side uses the pci_endpoint_test driver for the test.
dmesg on the endpoint shows the following:

               WRITE => Size: 102400 bytes       DMA: YES        Time: 0.040605253 seconds      Rate: 2462 KB/s
[62485.367378] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x2140020000, fsynr=0x190011, cbfrsynra=0x404, cb=0
[62485.379905] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0xffee0000, fsynr=0x4a0003, cbfrsynra=0x404, cb=0
[62485.392351] mc-err: Too many MC errors; throttling prints
[62485.397981] 
               WRITE => Size: 102400 bytes       DMA: YES        Time: 0.030608343 seconds      Rate: 3267 KB/s
[62559.361767] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x2140020000, fsynr=0x190011, cbfrsynra=0x404, cb=0
[62559.374296] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0xffee0000, fsynr=0x4a0003, cbfrsynra=0x404, cb=0
[62559.386747] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[62559.402389] 
               WRITE => Size: 102400 bytes       DMA: YES        Time: 0.040632888 seconds      Rate: 2461 KB/s
[64981.928914] tegra194-pcie 14160000.pcie_ep: LTSSM state: 0xd8 timeout: -110
[65103.166103] tegra194-pcie 14160000.pcie_ep: LTSSM state: 0xc8 timeout: -110
[65444.326018] 
               WRITE => Size: 102400 bytes       DMA: NO         Time: 0.000830611 seconds      Rate: 120393 KB/s
[65490.296965] 
               WRITE => Size: 102400 bytes       DMA: NO         Time: 0.000830739 seconds      Rate: 120374 KB/s

The DMA transfers fail and the transfer speed seems too low.

I tried using the tegra_pci_dma_test driver on the endpoint device, but the tegra-pcie-ep-mem and tegra-pci-dma-test drivers fail to run on the PC (RC) side, and they didn't work together with the pci_endpoint_test driver.

The question is: what should I do to improve PCIe performance, and how should I test it?
Should I rewrite the pci_endpoint_test driver, referring to tegra-pcie-ep-mem.c, to test DMA speed?
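For throughput I can at least run the read/write tests with the DMA flag and larger buffers than the 100 KB transfers above, since small transfers are dominated by setup overhead (a sketch; -d requests the endpoint's DMA engine and -s is the transfer size in bytes):

pcitest -D /dev/pci-endpoint-test.0 -w -d -s 1048576      # 1 MiB write test using DMA
pcitest -D /dev/pci-endpoint-test.0 -r -d -s 16777216     # 16 MiB read test using DMA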

Can anyone help?

Anyone?