Unable to bring up PCIe M.2 co-processor

We have an NVIDIA Jetson Xavier NX dev kit running kernel 5.10.104-tegra on Ubuntu 20.04.4 LTS.
While trying to access a PCIe M.2 device by mmap()ing its BAR resource files, every read returns 0xffffffff, and the following messages appear in dmesg.
What is the issue?
How can we resolve it?
---------------------dmesg----------------------
[Nov30 11:41] nvidia_smmu_context_fault_bank: 1 callbacks suppressed
[ +0.000023] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x4000c000, fsynr=0x20011, cbfrsynra=0x145b, cb=7
[ +0.000486] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x4000c000, fsynr=0x80001, cbfrsynra=0xc5b, cb=7
[ +0.000379] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x40041000, fsynr=0x80001, cbfrsynra=0x5b, cb=7
[ +0.000455] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x40041000, fsynr=0x80001, cbfrsynra=0x5b, cb=7
[ +0.000361] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000400, iova=0x40041000, fsynr=0x80001, cbfrsynra=0x5b, cb=7
[ +0.000359] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x40041000, fsynr=0x80001, cbfrsynra=0x5b, cb=7
[ +0.000363] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000400, iova=0x40041000, fsynr=0x80001, cbfrsynra=0x5b, cb=7
[ +0.000499] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x40041000, fsynr=0x80001, cbfrsynra=0x5b, cb=7
[ +0.004908] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x40041000, fsynr=0x20011, cbfrsynra=0x145b, cb=7
[ +0.012149] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x80000400, iova=0x40041000, fsynr=0x80001, cbfrsynra=0x5b, cb=7
[ +0.015250] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[ +0.012116] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000

What kind of PCIe device is in use here? Are you sure the PCIe device is in a working state before you run the mmap command?

This looks like an invalid address access. An error like "arm-smmu 12000000.iommu: Unhandled context fault: iova=xxxx" is thrown when the endpoint tries to access (read or write) a memory region that has not been allocated or pinned.
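To see which transactions are faulting, the dmesg lines can be decoded field by field. Below is a minimal Python sketch, assuming the bit layout used by the mainline Linux arm-smmu driver for the SMMUv2 context-bank fault registers (fault flags in FSR bits 1 to 8, MULTI and SS in bits 31 and 30, and the WNR read/write bit at bit 4 of FSYNR0), and assuming NVIDIA's Tegra SMMU code prints the same fields. `decode_fault`, `FSR_BITS` and `FAULT_RE` are hypothetical names. Bit 10, set in every fsr value in this log, belongs to the FORMAT field and is not treated as a fault flag.

```python
import re
from typing import Optional

# Context-bank fault status (FSR) bits as named in the Linux arm-smmu driver.
# Bits not listed (e.g. bit 10, part of the FORMAT field) are not fault flags.
FSR_BITS = {
    31: "MULTI",   # another fault arrived while one was already recorded
    30: "SS",      # the faulting transaction is stalled
    8: "UUT",      # unsupported upstream transaction
    7: "ASF",      # address size fault
    6: "TLBLKF",   # TLB lock fault
    5: "TLBMCF",   # TLB match conflict fault
    4: "EF",       # external fault
    3: "PF",       # permission fault
    2: "AFF",      # access flag fault
    1: "TF",       # translation fault: no valid mapping for this IOVA
}
FSYNR0_WNR = 1 << 4  # FSYNR0.WNR: set for a write, clear for a read

FAULT_RE = re.compile(
    r"fsr=0x(?P<fsr>[0-9a-fA-F]+), iova=0x(?P<iova>[0-9a-fA-F]+), "
    r"fsynr=0x(?P<fsynr>[0-9a-fA-F]+), cbfrsynra=0x(?P<sid>[0-9a-fA-F]+), "
    r"cb=(?P<cb>\d+)"
)


def decode_fault(line: str) -> Optional[dict]:
    """Decode one 'Unhandled context fault' dmesg line; None if it is not one."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    fsr = int(m["fsr"], 16)
    fsynr = int(m["fsynr"], 16)
    return {
        "iova": int(m["iova"], 16),
        # highest bit first, so MULTI (if present) is listed before the cause
        "flags": [name for bit, name in sorted(FSR_BITS.items(), reverse=True)
                  if (fsr >> bit) & 1],
        "access": "write" if fsynr & FSYNR0_WNR else "read",
        "cbfrsynra": int(m["sid"], 16),  # stream ID information, left raw
        "cb": int(m["cb"]),
    }
```

With these definitions, fsr=0x402 decodes to TF (translation fault), 0x80000402 to MULTI plus TF, and 0x80000400 to MULTI alone; fsynr=0x20011 has WNR set (a write) while fsynr=0x80001 does not (a read). Feeding the dmesg output through it line by line makes it easy to group the faults by IOVA and direction.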

Hi Wayne, thank you for your quick response.
This PCIe device is functional and has been tested on multiple x86_64 platforms from various manufacturers. This is the first time I am trying to bring it up on an aarch64 Linux platform.
We are programming three BARs:

cat /sys/devices/platform/141a0000.pcie/pci0005:00/0005:00:00.0/0005:01:00.0/resource

0x0000001f40080000 0x0000001f40083fff 0x0000000000140204
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000001f40000000 0x0000001f4007ffff 0x0000000000140204
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000001c00000000 0x0000001c000fffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
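The raw resource file is easier to read once each line is decoded into start, size and type. Below is a minimal Python sketch, assuming the flag values from the kernel's include/linux/ioport.h (IORESOURCE_IO = 0x100, IORESOURCE_MEM = 0x200, IORESOURCE_PREFETCH = 0x2000, IORESOURCE_MEM_64 = 0x100000); `Region` and `parse_resource` are hypothetical helpers. Lines 0 to 5 of the file correspond to BAR0 to BAR5, and line 6 to the expansion ROM.

```python
from dataclasses import dataclass
from typing import List

# Resource flag bits from include/linux/ioport.h.
IORESOURCE_IO = 0x00000100
IORESOURCE_MEM = 0x00000200
IORESOURCE_PREFETCH = 0x00002000
IORESOURCE_MEM_64 = 0x00100000


@dataclass
class Region:
    index: int  # line in the resource file: 0-5 are BAR0-BAR5, 6 is the ROM
    start: int
    end: int
    flags: int

    @property
    def size(self) -> int:
        return self.end - self.start + 1  # start/end are inclusive

    def describe(self) -> str:
        kind = "io" if self.flags & IORESOURCE_IO else "mem"
        width = "64-bit" if self.flags & IORESOURCE_MEM_64 else "32-bit"
        pref = ("prefetchable" if self.flags & IORESOURCE_PREFETCH
                else "non-prefetchable")
        return (f"resource{self.index}: {kind} {width} {pref}, "
                f"start={self.start:#x}, size={self.size:#x}")


def parse_resource(text: str) -> List[Region]:
    """Parse the contents of a sysfs .../resource file, skipping unused slots."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    regions = []
    for index, fields in enumerate(rows):
        start, end, flags = (int(f, 16) for f in fields)  # accepts "0x..." form
        if flags == 0:  # empty slot (unused BAR or upper half of a 64-bit BAR)
            continue
        regions.append(Region(index, start, end, flags))
    return regions
```

For the output above this reports three 64-bit memory BARs: resource0 (16 KiB, non-prefetchable), resource2 (512 KiB, non-prefetchable) and resource4 (1 MiB, prefetchable). Each 64-bit BAR occupies two BAR slots, which is why the resource1 and resource3 lines are all zeros.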

Can you please give me some hints to debug further?

Thanks,
Rishabh

Hi,

What is this device exactly? Also, what is your test method here? Any code to share? Is it the same as what you run on x86_64?

Have you compared the lspci -vv output for this device on aarch64 and on x86_64?
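A quick way to make this comparison is a line diff of the two dumps. Below is a minimal sketch using Python's standard difflib; `diff_lspci` is a hypothetical helper, and the default labels simply name the two platforms.

```python
import difflib
from typing import List


def diff_lspci(reference: str, target: str,
               ref_name: str = "x86_64", target_name: str = "aarch64") -> List[str]:
    """Return unified-diff lines between two `lspci -vvv` dumps of one device."""
    def normalise(text: str) -> List[str]:
        # Ignore trailing whitespace so only real content differences remain.
        return [line.rstrip() for line in text.splitlines()]

    # n=0: show only changed lines, without surrounding context.
    return list(difflib.unified_diff(normalise(reference), normalise(target),
                                     fromfile=ref_name, tofile=target_name,
                                     lineterm="", n=0))
```

For example, `print("\n".join(diff_lspci(open("intel_nuc_lspci.txt").read(), open("nv_nx_lspci.txt").read())))`. Differences in the bus address (the 0005: domain on the Jetson) and in the Region addresses are expected; lines such as Control:, DevCtl: and LnkSta: are more likely to be informative.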

Hi,
This device is an FPGA card.
We are trying to access its registers.
Yes, it is the same code that works on x86_64 platforms.
I have attached the lspci -vvv output for the NVIDIA NX and the Intel NUC platforms.

intel_nuc_lspci.txt (3.5 KB)
nv_nx_lspci.txt (3.5 KB)

Thanks,
Rishabh Dani

Hi @WayneWWW

Can you please give me some hints to debug further?

Could you share the code you are using?

Hi @WayneWWW,
We are using pcimem (GitHub - billfarrow/pcimem: Simple program to read & write to a pci device from userspace) to read the registers, with this resource file:
/sys/devices/platform/141a0000.pcie/pci0005:00/0005:00:00.0/0005:01:00.0/resource2
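For reference, the access pattern pcimem uses can be sketched in a few lines: open the resourceN file with O_RDWR | O_SYNC, mmap the page containing the target offset, and load a 32-bit word from it. This is a minimal Python approximation, not pcimem itself; `read_u32` is a hypothetical helper. Python does not guarantee the load is issued as a single 32-bit bus access, so the C tool remains the better test, and running it against the real device requires root. Values are read in host byte order, which matches PCI's little-endian registers on both x86_64 and aarch64.

```python
import mmap
import os


def read_u32(path: str, offset: int) -> int:
    """Read one aligned 32-bit word at `offset` from an mmap-able file,
    e.g. a sysfs PCI resourceN node."""
    if offset % 4:
        raise ValueError("offset must be 4-byte aligned")
    page = mmap.ALLOCATIONGRANULARITY
    base = offset & ~(page - 1)   # page-aligned start of the mapping
    length = (offset - base) + 4  # map just enough to cover the word
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        mm = mmap.mmap(fd, length, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
    finally:
        os.close(fd)              # the mapping stays valid after close
    try:
        # View the mapping as native unsigned 32-bit words; views are
        # released before the mapping is closed.
        with memoryview(mm) as raw, raw.cast("I") as words:
            return words[(offset - base) // 4]
    finally:
        mm.close()
```

For example, `hex(read_u32("/sys/devices/platform/141a0000.pcie/pci0005:00/0005:00:00.0/0005:01:00.0/resource2", 0))`. A result of 0xffffffff is the all-ones value commonly returned when the device does not complete a read successfully.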

Hi @WayneWWW, I have provided the sample code, which works on x86 platforms but not on the NVIDIA Xavier NX. Please give some hints to debug further.

Thanks,
Rishabh

Does this SMMU error happen only when opening the resource2 node, or also with resource0 and resource1?

Hi @WayneWWW,
It happens on all resource files.
We also tried with a kernel driver and got the same SMMU error.
Please give some hints to debug further.
Thanks,
Rishabh

Hi @WayneWWW,
Please give some hints to debug further.
Thanks,
Rishabh