NX shared memory issue - any insights?

From our dev:

We are using a large CMA (Contiguous Memory Allocator) buffer for PCIe DMA transfers between an FPGA and the host CPU. The same driver runs on multiple platforms (x86, amd64, arm). It runs on the Jetson NANO too, but it does not work on the Jetson NX, so we have to find out why.
On AMD64 (a desktop computer) we use buffer sizes from 512 MB to 2 GB and it runs correctly; it needs a custom kernel build with CMA support.
On the Jetson NANO we use a 64 MB buffer, and it runs correctly too.
On the Jetson NX, no buffer size works. CMA support is enabled in the kernel (64 MB by default). I recompiled the kernel as well, changing the CMA region size to 256 MB in the new custom kernel for the Jetson NX.
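
(For anyone reproducing this: the CMA region size is usually raised either at build time or on the kernel command line. A minimal sketch, assuming a typical arm64 kernel config; exact option names can vary between kernel versions, and the extlinux.conf location is the usual Jetson setup:)

    # build-time defaults in the kernel .config
    CONFIG_CMA=y
    CONFIG_DMA_CMA=y
    CONFIG_CMA_SIZE_MBYTES=256

    # or override at boot, e.g. on the APPEND line of /boot/extlinux/extlinux.conf
    cma=256M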

More from the dev:

The issue is connected to memory region allocation.
The difference between the simplified driver and the full PCIe driver is the following:
PCIe driver: ad->bufVA = dma_alloc_coherent( &dev->dev, DMA_BUFSIZE, &ad->bufBA, GFP_KERNEL);
simple driver without the PCIe I/O interface: ad->bufVA = dma_alloc_coherent( NULL, DMA_BUFSIZE, &ad->bufBA, GFP_KERNEL);
The difference is in the first argument, dev (the device to allocate coherent memory for).

I’m not sure what to make of the above two statements.
Are you saying that when the ‘dev’ pointer is passed instead of NULL, there is no issue?
If yes, then it is expected, because we have the SMMU enabled for PCIe, and the ‘dev’ pointer needs to be passed to dma_alloc_coherent() so that the IOVA comes from the PCIe controller’s IOVA pool.
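
(To illustrate what this means in code: when the device pointer is passed on a platform with the SMMU enabled, the allocation goes through the device's DMA ops and the returned handle is an IOVA, not a raw physical address. A minimal sketch, not the author's actual driver; my_probe and the 64-bit mask are illustrative assumptions, and DMA_BUFSIZE is the driver's own macro from this thread:)

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        void *buf_va;
        dma_addr_t buf_ba;
        int ret;

        ret = pci_enable_device(pdev);
        if (ret)
            return ret;

        /* Declare the device's addressing capability first. */
        ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
        if (ret)
            return ret;

        /* With the SMMU enabled, buf_ba is an IOVA from the PCIe
         * controller's pool; it is what the FPGA must target. */
        buf_va = dma_alloc_coherent(&pdev->dev, DMA_BUFSIZE, &buf_ba, GFP_KERNEL);
        if (!buf_va)
            return -ENOMEM;

        dev_info(&pdev->dev, "DMA buffer virt %p, dma %pad\n", buf_va, &buf_ba);
        return 0;
    }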

@vidyas
Response from our dev:

**Simplified driver** (with NULL, without device):

    ad->bufVA = dma_alloc_coherent(
        NULL, DMA_BUFSIZE, &ad->bufBA, GFP_KERNEL);
    if (ad->bufVA == NULL) {
        printk(KERN_ALERT "f2h_sdram: could not allocate buffer, try enabling CMA\n");
        return -ENOMEM;  /* PTR_ERR(NULL) would return 0, i.e. success */
    }
    printk(
        KERN_DEBUG "Allocated DMA buffer (virt: %p; bus: %p).\n",
        ad->bufVA, ad->bufBA
    );

Prints the following to dmesg:

[ 160.809845] Requested size 134213632
[ 160.835331] Allocated DMA buffer (virt: ffffffc037200000; bus: 00000000b7200000).

It behaves like the desktop computer: the buffer is allocated in the CMA memory region. CMA region base address from dmesg:

[   0.000000] cma: Reserved 256 MiB at 0x00000000b6000000
[   0.000000] Memory: 6985116K/8122356K available (15294K kernel code, 2936K rwdata, 6736K rodata, 8576K init, 611K bss, 186968K reserved, 950272K cma-reserved)
[   0.967488] dma_declare_coherent_resizable_cma_memory:324: resizable heap=vpr, base=0x00000000c6000000, size=0x2a000000
[   0.967724] cma: enabled page replacement for spfn=c6000, epfn=f0000
[   0.967733] dma_declare_coherent_resizable_cma_memory:373: resizable cma heap=vpr create successful

**The full PCIe driver** (with dev):

    ad->bufVA = dma_alloc_coherent(
        &dev->dev, DMA_BUFSIZE, &ad->bufBA, GFP_KERNEL);
    if (ad->bufVA == NULL) {
        printk(KERN_ALERT "f2h_sdram: could not allocate buffer, try enabling CMA\n");
        return -ENOMEM;  /* PTR_ERR(NULL) would return 0, i.e. success */
    }
    printk(
        KERN_DEBUG "Allocated DMA buffer (virt: %p; bus: %p).\n",
        ad->bufVA, ad->bufBA
    );

Prints the following to dmesg:

[ 362.066414] Allocated DMA buffer (virt: ffffff8028000000; bus: 00000004f8000000).

From the dev:
"It does not allocate the buffer in the CMA memory region.

I tried (GFP_KERNEL | GFP_DMA) with the same result; the base address was still 00000004f8000000.

What is the issue? The Jetson NX module restarts when I access the mmapped region from userspace.

So I am apparently trying to read the data from some incorrect address."
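
(A plausible explanation for the restart: with the SMMU enabled, the returned bus address 0x00000004f8000000 is an IOVA, not a physical address, so an mmap handler that feeds it into remap_pfn_range() maps the wrong physical pages. The portable way to expose a dma_alloc_coherent() buffer to userspace is dma_mmap_coherent(), which maps the real backing pages. A minimal sketch, not the author's actual cdevMmap(); struct f2h_adapter and its dev field are illustrative assumptions:)

    #include <linux/dma-mapping.h>
    #include <linux/mm.h>

    static int f2h_mmap(struct file *filp, struct vm_area_struct *vma)
    {
        struct f2h_adapter *ad = filp->private_data;  /* illustrative type */
        size_t len = vma->vm_end - vma->vm_start;

        if (len > DMA_BUFSIZE)
            return -EINVAL;

        /* Maps the pages actually backing the buffer, regardless of
         * whether bufBA is a physical address or an SMMU IOVA. */
        return dma_mmap_coherent(ad->dev, vma, ad->bufVA, ad->bufBA, len);
    }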

Let me see if I got it right this time.
So, your observation is that when the NULL pointer is used, the allocation comes from the CMA region, whereas with the ‘dev’ pointer it doesn’t.
Your requirement is to get the allocation from the CMA region even with the ‘dev’ pointer, right?

@vidyas
Thank you for following up.
From the dev:
"yes, correct
The allocated buffer is in the CMA region in the case of the NULL pointer.
The allocation is from somewhere else in the case that I use dev; I don’t know what memory region is used then. I am able to read and write the region from the kernel driver, but I am unable to access it from userspace.
The access works in the case that I use the NULL pointer; then I am able to access it from userspace."

The dev discussed this with the SW engineer who wrote the driver; his opinion is that the NULL pointer could be used in the PCIe driver too.
“I am using a customized kernel now. I compiled the kernel with a larger CMA region; the default size was 64 MB, and I am using 256 MB now. The driver allocates 128 MB.”

Update from the dev:
“I am testing the PCIe driver with the NULL pointer now.
I am able to read the data from the buffer: I wrote data in the kernel driver and was able to read it from a userspace application. So it seems that the previous issue has been solved (the issue was that I was not able to access the data in the memory region allocated by the kernel driver).
But there is one more issue, connected to the SMMU. It seems that the FPGA is unable to transfer all the data to this memory buffer. The dmesg log:”

[   63.804990] txlink flInit(), built at Nov 18 2020 22:21:47
[   63.805205] pcieProbe(dev = 0xffffffc1f4c61000, pciid = 0xffffff8000f79b38)
[   63.805211] pcieProbe() ape = 0xffffffc1c0ee0180
[   63.805244] txlink 0005:01:00.0: enabling device (0000 -> 0002)
[   63.805477] Using a 64-bit DMA mask.
[   63.805674] BAR mapped at barBA = 0x00000000, barVA = 0xffffff800dd0e000, barMinLen = 0x00000100, barLength = 0x02000000
[   63.805679] Request size 134213632
[   63.832182] Allocated DMA buffer (virt: ffffffc037200000; bus: 00000000b7200000).
[   63.832198] pcieProbe() successful.
[   87.219708] cdevOpen() dev=0xffffffc1c0ee0180
[   87.219736] cdevMmap() dev=0xffffffc1c0ee0180
[   87.220019] cdevMmap() dev=0xffffffc1c0ee0180
[   87.220029] cdevMMap dev->bufBA: 0x        b7200000,PAGE_SHIFT 0xc
[   87.226509] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb7200000, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.233430] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb72a87c0, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.240042] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb738ab40, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.247724] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb74b3340, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.254161] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb757c340, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.261506] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb766e800, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.267594] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb7762780, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.274153] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb784bc80, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.280791] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb7924a00, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.287623] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb7a1c980, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[   87.374571] mc-err: vpr base=0:c6000000, size=20, ctrl=3, override:(a01a8340, fcee10c1, 1, 0)
[   87.374760] mc-err: (255) csw_pcie5w: MC request violates VPR requirements
[   87.374922] mc-err:   status = 0x0ff740e3; addr = 0xffffffff00; hi_adr_reg=008
[   87.375060] mc-err:   secure: yes, access-type: write
[   87.375163] mc-err: mcerr: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000

The issue is this:

t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xb7200000, fsynr=0x280011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0

The FPGA is targeting the CMA physical address 0xb7200000 directly, but because the buffer was allocated with the NULL pointer, the PCIE5 SMMU context has no translation for it (pgd/pud/pmd/pte are all 0), so every inbound write faults.

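(If one wanted to keep the NULL-pointer/CMA allocation and still let the endpoint reach the buffer through the SMMU, the usual kernel technique is a streaming mapping against the device, which populates the missing SMMU page-table entries. A minimal sketch under two assumptions: that bufVA is a linear-map address, as the virt/phys pair in the log above suggests, and that pdev is the PCIe device. Whether this route works on this L4T release is related to the open question at the end of the thread:)

    #include <linux/dma-mapping.h>

    /* Sketch: create an SMMU translation for the existing CMA buffer
     * and hand the resulting IOVA to the FPGA. */
    dma_addr_t iova = dma_map_single(&pdev->dev, ad->bufVA, DMA_BUFSIZE,
                                     DMA_BIDIRECTIONAL);
    if (dma_mapping_error(&pdev->dev, iova))
        return -ENOMEM;

    /* Program 'iova' (not the physical address) into the FPGA, and wrap
     * CPU accesses with dma_sync_single_for_cpu()/_for_device(), since
     * this is a streaming (non-coherent) mapping. */
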
I tried to disable the SMMU using the following changes (found on the NVIDIA forum):

--- a/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
+++ b/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
@@ -867,13 +867,6 @@
                pinctrl-0 = <&pex_rst_c5_out_state>;
                pinctrl-1 = <&clkreq_c5_bi_dir_state>;
-               iommus = <&smmu TEGRA_SID_PCIE5>;
-               dma-coherent;
-#if LINUX_VERSION >= 414
-               iommu-map = <0x0 &smmu TEGRA_SID_PCIE5 0x1000>;
-               iommu-map-mask = <0x0>;
-#endif
-
                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 53 0x04>;
--- a/drivers/iommu/arm-smmu-t19x.c
+++ b/drivers/iommu/arm-smmu-t19x.c
@@ -2535,7 +2535,10 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
        reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
        /* Enable fault reporting */
-       reg |= (sCR0_GFRE | sCR0_GFIE | sCR0_GCFGFRE | sCR0_GCFGFIE | sCR0_USFCFG);
+       reg |= (sCR0_GFRE | sCR0_GFIE | sCR0_GCFGFRE | sCR0_GCFGFIE);
+
+       /* Disable Unidentified stream fault reporting */
+       reg &= ~(sCR0_USFCFG);
        /* Disable TLB broadcasting. */
        reg |= (sCR0_VMIDPNE | sCR0_PTM);

Update from the dev on the PCIe SMMU issue:
“I am going to try the tegra194-memcfg-sw-override.cfg modification.”

I think your observations are correct in the sense that using the NULL pointer may reserve/use/allocate the memory from the generic pool (in this case CMA, as it is enabled), but since the SMMU is enabled for PCIe, the PCIe endpoint may not be able to access this memory during reads/writes.
For the PCIe endpoint to be able to access the host system’s memory, it should have been reserved/allocated using the ‘dev’ pointer and not the NULL pointer.
Now, there are two ways to solve it.

  1. Disable the SMMU for PCIe so that its allocations come from the CMA pool. I think you are about to try this. Good luck, and let us know your observations with it. (A quick way to verify which path is in effect is sketched after this list.)

  2. Get the allocations to come from the CMA even with the ‘dev’ pointer when the SMMU is enabled for the PCIe controller. I don’t think this is supported as of today, but I can find out from our memory team and get back to you.
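
(For completeness, a quick way to confirm from the driver which of the two paths is in effect after a device-tree change; a sketch using iommu_get_domain_for_dev(), which returns NULL when no IOMMU domain is attached to the device:)

    #include <linux/iommu.h>

    /* Sketch: report whether dma_addr_t values for this device are
     * SMMU IOVAs or raw physical/bus addresses. */
    if (iommu_get_domain_for_dev(&pdev->dev))
        dev_info(&pdev->dev, "SMMU active: DMA handles are IOVAs\n");
    else
        dev_info(&pdev->dev, "no IOMMU domain: DMA handles are bus addresses\n");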