PCIe-AXI DMA error after migration from r23.2 to r24.2

Hi Vidyas,

I’m addressing you directly because I’ve seen a similar error discussed at https://devtalk.nvidia.com/default/topic/963078/?comment=4976394
where you gave some advice…

We have a family of device drivers for FPGA modules behind the PCIe-AXI bridge.
On r23.2 everything worked fine.
On r24.2 the SMMU becomes active and denies our VDMA access to its buffer.

After starting a VDMA data transfer we see the errors shown below.
No real data appears in the buffer. It seems the SMMU prevents the PCIe bridge from writing data into the chosen memory block.

[  186.529950] smmu_dump_pagetable(): fault_address=0x00000000c0000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  186.541416] mc-err: (0) csw_afiw: EMEM decode error on PDE or PTE entry
[  186.548315] mc-err:   status = 0x60010031; addr = 0xc0000000
[  186.554102] mc-err:   secure: no, access-type: write, SMMU fault: nr-nw-s
[  186.560975] smmu_dump_pagetable(): fault_address=0x00000000c0c7a800 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  186.572330] mc-err: (0) csw_afiw: EMEM decode error on PDE or PTE entry
[  186.579101] mc-err:   status = 0x60010031; addr = 0xc0c7a800
[  186.584789] mc-err:   secure: no, access-type: write, SMMU fault: nr-nw-s
[  186.592565] smmu_dump_pagetable(): fault_address=0x00000000c17f7000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  186.603956] mc-err: (0) csw_afiw: EMEM decode error on PDE or PTE entry
[  186.610649] mc-err:   status = 0x60010031; addr = 0xc17f7000
[  186.616347] mc-err:   secure: no, access-type: write, SMMU fault: nr-nw-s
[  186.623194] smmu_dump_pagetable(): fault_address=0x00000000c2432700 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  186.634509] mc-err: (0) csw_afiw: EMEM decode error on PDE or PTE entry
[  186.641234] mc-err:   status = 0x60054031; addr = 0xc2432700
[  186.646960] mc-err:   secure: no, access-type: write, SMMU fault: nr-nw-s
[  186.655577] mc-err: Too many MC errors; throttling prints

Memory is reserved in DT:

reserved-memory {
                ranges;
                vdmabuffer@C0000000 {
                        reg = <0x0 0xc0000000 0x0 0x10000000>;
                };
        };
                
        pcie-controller {
                pci@1,0 {
                        ranges = <0xc2000000 0x0 0x20000000 0xc2000000 0x0 0x20000000 0x0 0x20000000>;
                        pci@3,0 {
                                reg = <0x10000 0x0 0x0 0x0 0x0>;
                                compatible = "artcg,ag-parent-alpha";
                                ranges = <0x0 0xc2000000 0x0 0x20000000 0x100000>;

                                axi_vdma_0@00000000 {
                                        #address-cells = <0x1>;
                                        #size-cells = <0x1>;
                                        compatible = "artcg,ag-vdma-alpha";
                                        reg = <0x00000 0x1000 0x20000 0x1000>;
                                        reg_names = "addr", "numreg_addr";
                                        ranges;
                                        artcg,memory-pool = <0xc0000000>;
                                        artcg,num-frames  = <0x8>;
                                        artcg,bpp    = <0x2>;
                                };
                        };
                };
        };

Please advise.

Thank you,
Alex

Tracing the kernel code, I found only one NVIDIA driver that uses reserved memory for a buffer: the carveouts driver.
The memory mapping is done in
drivers/video/tegra/nvmap/nvmap_init.c

I’ve tried to replicate its way of mapping reserved memory, but failed…
It seems the modern method using of_reserved_mem_device_init(&pdev->dev) is not the one actually doing the work: the dmesg messages below about the iram reserved memory indicate a legacy allocation, yet there are no error messages from the modern method either… Are both paths in use?

[    0.000000] Reserved memory: initialized node iram-carveout, compatible id nvidia,iram-carveout
[    1.321031] tegra-carveouts tegra-carveouts.23: iram :dma coherent mem declare 0x0000000040001000,258048
[    1.330351] tegra-carveouts tegra-carveouts.23: assigned reserved memory node iram-carveout
[    1.394925] misc nvmap: created heap iram base 0x0000000040001000 size (252KiB)

My own call to of_reserved_mem_device_init(&pdev->dev) fails…
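For reference, the call site is essentially the textbook pattern — this is only a sketch with a hypothetical probe function (not our actual driver); it assumes the probed device’s DT node carries the memory-region phandle:

```c
#include <linux/of_reserved_mem.h>
#include <linux/platform_device.h>

static int ag_vdma_probe(struct platform_device *pdev)
{
        int ret;

        /* Looks up the memory-region phandle in pdev->dev.of_node and runs
         * the matching reserved-memory init (coherent pool, CMA, ...). */
        ret = of_reserved_mem_device_init(&pdev->dev);
        if (ret) {
                dev_err(&pdev->dev, "reserved mem init failed: %d\n", ret);
                return ret;
        }

        /* ... rest of probe: map registers, set up the VDMA engine ... */
        return 0;
}
```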

Memory is declared as:

reserved-memory {
                #address-cells = <0x2>;
                #size-cells = <0x2>;
                ranges;

                vdmabuff: vdmabuffer@C0000000 {
                        reg = <0x0 0xc0000000 0x0 0x10000000>;
                };
        };

Device is declared as:

...
                pci@1,0 {
                        ranges = <0xc2000000 0x0 0x20000000 0xc2000000 0x0 0x20000000 0x0 0x20000000>;
                        pci@3,0 {
                                reg = <0x10000 0x0 0x0 0x0 0x0>;
                                compatible = "artcg,ag-parent-alpha";
                                ranges = <0x0 0xc2000000 0x0 0x20000000 0x100000>;
                                #address-cells = <0x1>;
                                #size-cells = <0x1>;
                                memory-region = <&vdmabuff>;
...

Please advise.

I’ve found that of_reserved_mem_device_init(&pdev->dev) ultimately calls dma_declare_coherent_memory(),
but I need a non-coherent buffer for a streaming DMA mapping later. :(((
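One thing I’m unsure about (unverified on the r24.2 kernel, but true of mainline’s reserved-memory binding): a reserved-memory node without a compatible string only carves the range out of the allocator, so of_reserved_mem_device_init() has no registered init function to hook up. With compatible = "shared-dma-pool" plus "reusable" the region becomes a CMA-backed pool instead of a strictly coherent one, which may be closer to what streaming DMA needs. Something like:

```dts
reserved-memory {
        #address-cells = <0x2>;
        #size-cells = <0x2>;
        ranges;

        vdmabuff: vdmabuffer@c0000000 {
                /* a registered compatible, so device assignment can work */
                compatible = "shared-dma-pool";
                reusable;       /* CMA-backed rather than a coherent pool */
                reg = <0x0 0xc0000000 0x0 0x10000000>;
        };
};
```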

Any advice?

Is there an easy way to give the PCIe bus a reserved-memory region mapped through the IOMMU?

For now I have an ugly hack that makes my DMA work even under the SMMU of r24.2.

I studied the NVIDIA drivers’ startup memory initialization further and did the same… All this code cannot be accepted for a release. :(((

Perhaps the patch below might be helpful for people like me, who never got an answer and had to reverse engineer the drivers’ code…

I’m still hoping to hear some better ideas…

diff --git a/drivers/platform/tegra/iommu.c b/drivers/platform/tegra/iommu.c
index f1eeb1c..0f6fe39 100644
--- a/drivers/platform/tegra/iommu.c
+++ b/drivers/platform/tegra/iommu.c
@@ -31,7 +31,10 @@ phys_addr_t __weak tegra_fb_start, tegra_fb_size,
        tegra_bootloader_fb_start, tegra_bootloader_fb_size,
        tegra_bootloader_fb2_start, tegra_bootloader_fb2_size;
 
+phys_addr_t artec_buff_start = 0xc0000000, artec_buff_size = 0x10000000;
+
 static struct iommu_linear_map tegra_fb_linear_map[16]; /* Terminated with 0 */
+static struct iommu_linear_map artec_linear_map[16]; /* Terminated with 0 */
 
 #define LINEAR_MAP_ADD(n) \
 do { \
@@ -84,6 +87,11 @@ void tegra_fb_linear_set(struct iommu_linear_map *map)
                LINEAR_MAP_ADD(tegra_carveout);
        }
 #endif
+       i = 0;
+
+       map = artec_linear_map;
+
+       LINEAR_MAP_ADD(artec_buff);
 }
 EXPORT_SYMBOL(tegra_fb_linear_set);
 
@@ -180,6 +188,8 @@ static struct swgid_fixup tegra_swgid_fixup_t210[] = {
        { .name = "tegra-udc",  .swgids = TEGRA_SWGROUP_BIT(PPCS) |
                                          TEGRA_SWGROUP_BIT(PPCS1) |
          TEGRA_SWGROUP_BIT(PPCS2), },
+       { .name = "1003000.pcie-controller", .swgids = TEGRA_SWGROUP_BIT(AFI),
+                               .linear_map = artec_linear_map, },
 #ifdef CONFIG_PLATFORM_ENABLE_IOMMU
        { .name = dummy_name,   .swgids = TEGRA_SWGROUP_BIT(PPCS) },
 #endif

Memory is still declared as:

...
        reserved-memory {
                #address-cells = <0x2>;
                #size-cells = <0x2>;
                ranges;

                vdmabuff: vdmabuffer@C0000000 {
                        reg = <0x0 0xc0000000 0x0 0x10000000>;
                };
        };

Device is declared as (without the memory-region property):

...
        pcie-controller {
...
                pci@1,0 {
                        ranges = <0xc2000000 0x0 0x20000000 0xc2000000 0x0 0x20000000 0x0 0x20000000>;
                        pci@3,0 {
                                reg = <0x10000 0x0 0x0 0x0 0x0>;
                                compatible = "artcg,ag-parent-alpha";
                                ranges = <0x0 0xc2000000 0x0 0x20000000 0x100000>;
                                #address-cells = <0x1>;
                                #size-cells = <0x1>;
...

Hi All,

Any comments, ideas?

Is there anybody here?

I’m still waiting for comments…

Have you tried adding iommus = <&smmu TEGRA_SWGROUP_AFI>; to your device node, i.e. pci@3,0? I understand that iommus is already there for the parent node, i.e. pcie-controller, but I’m just wondering if it needs to be added to the actual node where ‘memory-region’ is present.
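i.e. something along these lines (just a sketch of the node with the extra property added):

```dts
pci@3,0 {
        reg = <0x10000 0x0 0x0 0x0 0x0>;
        compatible = "artcg,ag-parent-alpha";
        /* IOMMU attachment on the node that references the reserved memory */
        iommus = <&smmu TEGRA_SWGROUP_AFI>;
        memory-region = <&vdmabuff>;
};
```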

Hi Vidyas,

Yes, I have. I started with that, but it didn’t change the behavior at all, and finally I had to resort to the ugly hack described above.

Sorry for the delay in response.
I’m not sure to what extent the solution below works, but it is worth giving a try.
Since we already know the reserved memory (whose address is a physical address),
get its virtual equivalent using the ioremap() API and then use the dma_map_single() API to get the equivalent bus address:
dma_addr_t bus_addr = dma_map_single(dev, ioremap(pa, SZ_256M), SZ_256M, DMA_BIDIRECTIONAL);
and supply this ‘bus_addr’ to the DMA for any read/write operations.
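Spelled out with error handling, the same idea looks roughly like the sketch below (unverified on r24.2; ‘pa’ and the return codes are illustrative, and note dma_map_single() takes the mapping size as its third argument):

```c
/* Sketch of the ioremap() + dma_map_single() suggestion, with error checks.
 * 'pa' is the physical base of the reserved region (0xc0000000 here). */
void __iomem *va = ioremap(pa, SZ_256M);
dma_addr_t bus_addr;

if (!va)
        return -ENOMEM;

bus_addr = dma_map_single(dev, (void __force *)va, SZ_256M, DMA_BIDIRECTIONAL);
if (dma_mapping_error(dev, bus_addr)) {
        iounmap(va);
        return -EIO;
}

/* program 'bus_addr' into the VDMA engine for its reads/writes */
```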

Ok, I will try.

Hi,

Since your FPGA modules are behind the PCIe-AXI bridge, maybe you should see this: https://devtalk.nvidia.com/default/topic/967515/l4t-r24-2-pcie-iommu-not-working-when-switches-bridges-are-present/

Hi iemdey,

Thanks. It looks like the same issue.
My solution also works for me for now, but from my point of view the best way is to stay on the same page as the mainstream developers.

Alex,
did the suggestions in #11 and #13 help resolve the issue you are observing?

Hi Vidyas,

Unfortunately, it doesn’t help directly.

#13 is a completely different story.

But looking at these comments we got some new ideas and will try them.
For now we are working with our own solution, described at the top of this thread.
It is not the right approach in an absolute sense, but it is the same one some drivers from JetPack use, and it does indeed work.
So for development it is OK; for the release we would be happy to find a better solution.

I’m interested to know what kind of issues you faced when you tried out #11?

Hi Vidyas,

Sorry for leaving you without updates.

Currently we are working hard on finalizing our FPGA design. The solution we have is OK for this stage.
We plan to return to this and similar issues before release.

As far as I remember, when I tried to add

iommus = <&smmu TEGRA_SWGROUP_AFI>

to our PCIe parent device definition in the DT, nothing changed in behavior.
I gave up at that point and continued to use our ugly hack.