Altera FPGA DMA to TX2 via PCIe problem

Dear Sir/Madam,

I developed a PCIe card with an Altera FPGA, and I am trying to DMA 1024 bytes of data from the FPGA on-chip memory to a fixed physical address on a TX2 running Linux.

In the FPGA, I set up a fixed translation table that maps 0xE0000000 for DMA writes, so I don’t need to obtain the physical address with dma_alloc_coherent() on the TX2 and pass it to the FPGA.

On the TX2, I set mem=3G in cbootargs to keep 0xE0000000 out of Linux’s reach. Then, in my driver, I simply mmap the physical address (0xE0000000) to user space with remap_pfn_range(vma, vma->vm_start, phy_add >> PAGE_SHIFT, size, PAGE_SHARED).
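
For reference, the third argument of remap_pfn_range() is a page frame number, i.e. the physical address shifted right by PAGE_SHIFT, and the mapped size is normally a whole number of pages. Below is a minimal userland sketch of that arithmetic. It assumes 4 KiB pages (a shift of 12, which is an assumption here), and the SKETCH_* macros and helper names are made up for illustration; they are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. In a driver, use the kernel's PAGE_SHIFT instead. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1ull << SKETCH_PAGE_SHIFT)

/* Page frame number for a physical address (what remap_pfn_range() expects). */
static uint64_t phys_to_pfn(uint64_t phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/* True if the value sits exactly on a page boundary. */
static bool is_page_aligned(uint64_t value)
{
    return (value & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Round a byte count up to a whole number of pages. */
static uint64_t round_up_to_page(uint64_t len)
{
    return (len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

Under that assumption, 0xE0000000 is page aligned and a 1024-byte transfer still needs one full page mapped.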

The first time I ran the test application, I got an “arm-smmu 12000000.iommu: Unhandled context fault: iova=0xe0000000, fsynr=0x240013, …” error every time the DMA ran. I then searched this forum and learned that I need to disable the SMMU for PCIe. I did it by removing:

  1. “#stream-id-cells = <1>;” from the “tegra_pcie” node
  2. “<&{/pcie-controller@10003000} TEGRA_SID_AFI>,” from the “smmu” node

After the fix, I ran the test application again. There are no more iommu errors, but I still cannot read anything from 0xE0000000.

On the other hand, the system became very slow after I flashed the new dtb without the PCIe SMMU entries. I checked dmesg and found many SMMU errors during startup:

[ 0.212934] /iommu@12000000: could not get #stream-id-cells for /pcie-controller@10003000
[ 0.213042] arm-smmu 12000000.iommu: registered 40 master devices
[ 0.218490] mc: mapped MMIO address: 0xffffff8000540000 -> 0x2c10000
[ 0.218554] mc: mapped MMIO address: 0xffffff8000560000 -> 0x2c20000
[ 0.218602] mc: mapped MMIO address: 0xffffff8000640000 -> 0x2c30000
[ 0.218650] mc: mapped MMIO address: 0xffffff8000660000 -> 0x2c40000
[ 0.218713] mc: mapped MMIO address: 0xffffff8000fa0000 -> 0x2c50000
[ 0.218753] mc-err: Set intmask: 0xf3140
[ 0.219064] ecc-err: dram ecc disabled-MC_ECC_CONTROL:0x0000000c
[ 0.219555] bpmp: ping status is 0
[ 0.219877] arm-smmu 12000000.iommu: Unexpected {global,context} fault, this could be serious
[ 0.219903] arm-smmu 12000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00001432, GFSYNR2 0x00000000
[ 0.219964] bpmp d000000.bpmp: firmware tag is
[ 0.220005] (255) csw_bpmpw: MC request violates VPR requirements
[ 0.220018] status = 0x00337094; addr = 0x3ffffffc0
[ 0.220025] secure: yes, access-type: write
[ 0.220039] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000000, hubc_int_status=0x00000000
[ 0.220059] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000000, hubc_int_status=0x00000000
[ 0.220071] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000000, hubc_int_status=0x00000000
[ 0.220090] mc-err: Too many MC errors; throttling prints

Have you seen similar problems? My questions are:

1. Is there any problem with DMAing data to a fixed physical address?
2. Why can’t I read anything after disabling the SMMU? Is there any way to confirm that the data exists at a specific physical address (0xE0000000)?
3. Why did the system become slow? Is there any way to fix it?

Best regards!

/ Daning

With the SMMU enabled for PCIe, any access to system memory from the end point (EP) must go through the SMMU. If the SMMU has no mapping for the address coming from the EP, the access will fault.
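
Such a fault shows up in dmesg as the “Unhandled context fault” line quoted in the original post, and the iova field is the address the EP tried to access. As a rough sketch (the function name is made up for illustration, and the line format is assumed to match the one quoted in this thread), the iova can be pulled out of a log line like this, so it can be compared with the address the FPGA is programmed to write to:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 and stores the faulting IOVA if the line looks like an
 * arm-smmu fault line containing "iova=0x...", otherwise returns 0.
 * Assumption: the format matches the dmesg line quoted in this thread.
 */
static int parse_smmu_fault_iova(const char *line, uint64_t *iova)
{
    const char *field;
    char *end;
    unsigned long long value;

    if (strstr(line, "arm-smmu") == NULL)
        return 0;

    field = strstr(line, "iova=");
    if (field == NULL)
        return 0;
    field += strlen("iova=");

    /* Base 16 also accepts the leading "0x" prefix. */
    value = strtoull(field, &end, 16);
    if (end == field)
        return 0;

    *iova = (uint64_t)value;
    return 1;
}
```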

>> In the FPGA, I set a fixed translation table which map the 0xE0000000 for DMA writing. So I don’t need to get the physical address by dma_alloc_coherent() and transfer it to FPGA in TX2 linux.
I’m not sure what kind of mapping you are setting up in the FPGA, but it is always recommended to use dma_alloc_coherent() (the Linux documentation suggests this as well).

>> In the TX2 linux, I set the mem=3G in cbootargs to keep the 0xE0000000 clear from TX2 linux. Then in my driver, I simply mmap the physical address(0xE0000000) to user space by remap_pfn_range(vma, vma->vm_start, phy_add>> PAGE_SHIFT, size, PAGE_SHARED).
If you are removing 0xE0000000 from Linux’s view, I wonder how it can be accessed from Linux after calling the remap_pfn_range() API. You might be getting some random address as the translation, which is probably why you see nothing there. IMHO, this is a bad idea.

I think you should just use the dma_alloc_coherent() API and pass the DMA address to the end point via its BAR (like any other typical PCIe end point). Deviating from this will land you in all kinds of issues.
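
As a rough illustration of “pass the DMA address to the end point via its BAR”: a 64-bit DMA address is typically split into two 32-bit register writes. The sketch below runs in userland against a plain array standing in for the BAR. The register layout (REG_DMA_DST_LO, REG_DMA_DST_HI, REG_DMA_LEN) is entirely hypothetical and depends on the FPGA design. In a real driver the address would come from dma_alloc_coherent() and the writes would go through writel() or iowrite32() on the ioremapped BAR.

```c
#include <stdint.h>

/* Hypothetical register indices (32-bit words) inside the FPGA's BAR. */
enum {
    REG_DMA_DST_LO = 0, /* low 32 bits of the DMA destination address  */
    REG_DMA_DST_HI = 1, /* high 32 bits of the DMA destination address */
    REG_DMA_LEN    = 2, /* transfer length in bytes                    */
    REG_COUNT      = 3
};

/*
 * Program the (hypothetical) DMA destination registers.
 * 'bar' stands in for the mapped BAR; here it can be a plain array.
 */
static void program_dma_target(volatile uint32_t *bar, uint64_t dma_addr,
                               uint32_t len)
{
    bar[REG_DMA_DST_LO] = (uint32_t)(dma_addr & 0xffffffffu);
    bar[REG_DMA_DST_HI] = (uint32_t)(dma_addr >> 32);
    bar[REG_DMA_LEN]    = len;
}
```

With this approach the FPGA needs no fixed translation table; it simply uses whatever address the driver writes at probe time.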

Dear vidyas,

Thanks so much for your reply. You are right. Removing 0xE0000000 from Linux’s view was the cause of my problem. I have confirmed this and fixed it.

Now the only remaining problem is that the system becomes very slow after I disable the SMMU for PCIe: slow startup, slow desktop, and many SMMU errors during startup, as mentioned before.

Did you see the same problem after patching the SMMU entries in the DTB? Or did I patch it the wrong way?

Best regards!

/ Daning

Has anyone seen the same problem when patching the device tree to disable the SMMU for PCIe?

The system becomes very slow: slow startup, slow desktop, and many SMMU errors during startup, as mentioned before.

Is there a solution, or did I patch it the wrong way?

Can you please provide the exact patch you applied to disable SMMU for PCIe?
Also, do you see the system slowing down even with no PCIe device connected?
Can you please give the release info as well (i.e. 28.1, 28.2, etc.)?

Dear vidyas,

The first time I ran the test application, I got an “arm-smmu 12000000.iommu: Unhandled context fault: iova=0xe0000000, fsynr=0x240013, …” error every time the DMA ran. I then searched this forum and found your reply in the following topic:
https://devtalk.nvidia.com/default/topic/1026334/pcie-dma-problem-between-tx2-amp-fpga/

I followed your instructions and removed the following from the dtsi file:

  1. “#stream-id-cells = <1>;” from the “tegra_pcie” node
  2. “<&{/pcie-controller@10003000} TEGRA_SID_AFI>,” from the “smmu” node

Then I built the dtb and flashed it to the board. After restarting the system, everything became very slow.

I also tried starting the system without any PCIe device, and it is still very slow. One of the errors during startup:

[ 0.213093] /iommu@12000000: could not get #stream-id-cells for /pcie-controller@10003000

My system version is:

R28 (release), REVISION: 2.0, GCID: 10567845, BOARD: t186ref, EABI: aarch64, DATE: Fri Mar 2 04:57:01 UTC 2018

Dear vidyas,

Is there a solution, or did I patch it the wrong way?

/ Daning

Hi, Daning. I encountered the same problem as you. Have you made any progress?
Thanks.

I tested on a new TX2 board. After removing “#stream-id-cells = <1>;”, the system becomes slow again and these errors appear during startup:

[ 0.219877] arm-smmu 12000000.iommu: Unexpected {global,context} fault, this could be serious

But when I keep “#stream-id-cells = <1>;” and remove only “<&{/pcie-controller@10003000} TEGRA_SID_AFI>”, the system is fast again and there are no errors during startup. I checked ‘ls /sys/kernel/debug/12000000.iommu/masters/’ and there are no pcie entries. Everything looks good.
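
The same check can be scripted. The following is a small sketch that counts entries with “pcie” in their name under the SMMU masters directory. The helper names are made up for illustration, and matching on the substring “pcie” is an assumption based on the entry name seen in this thread (10003000.pcie-controller).

```c
#include <dirent.h>
#include <string.h>

/* True if a masters entry name looks like the PCIe controller. */
static int name_is_pcie_master(const char *name)
{
    return strstr(name, "pcie") != NULL;
}

/*
 * Count PCIe entries in an SMMU "masters" directory.
 * Returns the number of matches, or -1 if the directory cannot be opened
 * (e.g. debugfs not mounted or insufficient permissions).
 */
static int count_pcie_masters(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (name_is_pcie_master(entry->d_name))
            count++;
    }

    closedir(dir);
    return count;
}
```

Calling count_pcie_masters("/sys/kernel/debug/12000000.iommu/masters") and getting 0 corresponds to the “no pcie entries” result above; reading debugfs usually requires root.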

Hi, Daning. Thanks very much. It worked.

hi!
This approach doesn’t work for me with the L4T 28.2 kernel.
The allocation is done in the hardware range starting at 0x80000000, but the virtual address returned to user level by the mmap function doesn’t point to the right memory.
How can I disable the SMMU for my PCIe driver?

Is the solution given in #9 not working for you?

Hi!

#stream-id-cells = <1>; is already in the file that I extracted from /boot/tegra186-quill-p3310-1000-c03-00-base.dtb
and there’s no line containing <&{/pcie-controller@10003000} TEGRA_SID_AFI>

The command ls /sys/kernel/debug/12000000.iommu/masters
shows the line: 10003000.pcie-controller
So I reasonably conclude that the SMMU is enabled for the PCIe controller, and that this is why I get a wrong virtual address from the mmap function at user level.

I use L4T 28.2, which is the latest release.
All the solutions I saw in other topics for disabling the SMMU were for the previous release (28.1).
That’s why I’m looking for another solution: nothing matches the L4T 28.2 case.

I forgot to mention that the driver and application code were already tested on an x86 platform, where DMA transfers from the PCIe device to system memory worked without any problem,
and the buffer was correctly processed at user level using the address returned by the mmap function.

Need feedback…

The following patch is for the 28.2 release and should work:

diff --git a/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi b/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi
index 5534343..8e2bd15 100644
--- a/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi
+++ b/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi
@@ -186,7 +186,6 @@
                              <&tegra_adsp_audio        TEGRA_SID_APE>,
                              <&{/sound}                TEGRA_SID_APE>,
                              <&{/sound_ref}            TEGRA_SID_APE>,
-                             <&{/pcie-controller@10003000} TEGRA_SID_AFI>,
                              <&{/ahci-sata@3507000}    TEGRA_SID_SATA2>,
                              <&{/aon@c160000}          TEGRA_SID_AON>,
                              <&{/rtcpu@b000000}        TEGRA_SID_RCE>,
@@ -1508,8 +1507,6 @@
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 72 0x04>;// check this
 
-               #stream-id-cells = <1>;
-
                bus-range = <0x00 0xff>;
                #address-cells = <3>;
                #size-cells = <2>;

I’m using JetPack 3.3 (28.2.1 release). In my dtb, only the #stream-id-cells = <1>; line is present; there is no <&{/pcie-controller@10003000} TEGRA_SID_AFI> entry. The dtb file is tegra186-quill-p3310-1000-a00-00-base.dtb. Is that the correct file?

Please apply the changes in the .dtsi file.