PCIe send error in linked-list mode

I malloc 1 MB of memory in user space, look up its pages in the kernel, and set up the DMA descriptors in linked-list mode from the result of dma_map_sg. When the DMA write completes, some of the received data is correct and some is not. While the DMA is running, the following errors appear:
[ 248.509343] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xcfcc8000, fsynr=0x270003, cb=0, sid=91(0x5b - PCIE5), pgd=820c1f003, pud=820c1f003, pmd=7ea563003, pte=0
[ 248.509368] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0xcfcc9000, fsynr=0x380003, cb=0, sid=91(0x5b - PCIE5), pgd=820c1f003, pud=820c1f003, pmd=7ea563003, pte=0
[ 248.509512] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xcfce1000, fsynr=0x270003, cb=0, sid=91(0x5b - PCIE5), pgd=820c1f003, pud=820c1f003, pmd=7ea563003, pte=0
[ 248.509521] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0xcfcf0e00, fsynr=0x380003, cb=0, sid=91(0x5b - PCIE5), pgd=820c1f003, pud=820c1f003, pmd=7ea563003, pte=0
[ 248.651904] irq 90: nobody cared (try booting with the "irqpoll" option)
[ 248.651946] CPU: 0 PID: 7749 Comm: test_main Not tainted 4.9.140-tegra #3
[ 248.651951] Hardware name: Jetson-AGX (DT)
[ 248.651954] Call trace:
[ 248.651972] [] dump_backtrace+0x0/0x198
[ 248.651980] [] show_stack+0x24/0x30
[ 248.651987] [] dump_stack+0x98/0xc0
[ 248.651996] [] __report_bad_irq+0x3c/0xf8
[ 248.652001] [] note_interrupt+0x2c8/0x318
[ 248.652012] [] handle_irq_event_percpu+0x50/0x60
[ 248.652016] [] handle_irq_event+0x50/0x80
[ 248.652022] [] handle_fasteoi_irq+0xc8/0x1b8
[ 248.652027] [] generic_handle_irq+0x34/0x50
[ 248.652032] [] __handle_domain_irq+0x68/0xc0
[ 248.652037] [] gic_handle_irq+0x5c/0xb0
[ 248.652041] [] el1_irq+0xe8/0x194
[ 248.652049] [] irq_exit+0xd0/0x118
[ 248.652053] [] __handle_domain_irq+0x6c/0xc0
[ 248.652058] [] gic_handle_irq+0x5c/0xb0
[ 248.652063] [] el0_irq_naked+0x54/0x60
[ 248.652066] handlers:
[ 248.652081] [] tegra_mcerr_hard_irq threaded [] tegra_mcerr_thread
[ 248.652084] Disabling IRQ #90

How can I resolve this problem?

There seems to be an issue with the addresses used by the DMA engine. Please check whether the addresses reported in the error log (the 'iova=0xcfcxxxxx' fields) are valid.

If I use dma_alloc_coherent and set up the DMA descriptors in linked-list mode, there is no problem. But if I use malloc in user space to allocate physically non-contiguous memory and set up the DMA descriptors in linked-list mode, I get the errors shown above.

In hardware/nvidia/soc/t19x/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi:
    pinctrl-names = "pex_rst", "clkreq";
    pinctrl-0 = <&pex_rst_c5_out_state>;
    pinctrl-1 = <&clkreq_c5_bi_dir_state>;

    iommus = <&smmu TEGRA_SID_PCIE5>;
    dma-coherent;

#if LINUX_VERSION >= 414
    iommu-map = <0x0 &smmu TEGRA_SID_PCIE5 0x1000>;
    iommu-map-mask = <0x0>;
#endif

To solve this problem, we remove the "dma-coherent" line. Is that right?

Removing ‘dma-coherent’ is not the correct thing to do.
BTW, since you are allocating the memory in the user space, are you also pinning it in the kernel before mapping it to get the IOVA address?

Yes.

My process is:
1. malloc() in user space;
2. __get_user_pages_fast() in the kernel;
3. dma_map_sg() in the kernel.

for (i = 0, sg = sgt->sgl; i < sgt->nents; i++, sg = sg_next(sg)) {
    unsigned int len = sg_dma_len(sg);
    dma_addr_t addr = sg_dma_address(sg);

    while (len > 0) {
        transfer->desc[idx].sar_low  = lower_32_bits(addr);
        transfer->desc[idx].sar_high = upper_32_bits(addr);
        transfer->desc[idx].dar_low  = lower_32_bits(dst_iova);
        transfer->desc[idx].dar_high = upper_32_bits(dst_iova);

        if (len > DESC_MAXLEN) {
            addr += DESC_MAXLEN;
            len -= DESC_MAXLEN;
            transfer->desc[idx].size = DESC_MAXLEN;
            dst_iova += DESC_MAXLEN;
        } else {
            transfer->desc[idx].size = len;
            dst_iova += len;
            len = 0;
        }
        idx++;
    }

    //link list mode
    if (idx >= (transfer->desc_cnt - 1)) {
        transfer->desc[idx].ele_1.llp = 1;
        transfer->desc[idx].ele_1.tcb = 1;
    } else {
        transfer->desc[idx].ele_1.llp = 1;
        transfer->desc[idx].sar_low = transfer->src_iova_addr +
            (idx + 1) * sizeof(struct xdma_desc);
        transfer->desc[idx+1].ele_1.lie = 1;
        idx++;
    }
}

Then I call dma_write.
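
For reference, the splitting step in the loop above can be modeled in user space to check how many data descriptors a given set of mapped segments needs. This is only a sketch under assumptions: `struct seg`, `struct data_desc`, `descs_for_len` and `build_descs` are hypothetical names, the `DESC_MAXLEN` value is a placeholder (the driver's real limit is not shown in this thread), and the link elements that the driver adds after each segment are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder limit; the real DESC_MAXLEN is driver-specific. */
#define DESC_MAXLEN 0x10000u

/* Stand-in for one DMA-mapped scatterlist entry (address + length). */
struct seg {
    uint64_t addr;
    uint32_t len;
};

/* Stand-in for one data descriptor (source, destination, byte count). */
struct data_desc {
    uint64_t src;
    uint64_t dst;
    uint32_t size;
};

/* Number of data descriptors needed for one segment of 'len' bytes. */
static size_t descs_for_len(uint32_t len, uint32_t max_len)
{
    assert(max_len > 0);
    return (size_t)(((uint64_t)len + max_len - 1) / max_len);
}

/*
 * Split every segment into chunks of at most DESC_MAXLEN bytes.
 * Returns the number of descriptors written, or 0 if 'out' is too small,
 * so the caller never writes past the end of the descriptor array.
 */
static size_t build_descs(const struct seg *segs, size_t nsegs, uint64_t dst,
                          struct data_desc *out, size_t cap)
{
    size_t idx = 0;

    for (size_t i = 0; i < nsegs; i++) {
        uint64_t addr = segs[i].addr;
        uint32_t len = segs[i].len;

        while (len > 0) {
            uint32_t chunk = len > DESC_MAXLEN ? DESC_MAXLEN : len;

            if (idx >= cap)
                return 0;
            out[idx].src = addr;
            out[idx].dst = dst;
            out[idx].size = chunk;
            addr += chunk;
            dst += chunk;
            len -= chunk;
            idx++;
        }
    }
    return idx;
}
```

Here `segs` stands for the entries that dma_map_sg() actually returned. The DMA API documents that the returned count can be smaller than the number of entries passed in, so the descriptor loop should iterate over the returned count, and the descriptor array has to be sized for the data descriptors plus any link elements.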

Are you using the same dev* for both dma_map_alloc and dma_map_sg? Ideally, the sequence you are following should work.

I do not use dma_map_alloc. I call dma_map_sg after __get_user_pages_fast, on the PCIe EP device.

In my test scenario, one EP (Xavier) device sends data by DMA in linked-list mode to an RC (Xavier) device.

for (i = 0, sg = sgt->sgl; i < sgt->nents; i++, sg = sg_next(sg)) {
    unsigned int len = sg_dma_len(sg);
    dma_addr_t addr = sg_dma_address(sg);
    printk("fang----len=0x%x, addr=0x%llx\n", len, addr);

    while (len > 0) {
        transfer->desc[idx].sar_low  = lower_32_bits(addr);
        transfer->desc[idx].sar_high = upper_32_bits(addr);
        transfer->desc[idx].dar_low  = lower_32_bits(dst_iova);
        transfer->desc[idx].dar_high = upper_32_bits(dst_iova);

        if (len > DESC_MAXLEN) {
            addr += DESC_MAXLEN;
            len -= DESC_MAXLEN;
            transfer->desc[idx].size = DESC_MAXLEN;
            dst_iova += DESC_MAXLEN;
        } else {
            transfer->desc[idx].size = len;
            dst_iova += len;
            len = 0;
        }
        idx++;
    }

    //link list mode
    if (idx >= (transfer->desc_cnt - 1)) {
        transfer->desc[idx].ele_1.llp = 1;
        transfer->desc[idx].ele_1.tcb = 1;
    } else {
        transfer->desc[idx].ele_1.llp = 1;
        transfer->desc[idx].sar_low = transfer->src_iova_addr +
            (idx + 1) * sizeof(struct xdma_desc);
        transfer->desc[idx+1].ele_1.lie = 1;
        idx++;
    }
}

When I test the EP sending data to the RC, dmesg on the EP device shows:
[17640.464778] fang----len=0x10000, addr=0xcff00000
[17640.464782] fang----len=0x10000, addr=0xcff10000
[17640.464785] fang----len=0x10000, addr=0xcff20000
[17640.464788] fang----len=0x10000, addr=0xcff30000
[17640.464791] fang----len=0x10000, addr=0xcff40000
[17640.464794] fang----len=0x10000, addr=0xcff50000
[17640.464797] fang----len=0x10000, addr=0xcff60000
[17640.464828] fang----len=0x10000, addr=0xcff70000
[17640.464831] fang----len=0x10000, addr=0xcff80000
[17640.464834] fang----len=0x10000, addr=0xcff90000
[17640.464837] fang----len=0x10000, addr=0xcffa0000
[17640.464840] fang----len=0x10000, addr=0xcffb0000
[17640.464843] fang----len=0x10000, addr=0xcffc0000
[17640.464846] fang----len=0x10000, addr=0xcffd0000
[17640.464849] fang----len=0x10000, addr=0xcffe0000
[17640.464854] fang----len=0x10000, addr=0xcfff0000
[17640.465081] fang----ep—dma write over, size=1048576, time=26113
[17640.465138] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xcff48000, fsynr=0x270003, cb=0, sid=91(0x5b - PCIE5), pgd=7c4a6b003, pud=7c4a6b003, pmd=759912
[17640.465478] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0xcff49000, fsynr=0x380003, cb=0, sid=91(0x5b - PCIE5), pgd=7c4a6b003, pud=7c4a6b003, pmd=759912
[17640.466111] mc-err: (255) csr_pcie5r1: EMEM address decode error
[17640.466217] mc-err: status = 0x200640ef; addr = 0xffffffff00; hi_adr_reg=ff08
[17640.466337] mc-err: secure: yes, access-type: read
[17640.466431] mc-err: mcerr: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
[17640.466634] send submit: xdma->sn=192.168.3.0, localmeta=1, recv transfer=0xffffffc3914f63c0, ubuf=0x7f700ff000, rd_idx=1, bar0_addr=0x0, dst_iova=0xffd00000
[17640.466638] fang----len=0x10000, addr=0xcff00000
[17640.466642] fang----len=0x10000, addr=0xcff10000
[17640.466649] fang----len=0x10000, addr=0xcff20000
[17640.466652] fang----len=0x10000, addr=0xcff30000
[17640.466655] fang----len=0x10000, addr=0xcff40000
[17640.466658] fang----len=0x10000, addr=0xcff50000
[17640.466661] fang----len=0x10000, addr=0xcff60000
[17640.466664] fang----len=0x10000, addr=0xcff70000
[17640.466667] fang----len=0x10000, addr=0xcff80000
[17640.466670] fang----len=0x10000, addr=0xcff90000
[17640.466673] fang----len=0x10000, addr=0xcffa0000
[17640.466676] fang----len=0x10000, addr=0xcffb0000
[17640.466679] fang----len=0x10000, addr=0xcffc0000
[17640.466682] fang----len=0x10000, addr=0xcffd0000
[17640.466685] fang----len=0x10000, addr=0xcffe0000
[17640.466688] fang----len=0x10000, addr=0xcfff0000
[17640.466828] fang----ep—dma write over, size=1048576, time=1888
[17640.466886] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xcff20000, fsynr=0x270003, cb=0, sid=91(0x5b - PCIE5), pgd=7c4a6b003, pud=7c4a6b003, pmd=759912
[17640.467198] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0xcff21000, fsynr=0x380003, cb=0, sid=91(0x5b - PCIE5), pgd=7c4a6b003, pud=7c4a6b003, pmd=759912
[17640.468597] mc-err: (255) csr_pcie5r1: EMEM address decode error
[17640.473602] mc-err: status = 0x200640ef; addr = 0xffffffff00; hi_adr_reg=ff08
[17640.480497] mc-err: secure: yes, access-type: read
[17640.485711] mc-err: mcerr: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
[17640.486141] send submit: xdma->sn=192.168.3.0, localmeta=1, recv transfer=0xffffffc3914f6480, ubuf=0x7f701ff000, rd_idx=2, bar0_addr=0x0, dst_iova=0xffc00000
[17640.486144] fang----len=0x10000, addr=0xcff00000
[17640.486147] fang----len=0x10000, addr=0xcff10000
[17640.486149] fang----len=0x10000, addr=0xcff20000
[17640.486151] fang----len=0x10000, addr=0xcff30000
[17640.486154] fang----len=0x10000, addr=0xcff40000
[17640.486156] fang----len=0x10000, addr=0xcff50000
[17640.486158] fang----len=0x10000, addr=0xcff60000
[17640.486160] fang----len=0x10000, addr=0xcff70000
[17640.486162] fang----len=0x10000, addr=0xcff80000
[17640.486164] fang----len=0x10000, addr=0xcff90000
[17640.486167] fang----len=0x10000, addr=0xcffa0000
[17640.486169] fang----len=0x10000, addr=0xcffb0000
[17640.486171] fang----len=0x10000, addr=0xcffc0000
[17640.486173] fang----len=0x10000, addr=0xcffd0000
[17640.486176] fang----len=0x10000, addr=0xcffe0000
[17640.486178] fang----len=0x10000, addr=0xcfff0000
[17640.486355] fang----ep—dma write over, size=1048576, time=2016
[17640.486443] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xcff20000, fsynr=0x270003, cb=0, sid=91(0x5b - PCIE5), pgd=7c4a6b003, pud=7c4a6b003, pmd=759912
[17640.486470] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0xcff21000, fsynr=0x380003, cb=0, sid=91(0x5b - PCIE5), pgd=7c4a6b003, pud=7c4a6b003, pmd=759912
[17640.486591] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xcff57a00, fsynr=0x270003, cb=0, sid=91(0x5b - PCIE5), pgd=7c4a6b003, pud=7c4a6b003, pmd=759912
[17640.486601] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0xcff67100, fsynr=0x380003, cb=0, sid=91(0x5b - PCIE5), pgd=7c4a6b003, pud=7c4a6b003, pmd=759912
[17640.561118] mc-err: Too many MC errors; throttling prints
[17640.566825] send submit: xdma->sn=192.168.3.0, localmeta=1, recv transfer=0xffffffc3914f6540, ubuf=0x7f702ff000, rd_idx=3, bar0_addr=0x0, dst_iova=0xffb00000
[17640.566830] fang----len=0x10000, addr=0xcff00000
[17640.566834] fang----len=0x10000, addr=0xcff10000
[17640.566838] fang----len=0x10000, addr=0xcff20000
[17640.566841] fang----len=0x10000, addr=0xcff30000
[17640.566844] fang----len=0x10000, addr=0xcff40000
[17640.566848] fang----len=0x10000, addr=0xcff50000
[17640.566851] fang----len=0x10000, addr=0xcff60000
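
One way to act on the earlier suggestion to check the faulting addresses is to compare each `iova=` value from the SMMU fault lines against the segments printed by the `fang----len=..., addr=...` lines. The sketch below does that comparison in plain C; `struct iova_range` and `iova_is_mapped` are hypothetical names, and the ranges have to be filled in by hand from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One mapped segment as printed in the log: start IOVA and length. */
struct iova_range {
    uint64_t addr;
    uint64_t len;
};

/* Return true if 'iova' lies inside any of the n ranges. */
static bool iova_is_mapped(const struct iova_range *r, size_t n, uint64_t iova)
{
    assert(r != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Subtract before comparing so addr + len cannot overflow. */
        if (iova >= r[i].addr && iova - r[i].addr < r[i].len)
            return true;
    }
    return false;
}
```

With the sixteen 0x10000-byte segments from 0xcff00000 to 0xcfff0000 shown above, the faulting addresses 0xcff48000, 0xcff20000 and 0xcff57a00 all fall inside the printed ranges. So the DMA engine appears to be reading addresses that dma_map_sg returned, yet the SMMU has no translation for them at the time of the access. This is only a reading of the log, not a confirmed diagnosis.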

Hi, vidyas

Are there any further suggestions?