PCIe DMA on Tegra (Xavier NX)

Hello,

Recently I started researching data transfer between a Xavier and an FPGA using the DMA engine over PCIe. According to your TEGRA_DW_DMA_TEST code, it is possible to check the sent data against the received data, so I generated some data in “pcie->cpu_virt_addr”, called dma_write(), and finally checked “dst_cpu_virt”. However, the data in those two memories differ. Could you explain why, and how I can fix this? (I have also checked your original reference source code and other articles on this forum, but failed to get the intended result.)

static void print_tegra_dwdma_log(struct tegra_pcie_dw *pcie, const char *prefix_str, u8 *tmpbuf, u32 _size)
{
#if 1
	/* prefix_type: DUMP_PREFIX_NONE, DUMP_PREFIX_ADDRESS or DUMP_PREFIX_OFFSET.
	 * rowsize must be 16 or 32 (other values, such as 8, fall back to 16);
	 * groupsize 4 prints each 4-byte group as one native-endian u32. */
	print_hex_dump(KERN_INFO, prefix_str, DUMP_PREFIX_ADDRESS, 16, 4, tmpbuf, _size, false);
#endif
}

static int write(struct seq_file *s, void *data)
{
	struct tegra_pcie_dw *pcie = (struct tegra_pcie_dw *)(s->private);
	struct dma_tx tx;
	int ret = 0, size = 0, orgsize = 0;
	void __iomem *dst_cpu_virt;
	u8 *tmpbuf1 = NULL, *tmpbuf2 = NULL, tmpbuf3 = 0;

	memset(&tx, 0x0, sizeof(struct dma_tx));
	tx.src = pcie->src;
	tx.dst = pcie->dst;
	tx.size = pcie->size;
	tx.channel = pcie->channel;

	/* map the DMA destination so the CPU can read it back */
	dst_cpu_virt = ioremap_nocache(pcie->dst, (pcie->size) * SZ_1K);
	if (!dst_cpu_virt) {
		dev_err(pcie->dev, "ioremap of dst failed\n");
		return -ENOMEM;
	}

	tmpbuf1 = kzalloc(pcie->size, GFP_KERNEL);
	if (!tmpbuf1) {
		dev_err_once(pcie->dev, "tmpbuf1 alloc failed\n");
		ret = -ENOMEM;
		goto err_out;
	}
	tmpbuf2 = kzalloc(pcie->size, GFP_KERNEL);
	if (!tmpbuf2) {
		dev_err_once(pcie->dev, "tmpbuf2 alloc failed\n");
		ret = -ENOMEM;
		goto err_out;
	}

	/* fill source with random data */
#if 0
	get_random_bytes(pcie->cpu_virt_addr, pcie->size);
#else
	/* ... or with a recognizable pattern: 0x00, 0x11, 0x22, 0x33, ... */
	size = pcie->size;
	orgsize = size;
	while (size >= 1) {
		tmpbuf3 = orgsize - size;
		tmpbuf1[orgsize - size] = ((tmpbuf3 & 0x0f) << 4) | tmpbuf3;
		size--;
	}
	memcpy(pcie->cpu_virt_addr, tmpbuf1, pcie->size);
#endif

	size = pcie->size;
	print_tegra_dwdma_log(pcie, "pcie->cpu_virt_addr[B]: ", (u8 *)pcie->cpu_virt_addr, size);
	print_tegra_dwdma_log(pcie, "       dst_cpu_virt[B]: ", (u8 *)dst_cpu_virt, size);

	ret = dma_write(pcie, &tx);
	if (ret < 0) {
		dev_err(pcie->dev, "DMA-Write test FAILED (dma_write)\n");
		ret = -EIO;
		goto err_out;
	}

	/* compare copied data; read the I/O memory back with memcpy_fromio() */
	memcpy_fromio(tmpbuf2, dst_cpu_virt, pcie->size);
	if (!memcmp(pcie->cpu_virt_addr, tmpbuf2, pcie->size)) {
		dev_info(pcie->dev, "DMA-Write test PASSED (cmp_succ)\n");
	} else {
		dev_info(pcie->dev, "DMA-Write test FAILED (cmp_fail)\n");
		print_tegra_dwdma_log(pcie, "pcie->cpu_virt_addr[A]: ", (u8 *)pcie->cpu_virt_addr, size);
		print_tegra_dwdma_log(pcie, "       dst_cpu_virt[A]: ", (u8 *)dst_cpu_virt, size);
	}

err_out:
	kfree(tmpbuf1);	/* kfree(NULL) is a no-op */
	kfree(tmpbuf2);
	iounmap(dst_cpu_virt);
	return ret;
}
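To see why the dump shows 33221100 for the bytes 0x00, 0x11, 0x22, 0x33, the userspace sketch below rebuilds the same test pattern and formats it the way print_hex_dump() does with groupsize 4, i.e. one native-endian 32-bit word per group. It assembles each word explicitly in little-endian order, which is what a native u32 load gives on the Xavier's little-endian ARM64 cores; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same pattern as write(): byte i = ((i & 0x0f) << 4) | i,
 * i.e. 0x00, 0x11, 0x22, ... 0xff for the first 16 bytes. */
static void fill_pattern(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t v = (uint8_t)i;
		buf[i] = (uint8_t)(((v & 0x0f) << 4) | v);
	}
}

/* Format buf as space-separated 32-bit words assembled little-endian,
 * mimicking groupsize-4 hex dump output on a little-endian CPU.
 * len must be a multiple of 4; out must hold 9 chars per word. */
static void format_le_words(const uint8_t *buf, size_t len, char *out)
{
	assert(len % 4 == 0);
	out[0] = '\0';
	for (size_t i = 0; i < len; i += 4) {
		uint32_t w = (uint32_t)buf[i] | (uint32_t)buf[i + 1] << 8 |
			     (uint32_t)buf[i + 2] << 16 | (uint32_t)buf[i + 3] << 24;
		sprintf(out + strlen(out), "%s%08x", i ? " " : "", (unsigned int)w);
	}
}
```

For a 16-byte buffer this produces the same four words as the pcie->cpu_virt_addr[B] line of the log.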

I got the result below:

Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.707588] pcie->cpu_virt_addr[B]: ffffff8023d22000: 33221100 77665544 bbaa9988 ffeeddcc
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.716859] dst_cpu_virt[B]: ffffff800b3cc000: 33221100 fedcba98 bbaa9988 88776655
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.748646] tegra-pcie-dw 141a0000.pcie: pci->atu_base: 0xffffff8012180000
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.762236] tegra-pcie-dw 141a0000.pcie: [B]val: 0xe000e
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.769538] tegra-pcie-dw 141a0000.pcie: [A]val: 0xe000e
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.774016] tegra-pcie-dw 141a0000.pcie: [B]val(DMA_CH_CONTROL1_OFF_WRCH): 0x68
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.776554] tegra-pcie-dw 141a0000.pcie: [A]val(ll): 0x8
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.778161] tegra-pcie-dw 141a0000.pcie: !(tx->ll)
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.779115] tegra-pcie-dw 141a0000.pcie: pcie->dma_poll
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.780052] tegra-pcie-dw 141a0000.pcie: DMA write. Size: 16 bytes, Time diff: 934591 ns
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.781139] tegra-pcie-dw 141a0000.pcie: DMA-Write test FAILED (cmp_fail)
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.783539] pcie->cpu_virt_addr[A]: ffffff8023d22000: 33221100 77665544 bbaa9988 ffeeddcc
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.784642] dst_cpu_virt[A]: ffffff800b3cc000: 33221100 fedcba98 bbaa9988 88776655
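The dumps can be compared word by word to pin down exactly which 32-bit words differ. The sketch below is a hypothetical userspace helper, fed with the words copied from the log above; it returns a bitmask with bit i set when word i differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return a bitmask with bit i set when word i differs (n <= 32). */
static uint32_t mismatch_mask(const uint32_t *expected, const uint32_t *actual, size_t n)
{
	uint32_t mask = 0;

	assert(n <= 32);
	for (size_t i = 0; i < n; i++)
		if (expected[i] != actual[i])
			mask |= UINT32_C(1) << i;
	return mask;
}
```

With the [A] words the mask is 0xa: words 1 and 3 differ, while word 2 (bbaa9988) happens to match. Comparing the dst_cpu_virt[B] and [A] words the same way gives 0, i.e. in this run the destination read back identically before and after dma_write().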

The full kernel log and the output of lspci -vv are attached:
kern.log (3.2 MB)
lspci_result.txt (6.4 KB)

Thanks in advance,
Matt

Which L4T version?

Have you referred to the topic “Enabling CONFIG_PCIE_TEGRA_DW_DMA_TEST jetson xavior” (Jetson & Embedded Systems / Jetson AGX Xavier, NVIDIA Developer Forums)?

Hello,

I have already read that topic. My L4T release is JetPack_4.6.1_Linux_JETSON_XAVIER_NX_TARGETS.

Did you check my source code and the log? Please take a look at the log.

Which code are you referring to?

pcie-tegra.c in the kernel/nvidia/drivers/pci/dwc folder.

BTW, is there any other reference source code for “TEGRA_DW_DMA_TEST”?

Hello Supporters,
Could anyone advise on this topic? I have spent a lot of time on it without success; please help me.
Thanks,
Matt

Hi,

Can you check from the FPGA side, using ChipScope or a similar tool, whether the PCIe transaction actually arrives? Also, dst_cpu_virt[B] (ffffff800b3cc000) is presumably the virtual address of the memory behind the Xilinx memory controller; can the CPU access that memory?

Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.707588] pcie->cpu_virt_addr[B]: ffffff8023d22000: 33221100 77665544 bbaa9988 ffeeddcc
Jun 13 13:57:34 cuda1-desktop kernel: [ 5234.716859] dst_cpu_virt[B]: ffffff800b3cc000: 33221100 fedcba98 bbaa9988 88776655

But why are the first 4 bytes the same (‘33221100’)?

Hi k-hamada,

Yes, that is exactly the point I have been investigating for a long time without success.

Yes, dst_cpu_virt is the virtual address of the I/O memory; as you know, it is mapped by the TEGRA_DW_DMA_TEST reference code as follows:

dst_cpu_virt = ioremap_nocache(pcie->dst, (pcie->size)*SZ_1K);

and I set “pcie->dst” to “0x1f40000000” according to the DTS below (in tegra194-soc-pcie.dtsi):

  bus-range = <0x0 0xff>;

  ranges = <0x81000000 0x0 0x3a100000 0x0 0x3a100000 0x0 0x00100000      /* downstream I/O (1MB) */
  	  0x82000000 0x0 0x40000000 0x1f 0x40000000 0x0 0xC0000000     /* non-prefetchable memory (3GB) */
  	  0xc3000000 0x1c 0x00000000 0x1c 0x00000000 0x3 0x40000000>;  /* prefetchable memory (13GB) */
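For reference, each entry of the ranges property above has seven cells: a three-cell PCI address (a flags cell, then a 64-bit bus address), a two-cell CPU address and a two-cell size. The userspace sketch below, with hypothetical helper names, decodes an entry and translates a CPU physical address inside the window to the PCI bus address it corresponds to. With the non-prefetchable entry, CPU address 0x1f40000000 corresponds to PCI bus address 0x40000000; which of the two the DMA destination expects depends on the driver and is not decided by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One "ranges" entry: 3 PCI address cells, 2 CPU address cells, 2 size cells. */
struct pci_range {
	uint32_t flags;    /* phys.hi cell: space code and prefetchable bit */
	uint64_t pci_addr; /* phys.mid:phys.lo (PCI bus address) */
	uint64_t cpu_addr; /* parent (CPU physical) address */
	uint64_t size;
};

static struct pci_range range_from_cells(const uint32_t c[7])
{
	struct pci_range r = {
		.flags = c[0],
		.pci_addr = (uint64_t)c[1] << 32 | c[2],
		.cpu_addr = (uint64_t)c[3] << 32 | c[4],
		.size = (uint64_t)c[5] << 32 | c[6],
	};
	return r;
}

/* Space code, phys.hi bits 25:24: 1 = I/O, 2 = 32-bit memory, 3 = 64-bit memory. */
static unsigned int range_space(const struct pci_range *r)
{
	return (r->flags >> 24) & 0x3;
}

/* Prefetchable bit, phys.hi bit 30. */
static bool range_prefetchable(const struct pci_range *r)
{
	return (r->flags >> 30) & 0x1;
}

/* Translate a CPU physical address inside the window to a PCI bus address. */
static bool cpu_to_pci(const struct pci_range *r, uint64_t cpu, uint64_t *pci)
{
	assert(pci != NULL);
	if (cpu < r->cpu_addr || cpu - r->cpu_addr >= r->size)
		return false;
	*pci = r->pci_addr + (cpu - r->cpu_addr);
	return true;
}
```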

I wasn’t sure, so I tried to check that memory by printing its contents, but as my results show, the output looks strange.

As far as I understand, the contents of “pcie->cpu_virt_addr” should be copied into the “dst_cpu_virt” memory by the DMA engine (IP) after dma_write(). The first 4 bytes are copied correctly, but the 2nd, 3rd and 4th groups of 4 bytes are wrong and look shifted, so I would appreciate your analysis.

Thanks,
Matt

Hi,

It is necessary to clarify the situation in order to give accurate advice.

  1. Can you confirm the PCIe transaction, or the access to the memory on the board, from the FPGA side
    using a tool such as ChipScope? Yes or No?
  2. Can you read or write the contents of the memory on your FPGA board without using your Xavier?
    Yes or No?
  3. Can the Xavier CPU read or write the memory on your FPGA board via PCIe
    without using DMA? Yes or No?
  4. Can the 16 bytes of data ‘33221100 77665544 bbaa9988 ffeeddcc’ be written correctly using your Xavier CPU?
    Yes or No?
  5. What happens to the data when the DMA test is run with the first 16 bytes of
    the FPGA memory all initialized to 0? Do you get ‘33221100 00000000 00000000 00000000’?

Yes, it is possible; please refer to the capture below, taken with Xilinx’s debug tool:

Sorry, but I don’t understand: written to where?

No. As I mentioned earlier, something like “33221100 fedcba98 bbaa9988 88776655” is written (I checked this data with the FPGA’s JTAG tool; see the capture I uploaded above). Note that when I write only 4 bytes of data it works, but with 8, 12 or 16 bytes (anything more than 4 bytes) the result is wrong.

I only loaded 16 bytes of data into CPU memory, started the DMA transfer, and finally read the DMA destination memory directly.

Thanks,
Matt

Hi,

Yes, possible you can refer to below capture, it’s captured by Xilinx’s debug tool;

Does ‘77665544’ or ‘bbaa9988’ appear anywhere in the captured data?

Sorry but I don’t understand, to where?

I mean a copy from the Xavier’s memory to the FPGA’s memory done by the CPU. Please modify your write() function to do that. Is the data written correctly in that case? This test checks whether there is a difference between a memory transfer by the CPU and one by the DMA engine.

No, as I denoted earlier something like this “33221100 fedcba98 bbaa9988 88776655” is written

Does this mean that ‘00000000 00000000 00000000 00000000’ will change to ‘33221100 fedcba98 bbaa9988 88776655’? This is a test to see if only the first 4 bytes are transferred.

Sorry, the FPGA engineer told me that he had preloaded data into the FPGA’s memory (for the DMA test), so the remaining 12 bytes are correct (they are the preset values). Only the first 4 bytes of memory were overwritten.

So the question is why only the first 4 bytes of data were written correctly: where did the other 12 bytes go? Do you have any idea?

Thanks,
Matt

Hi,

FPGA engineer told me that he uploaded preset data on FPGA’s memory (for DMA) so the remaining data (12 bytes) are correct.

Is the 12-byte data ‘fedcba98bbaa998888776655’ preset data?

So the issue is then why only the 1st 4 bytes of data were written correctly, where are those 12 bytes of data? Do you have any idea?

Even if the Xavier’s DMA sends 16 bytes of data, the PCIe core on the FPGA side may be handling only the first 4 bytes.
To test this, please write these 16 bytes with your Xavier’s ARM CPU instead of the DMA.

To do this, simply call

memcpy_toio(dst_cpu_virt, pcie->cpu_virt_addr, pcie->size);

instead of

dma_write(pcie, &tx);
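As an alternative to editing write(), the same CPU-only check can be done from userspace by mapping the FPGA BAR through its sysfs resource file and writing the 16 bytes with ordinary 32-bit CPU stores. This is a sketch under assumptions: the target is a memory BAR that the kernel allows to be mmapped via /sys/bus/pci/devices/<BDF>/resourceN (root required), the device path used below is hypothetical, and the FPGA memory is assumed to start at offset 0 of that BAR.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a memory BAR via its sysfs resource file, e.g. the hypothetical
 * "/sys/bus/pci/devices/0005:01:00.0/resource0". Returns NULL on failure. */
volatile uint32_t *map_bar(const char *path, size_t len)
{
	int fd = open(path, O_RDWR | O_SYNC);
	void *p;

	if (fd < 0)
		return NULL;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after close() */
	return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Write n 32-bit words with plain CPU stores, read them back,
 * and return how many words read back as written. */
size_t cpu_write_verify(volatile uint32_t *dst, const uint32_t *src, size_t n)
{
	size_t ok = 0;

	assert(dst != NULL && src != NULL);
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
	for (size_t i = 0; i < n; i++)
		if (dst[i] == src[i])
			ok++;
	return ok;
}
```

If cpu_write_verify() returns 4 for the pattern 33221100 77665544 bbaa9988 ffeeddcc on the mapped BAR but the DMA test still fails, the difference lies in how the DMA write TLPs are handled rather than in basic reachability of the FPGA memory.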

By the way, the PCIe core of the FPGA used is

?
I think that when your PCIe core receives something, the data appears on m_axis_rx_tdata, not on s_axis_tx_tdata.

Yes, correct.

I got this result:

Please capture the received signal ‘m_axis_rx_tdata’ and observe the TLP (Transaction Layer Packet), especially its header and payload.
If you can find the data ‘fedcba98 bbaa9988 88776655’, then the problem lies after the FPGA’s PCIe core, for example in PIO_EP.
Otherwise, the problem lies before the PCIe core or in the core itself.
To debug this, please review the fundamentals, for example
http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-1
and the Xilinx PCIe core product guide (PG054) I mentioned before.
http://xilinx.eetrend.com/files-eetrend-xilinx/download/201806/13080-37733-pg054-7series-pcie.pdf
At a minimum, you need to be able to determine the payload size from the TLP header.
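As a worked example of reading the payload size from the header: a memory write TLP carries Fmt and Type in DW0 and its length in DWs in DW0[9:0], with the first and last DW byte enables in DW1[3:0] and DW1[7:4]. The sketch below (hypothetical helper names) decodes those fields and computes how many bytes a memory write carries, so a 16-byte DMA write would typically show Length = 4 (unless split across several TLPs), while a 4-byte write shows Length = 1. It assumes the header DWs are given with the PCIe-spec bit layout; how the 7 Series core places them on m_axis_rx_tdata is described in PG054 and is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Selected fields of TLP header DW0 and DW1 (PCIe spec bit layout). */
struct tlp_hdr {
	unsigned int fmt;      /* DW0[31:29]: 0b010 / 0b011 = 3DW / 4DW header with data */
	unsigned int type;     /* DW0[28:24]: 0b00000 = memory request */
	unsigned int len_dw;   /* DW0[9:0] in DWs; the encoding 0 means 1024 */
	unsigned int first_be; /* DW1[3:0]: first DW byte enables */
	unsigned int last_be;  /* DW1[7:4]: last DW byte enables */
};

static struct tlp_hdr tlp_decode(uint32_t dw0, uint32_t dw1)
{
	struct tlp_hdr h;
	unsigned int len = dw0 & 0x3ff;

	h.fmt = (dw0 >> 29) & 0x7;
	h.type = (dw0 >> 24) & 0x1f;
	h.len_dw = len ? len : 1024;
	h.first_be = dw1 & 0xf;
	h.last_be = (dw1 >> 4) & 0xf;
	return h;
}

/* Memory write: Fmt says "with data" and Type is 0b00000. */
static int is_mem_write(const struct tlp_hdr *h)
{
	return (h->fmt == 0x2 || h->fmt == 0x3) && h->type == 0x0;
}

static unsigned int popcount4(unsigned int v)
{
	return (v & 1) + ((v >> 1) & 1) + ((v >> 2) & 1) + ((v >> 3) & 1);
}

/* Bytes actually written by a memory write, honouring the byte enables
 * (DWs between the first and last are always fully enabled). */
static unsigned int tlp_payload_bytes(const struct tlp_hdr *h)
{
	if (h->len_dw == 1)
		return popcount4(h->first_be);
	return popcount4(h->first_be) + popcount4(h->last_be) + 4 * (h->len_dw - 2);
}
```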

Thanks, I will check them.
Matt

Hello Supporters,
You may close this issue, thank you for your supports.
Matt

It sounds like you have solved the problem. Of course you are free to close this topic, but if the problem turns out to be caused by the DMA driver, please share your findings on the forum.

We reviewed several things internally, and I now have a corrected picture of the whole setup:

  1. Writing directly to the DMA memory on the RC (RP) side confused my understanding of the data transfer, so I commented out the direct memory write in the “write” function.
  2. The data preloaded into the EP’s (FPGA’s) DMA memory also confused me.
  3. Not updating the data after the DMA transfer misled my understanding; the FPGA engineer is working on this now.

After correcting all of these, everything is working well.
Thanks,
Matt