I am attempting to write a Linux kernel module that will be used to send data from an NVIDIA Jetson (acting as the PCIe root port) to a custom FPGA-based endpoint.
So far, I have been sending data to the endpoint by writing to BAR0 with memcpy() from user space. However, this gives me very poor throughput. I created a forum post (PCIe Link Speed Issue) and was told to use DMA instead.
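For context, a minimal sketch of this kind of user-space BAR0 write is shown below. It assumes the BAR is reached by mmap()ing a file such as the sysfs resource0 entry of the device; the function name write_to_bar and the path passed to it are hypothetical, not taken from my actual code.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the start of a BAR that is exposed as a file
 * (for example /sys/bus/pci/devices/<bdf>/resource0, path is an assumption)
 * and copy len bytes from src into it with memcpy().
 * Returns 0 on success, -1 on any failure.
 */
int write_to_bar(const char *resource_path, const void *src, size_t len)
{
    int fd = open(resource_path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    /* Refuse empty writes and writes larger than the mapped resource. */
    struct stat st;
    if (fstat(fd, &st) < 0 || len == 0 ||
        st.st_size < 0 || (size_t)st.st_size < len) {
        close(fd);
        return -1;
    }

    void *bar = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (bar == MAP_FAILED)
        return -1;

    /*
     * Every byte is moved by CPU stores. On an uncached MMIO mapping these
     * typically turn into many small PCIe writes, which is one likely
     * reason for the low throughput.
     */
    memcpy(bar, src, len);

    munmap(bar, len);
    return 0;
}
```

A call would look like write_to_bar(path_to_resource0, buffer, buffer_size), run as root so that the resource file can be opened for writing.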
However, I currently have no idea how to do that. Do I use DMA to write the data to BAR0 (with BAR0 acting as the DMA destination address), or do I need to do something else? I have searched through a lot of documentation, but so far I have been unable to find any APIs that would help me do this. (I am new to writing kernel modules, so I may be missing something obvious. If anyone could point me in the right direction, I'd be grateful.)
I have looked at dmatest.c, but couldn't identify any APIs that would help me achieve what I want. I also found this thread (How to Using GPC-DMA MEM2MEM Function), which gives a summarised example of performing MEM2MEM DMA. Using that example, I successfully performed a MEM2MEM DMA transfer.
I then attempted to change the destination address in the MEM2MEM DMA example to the PCI BAR0 address. However, this results in a kernel panic (probably because I cannot use the BAR0 address directly for DMA, though I am not sure about this). Could I use the BAR0 address as the DMA destination address in MEM2MEM mode if I disable the SMMU?
Or is there some other API that would allow me to DMA directly into the PCI BAR region? (I found a mention of a MEM2MMIO mode of the GPC DMA in the Xavier Technical Reference Manual, but I have no idea how to use it from a kernel module.)
Note that I am asking about using DMA to send data from the NVIDIA root port to the endpoint, not about sending data from the endpoint to the root port.
Any help would be greatly appreciated.
Note: I am using JetPack 5.1.
Sana Ur Rehman