Capture camera via PCIe

Hello, NVIDIA Support,
We have recently been implementing a PCIe driver on the TX2i (L4T R28.2).
Our application uses an FPGA to grab 4K x 4K video and send the video data to the TX2i over PCIe.
For the best performance, we plan to send the video data from the FPGA directly into an NvBuffer, so that we can use all of the TX2i's resources
to process it, such as JPEG encoding and video format conversion.
Our PCIe driver can already write the video data into TX2i DDR (for example at 0xe0000000: we disable the PCIe MMU and reserve the 256 MB DDR region from 0xe0000000 to 0xf0000000), but we then have to copy the data from this DDR address
into the NvBuffer. This copy takes about 2 ms, which we think is too long, and it adds latency.
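A quick back-of-the-envelope check of the numbers above. The post does not state the pixel format, so this sketch assumes two common ones (8-bit mono at 8 bits per pixel and YUV 4:2:0 at 12 bits per pixel) and assumes the 2 ms figure is the time to copy one frame. frame_bytes, frames_in_window and copy_mib_per_s are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Reserved carve-out from the post: 0xe0000000 up to (not including) 0xf0000000. */
#define RESERVED_START 0xe0000000ULL
#define RESERVED_END   0xf0000000ULL

/* Bytes in one width x height frame. bits_per_pixel is an assumption:
   8 for 8-bit mono, 12 for YUV 4:2:0. */
static uint64_t frame_bytes(uint64_t width, uint64_t height, uint64_t bits_per_pixel)
{
    uint64_t bits = width * height * bits_per_pixel;
    assert(bits % 8 == 0); /* frame must be a whole number of bytes */
    return bits / 8;
}

/* Whole frames of frame_size bytes that fit in the reserved window. */
static uint64_t frames_in_window(uint64_t frame_size)
{
    return (RESERVED_END - RESERVED_START) / frame_size;
}

/* Effective copy throughput in MiB/s (truncated) when `bytes` are copied in `usec` microseconds. */
static uint64_t copy_mib_per_s(uint64_t bytes, uint64_t usec)
{
    assert(usec > 0);
    return bytes * 1000000ULL / usec / (1024ULL * 1024ULL);
}
```

Under those assumptions, an 8-bit 4096 x 4096 frame is 16 MiB, so 16 frames fit in the 256 MiB window (10 at 12 bits per pixel), and copying 16 MiB in 2 ms corresponds to roughly 8000 MiB/s of copy throughput.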
Could you provide some guidance or sample PCIe driver code showing how to write the video data directly into an NvBuffer without a memory copy?
Many thanks!

If I understand your driver design correctly, you allocate memory in the driver and pass its address to your FPGA's DMA engine, which writes the video data there; the data is then copied from that buffer into an NvBuffer.
Can you instead get the physical address of the NvBuffer and use the DMA mapping APIs (dma_map_single / dma_map_page, etc.) to obtain the corresponding IOVA, then give that IOVA to the FPGA's DMA engine so it writes the data there directly? That way, the intermediate buffer and the copy can be avoided.

How can I get the NvBuffer's hardware (physical) address?
Should I take the user virtual address returned by "NvBufferMemMap", use get_user_pages to turn that user virtual address into memory pages, and then use dma_map_page to map those pages to bus addresses that can be given to the FPGA DMA?
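The bookkeeping behind that approach (pin the pages behind the mapped address, then map each page with dma_map_page) can be sketched in user-space C. This is a minimal model, not kernel code: it assumes a 4 KiB page size, and pages_spanned, split_into_pages and struct seg are hypothetical names that only show how a buffer starting mid-page splits into per-page (offset, length) pieces. In a driver, the page count corresponds to DIV_ROUND_UP(offset_in_page(uaddr) + len, PAGE_SIZE), and each piece would become one dma_map_page(dev, page, offset, size, DMA_FROM_DEVICE) call. Whether the memory behind NvBufferMemMap can be pinned with get_user_pages is not confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumption: 4 KiB kernel page size */

/* One page-sized piece of a user buffer (hypothetical type). */
struct seg {
    size_t page_index; /* index into the array of pinned pages */
    size_t offset;     /* byte offset within that page (nonzero only for the first piece) */
    size_t len;        /* bytes of the buffer that live in this page */
};

/* Number of pages the buffer [uaddr, uaddr + len) touches, i.e. how many pages to pin. */
static size_t pages_spanned(uintptr_t uaddr, size_t len)
{
    size_t first_off = uaddr % PAGE_SZ;
    return (first_off + len + PAGE_SZ - 1) / PAGE_SZ;
}

/* Split the buffer into per-page pieces, one per DMA page mapping.
   Writes at most max_segs entries to out and returns how many were written. */
static size_t split_into_pages(uintptr_t uaddr, size_t len, struct seg *out, size_t max_segs)
{
    assert(out != NULL || max_segs == 0);
    size_t n = 0;
    size_t off = uaddr % PAGE_SZ; /* only the first page can start mid-page */
    while (len > 0 && n < max_segs) {
        size_t chunk = PAGE_SZ - off;
        if (chunk > len)
            chunk = len;
        out[n].page_index = n;
        out[n].offset = off;
        out[n].len = chunk;
        len -= chunk;
        off = 0;
        n++;
    }
    return n;
}
```

For example, a 16 MiB frame whose mapping starts on a page boundary spans 4096 pages, while an 8192-byte buffer starting 256 bytes into a page spans 3 pages split as 3840 + 4096 + 256 bytes.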

If this is correct, do I need to enable the PCIe IOMMU? Could this approach cause memory coherence problems?
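One way to reason about the IOMMU question: with the PCIe IOMMU disabled, the address the FPGA uses is the physical address, and pages pinned from a user mapping are generally not physically contiguous, so the FPGA DMA would need one descriptor per contiguous run. The sketch below models that by merging page addresses into runs; struct run and coalesce_pages are hypothetical names and the 4 KiB page size is an assumption. With the IOMMU enabled, dma_map_sg is allowed to return fewer mapped entries than it was given, possibly one contiguous IOVA range, but whether that happens on R28.2 is an assumption to verify. On coherence, the Linux DMA API expects a driver using streaming mappings to give the buffer back to the CPU (dma_sync_single_for_cpu / dma_sync_sg_for_cpu, or unmapping) before the CPU reads device-written data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumption: 4 KiB kernel page size */

/* A physically contiguous run of pages: one scatter-gather descriptor (hypothetical type). */
struct run {
    uint64_t addr; /* bus address of the first byte */
    uint64_t len;  /* length in bytes */
};

/* Merge page addresses, given in buffer order, into contiguous runs.
   out must have room for npages entries. Returns the number of runs. */
static size_t coalesce_pages(const uint64_t *page_addr, size_t npages, struct run *out)
{
    size_t n = 0;
    for (size_t i = 0; i < npages; i++) {
        assert(page_addr[i] % PAGE_SZ == 0); /* expects page-aligned addresses */
        if (n > 0 && out[n - 1].addr + out[n - 1].len == page_addr[i]) {
            out[n - 1].len += PAGE_SZ; /* page directly follows the previous run */
        } else {
            out[n].addr = page_addr[i]; /* gap: start a new run */
            out[n].len = PAGE_SZ;
            n++;
        }
    }
    return n;
}
```

For example, pages at 0x80000000, 0x80001000, 0x80002000, 0x90000000 and 0x90001000 collapse into two runs of 12 KiB and 8 KiB, so the FPGA would need two descriptors for that buffer.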