I have one Xavier AGX connected to a host x86 PC. The two are connected over PCIe, using the external PCIe x16 slot of the Xavier AGX. The Xavier AGX is configured as an endpoint device (NVIDIA RAM Memory).
In the endpoint function driver (‘pci-epf-nv-test.c’, located in the kernel/nvidia/drivers/pci/endpoint/functions directory of the L4T kernel source code), I added the code needed to allocate 256 MB of DMA memory with dma_alloc_coherent.
The details of the memory allocated with dma_alloc_coherent are exported and used by a second PCIe driver. This second driver uses dma_mmap_coherent to map the memory that was allocated in the ‘pci-epf-nv-test.c’ driver. A user application can then access this memory by calling mmap on the character device created by the second driver.
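For reference, the user-space side of this looks roughly like the sketch below. It is only a minimal illustration: the function name map_shared_region is made up for this post, and the device path passed to it is whatever node the second driver registers (not shown here).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of the character device at `path` (hypothetical node
 * created by the second driver). Returns NULL if open or mmap fails. */
void *map_shared_region(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    /* The driver's mmap handler (the one calling dma_mmap_coherent)
     * services this request. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    return p == MAP_FAILED ? NULL : p;
}
```

The application would call this once with the 256 MB length and release the mapping with munmap when done.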
Everything works: a user application can read from and write to this mmapped memory. However, copying 256 MB from the mmapped memory to a local buffer (allocated with malloc, for example) performs very poorly (around 78 MB/s). Copying from a local buffer (allocated with malloc) to the mmapped memory is also slow (around 1.5 GB/s). For comparison, copying between two local buffers runs at around 6 GB/s.
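To make the figures above reproducible, here is a minimal sketch of how such a copy could be timed. It assumes a plain memcpy timed with clock_gettime; the helper name copy_throughput_mbps is made up for this post, and 1 MB is counted as 10^6 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy `len` bytes from `src` to `dst` `iters` times and return the
 * average throughput in MB/s (1 MB = 1e6 bytes). Returns 0.0 if the
 * elapsed time could not be measured. */
double copy_throughput_mbps(void *dst, const void *src, size_t len, int iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        /* Compiler barrier (GCC/Clang) so the copy is not optimised away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return 0.0;
    return ((double)len * (double)iters) / secs / 1e6;
}
```

Passing the mmapped pointer as src and a malloc'ed buffer as dst gives the read case, swapping the two gives the write case, and two malloc'ed buffers give the local baseline.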
How can I improve the performance of reading from and writing to the mmapped memory?