I have a Xavier AGX connected to an x86 host PC through a PCIe connector on the Xavier AGX's external PCIe x16 slot. The Xavier AGX is configured as an endpoint device (NVIDIA RAM Memory).
In the endpoint function driver ('pci-epf-nv-test.c', located in the kernel/nvidia/drivers/pci/endpoint/functions directory of the L4T kernel sources), I added code to allocate 256 MB of DMA memory with dma_alloc_coherent().
Information about the memory allocated with dma_alloc_coherent() is exported and used by a second PCIe driver. In this second driver, dma_mmap_coherent() maps the memory that was allocated with dma_alloc_coherent() in the 'pci-epf-nv-test.c' driver. A user application can then access this memory by calling mmap() on the character device created by the second driver.
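For reference, the userspace side of this setup can be sketched as below. This is a minimal sketch, assuming the second driver's character device supports mmap() at offset 0 for the whole buffer; map_epf_buffer is a hypothetical helper name, and the device path is whatever node the second driver actually creates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes exposed by a character device
 * (or any mmap-capable file). Returns MAP_FAILED on error.
 */
static void *map_epf_buffer(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    /* MAP_SHARED so that stores reach the driver-provided pages. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return p;
}
```

Usage would look like map_epf_buffer("/dev/epf_mem", 256u << 20), where "/dev/epf_mem" is a placeholder name, followed by munmap() when the application is done.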
Everything works: a user application can read from and write to this mmapped memory. However, copying 256 MB from the mmapped memory to a local buffer (allocated with malloc(), for example) performs very poorly (around 78 MB/s). Writing from a local malloc() buffer to the mmapped memory is also slow (around 1.5 GB/s). For comparison, a copy between two local buffers runs at around 6 GB/s.
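One way to reproduce these numbers consistently is to time memcpy() in both directions with the same helper. The sketch below assumes a Linux userspace with clock_gettime(CLOCK_MONOTONIC) and reports MB/s with 1 MB = 10^6 bytes; copy_mbps is a hypothetical name. A plausible reason for the read/write asymmetry (an assumption, not confirmed in this thread) is that, for a device not marked DMA-coherent, arm64 maps such buffers as non-cacheable, so stores can still be merged while every load bypasses the cache.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Copy `len` bytes from src to dst `iters` times and return MB/s.
 * Either pointer may be an mmapped device buffer or a malloc() buffer.
 */
static double copy_mbps(void *dst, const void *src, size_t len, int iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        /* Compiler barrier: keep repeated copies from being merged. */
        __asm__ __volatile__("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return 0.0; /* timer too coarse for a tiny copy */
    return (double)len * (double)iters / secs / 1e6;
}
```

Calling copy_mbps(local, mapped, len, 1) measures the slow read direction, and copy_mbps(mapped, local, len, 1) the write direction.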
How can I improve read and write performance to and from the mmapped memory?
Can you please try using dma_alloc_coherent() with dma_alloc_writecombined() and performing an explicit dsb() before letting userspace code access the data?
Apologies for the delay, we have just returned from a long weekend.
I'm not sure I understand: should I replace dma_alloc_coherent() with dma_alloc_writecombined()?
And how do I perform an explicit dsb()? In which file is dsb() defined? I searched on Google but didn't find a clear answer. Thanks
Using dma_alloc_writecombined() followed by a call to dsb(sy) will improve write performance only.
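In kernel code nothing needs to be defined: on arm64, dsb() is a macro in arch/arm64/include/asm/barrier.h (include it with #include <asm/barrier.h>), and the call is written dsb(sy). In mainline kernels the write-combined allocator is named dma_alloc_wc(), with dma_mmap_wc() for the mapping; presumably that is what dma_alloc_writecombined() refers to, but check the L4T kernel headers. The sketch below only mirrors the barrier in userspace, assuming the application writes into a write-combined mapping and wants its buffered stores drained before signaling the other side; publish_buffer is a hypothetical helper, and the non-arm64 branch is just a stand-in so it builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace mirror of the arm64 kernel macro from
 * arch/arm64/include/asm/barrier.h. Kernel code should just call
 * dsb(sy) rather than redefine it.
 */
#if defined(__aarch64__)
#define dsb(opt) __asm__ __volatile__("dsb " #opt : : : "memory")
#else
#define dsb(opt) __atomic_thread_fence(__ATOMIC_SEQ_CST) /* stand-in */
#endif

/*
 * Hypothetical helper: fill a (possibly write-combined) buffer, then
 * wait for all prior memory accesses to complete with dsb(sy).
 */
static void publish_buffer(void *wc_dst, const void *src, size_t len)
{
    memcpy(wc_dst, src, len);
    dsb(sy);
}
```

DSB is permitted at EL0 on arm64, so the inline assembly runs unprivileged; the kernel-side equivalent is a single dsb(sy) after filling the buffer.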
Another option:
Add the 'dma-coherent' property to the PCI device's DT node. Because Xavier is an I/O-coherent SoC, no cache maintenance operations are needed when this property is set in the DT node.
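After reflashing the DTB, it is worth confirming that the property actually reached the running device tree. On Linux the live tree is exposed under /proc/device-tree (also /sys/firmware/devicetree/base), and a boolean property such as dma-coherent appears as an empty file in its node's directory. This is a minimal sketch; dt_has_property is a hypothetical helper, and the node path passed to it must be the real PCIe node, which is not named in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return true if boolean DT property `prop`
 * exists under the node directory `node_dir`.
 */
static bool dt_has_property(const char *node_dir, const char *prop)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", node_dir, prop);

    if (n < 0 || (size_t)n >= sizeof path)
        return false; /* path too long */
    return access(path, F_OK) == 0;
}
```

A call such as dt_has_property("/proc/device-tree/<pcie node>", "dma-coherent") returning false would mean the edited DTB is not the one the kernel booted with.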
Did you enable the IOMMU for the PCI device?