Cache coherency issue when writing to shared memory from user space?

Hi.

I am working on a PCIe driver for a custom endpoint device. I am using an AGX Xavier as the root port. The endpoint device is a Xilinx FPGA-based implementation.

I am trying to create a shared memory buffer between kernel space and user space. My objective is to write to this buffer from user space, and then use NVIDIA's PCIe DMA engine to send the data to the endpoint over the PCIe bus.

However, my shared memory seems to be cached when I write to it from user space. This is how I set up the shared memory:

  1. Get a 4MB memory block in kernel space using kzalloc().

  2. Obtain the DMA address for this 4MB block via dma_map_single() or dma_alloc_coherent().

  3. Use remap_pfn_range() to map this memory into user space, applying pgprot_noncached() beforehand to mark the region as non-cached. (I get the page frame number (PFN) passed to remap_pfn_range() by calling virt_to_phys() on the virtual address returned by kzalloc().)

  4. In user space, call mmap() to obtain a user-space virtual pointer to the shared memory, then write data into it with memcpy().
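For reference, step 4 might look roughly like the sketch below. It assumes the driver exposes a character device whose mmap handler maps the 4MB buffer at offset 0; the device path, `TX_BUF_SIZE`, `tx_map()` and `tx_fill()` are hypothetical names for illustration only. The fence after memcpy() orders the CPU stores before whatever later tells the driver to start the DMA (for example an ioctl); it does not by itself clean any cache.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical size matching the 4MB TX buffer described above. */
#define TX_BUF_SIZE (4u * 1024u * 1024u)

/*
 * Map the driver's TX buffer into this process.
 * ASSUMPTION: dev_path names a character device (e.g. a hypothetical
 * "/dev/fpga_tx") whose mmap handler maps the buffer at offset 0.
 * Returns NULL on failure.
 */
static void *tx_map(const char *dev_path, size_t len)
{
    int fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Copy the payload into the mapped buffer, then issue a full fence so the
 * stores are ordered before any later "start DMA" request to the driver.
 */
static void tx_fill(void *shm, const void *src, size_t len)
{
    assert(shm != NULL && src != NULL);
    memcpy(shm, src, len);
    atomic_thread_fence(memory_order_seq_cst);
}
```

Typical use would be `void *tx = tx_map("/dev/fpga_tx", TX_BUF_SIZE);` followed by `tx_fill(tx, payload, payload_len);` before asking the driver to start the transfer.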

When I read this shared memory back from kernel space, the data is inconsistent: it does not appear to be completely written. The amount of valid data differs from test to test. Sometimes all of it is written correctly, and sometimes my 4MB buffer is only partially filled.
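One way to quantify the partial fill is to write a known pattern from user space and locate the first wrong word on the reading side. The helpers below are a hypothetical sketch in plain C, so the same logic could be adapted to the kernel read path. The per-run seed keeps stale data from an earlier run from matching by accident, and the volatile 32-bit accesses stop the compiler from merging or dropping stores and keep every access naturally aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern: word i holds i XOR a per-run seed. */
static uint32_t pattern_word(size_t i, uint32_t seed)
{
    return (uint32_t)i ^ seed;
}

/* Writer side: fill nwords 32-bit words with the pattern. */
static void pattern_fill(volatile uint32_t *buf, size_t nwords, uint32_t seed)
{
    assert(buf != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++)
        buf[i] = pattern_word(i, seed);
}

/*
 * Reader side: return the index of the first word that does not match the
 * pattern, or nwords if the whole buffer is correct.
 */
static size_t pattern_first_mismatch(const volatile uint32_t *buf,
                                     size_t nwords, uint32_t seed)
{
    assert(buf != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++)
        if (buf[i] != pattern_word(i, seed))
            return i;
    return nwords;
}
```

Logging the returned index (multiplied by 4 for a byte offset) across several runs would show whether the valid region always ends on a particular boundary, such as a cache-line or page multiple.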

I have browsed multiple forums, and as far as I can tell, the method I am using to create non-cached shared memory is correct. I am willing to share the relevant portion of the code privately if that would help.

I have tried the above with the SMMU both disabled and enabled, and the issue persists in both cases. Is this a cache-related issue, or is there something else I am missing? What can I do to fix it?

PS: I have also tried using dma_mmap_attrs() instead of remap_pfn_range() to avoid needing a physical address, but the same issue occurs there as well.
I should also point out that I have created a second 4MB shared buffer for receiving data on the AGX Xavier. For that buffer, the endpoint device writes data directly into the shared memory via DMA. When I read this RX shared memory from user space, I always see up-to-date data. The RX buffer works fine; only the TX buffer is causing problems. The only difference between the two is that I write to the TX buffer with memcpy(), whereas the RX buffer is filled by the PCIe endpoint device via DMA.

How can I fix the above issue?

Any help would be greatly appreciated.

Regards,
Sana Ur Rehman

Hi,

This seems to be an issue with your own driver rather than an NVIDIA issue. I will let other users who have experience with this reply first.

I should also point out that if I flush the cache explicitly from user space using inline assembly (__asm__) and then read the TX buffer in kernel space, the buffer contains up-to-date data IF AND ONLY IF the SMMU is disabled. With the SMMU enabled, even an explicit cache flush from user space doesn't help.
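For readers unfamiliar with this kind of flush, a user-space clean-and-invalidate on AArch64 generally looks like the sketch below. It is an assumption about what such code involves, not the exact code used here. It reads the smallest data-cache line size from CTR_EL0 (the DminLine field, bits [19:16], is log2 of the line size in 4-byte words), issues `dc civac` for every line in the range, and finishes with `dsb sy`. Linux normally permits these instructions at EL0. `dcache_flush_range()` is a hypothetical name; on non-AArch64 builds it falls back to a plain fence so the sketch still compiles.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Clean and invalidate the data-cache lines covering [addr, addr + len)
 * to the point of coherency (AArch64 only).
 */
static void dcache_flush_range(const void *addr, size_t len)
{
    assert(addr != NULL || len == 0);
#if defined(__aarch64__)
    uint64_t ctr;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    /* DminLine: log2 of the smallest D-cache line size in 4-byte words. */
    const uintptr_t line = (uintptr_t)4 << ((ctr >> 16) & 0xF);
    uintptr_t p = (uintptr_t)addr & ~(line - 1);
    const uintptr_t end = (uintptr_t)addr + len;
    for (; p < end; p += line)
        __asm__ volatile("dc civac, %0" : : "r"(p) : "memory");
    /* Wait until all cache maintenance above has completed. */
    __asm__ volatile("dsb sy" : : : "memory");
#else
    /* Not AArch64: no cache maintenance, only a full memory fence. */
    (void)addr;
    (void)len;
    atomic_thread_fence(memory_order_seq_cst);
#endif
}
```

Cache maintenance by virtual address only affects cacheable mappings, so whether a call like this changes what the kernel reads depends on the memory attributes of both the user mapping and the kernel's own mapping of the same pages.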

Hopefully this gives some clue as to what is wrong. I would like to get this working with the SMMU enabled, and without having to flush the cache explicitly if possible.

Does anyone have any suggestions?

@WayneWWW, I need support on this issue. Could you please help? What am I doing wrong here?