I have a device driver on the TX2 that allocates DMA-able coherent memory using dma_alloc_coherent. The application typically allocates two buffers of ~256 MB each, and two FPGA DMA engines simultaneously push data into them in 128 MB chunks. The DMA buffer is memory-mapped by the host application, so there is user-space access to the coherent memory. Once the data is delivered, it would be ideal if the GPU could begin processing it. Currently, the application copies the data into a CUDA managed buffer and then launches CUDA kernels.
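Roughly, the current flow looks like this sketch (the device node, sizes, and kernel are illustrative, and error checking is omitted):

```c
/* current_flow.cu -- sketch of the existing copy-based pipeline */
#include <cuda_runtime.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_BYTES (256UL << 20)                /* ~256 MB DMA buffer */

__global__ void process(const unsigned char *data, size_t n)
{
    /* ... per-element processing ... */
}

int main(void)
{
    /* User-space view of the driver's dma_alloc_coherent buffer. */
    int fd = open("/dev/fpga_dma0", O_RDWR);   /* hypothetical device node */
    unsigned char *dma_buf = (unsigned char *)mmap(NULL, BUF_BYTES,
            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    unsigned char *managed;
    cudaMallocManaged(&managed, BUF_BYTES);

    /* ... wait until the FPGA DMA engines have delivered the data ... */

    /* The copy this thread is trying to eliminate; managed memory is
     * CPU-accessible on the TX2, so a plain memcpy suffices. */
    memcpy(managed, dma_buf, BUF_BYTES);

    process<<<2048, 256>>>(managed, BUF_BYTES);
    cudaDeviceSynchronize();

    munmap(dma_buf, BUF_BYTES);
    close(fd);
    return 0;
}
```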
My questions are:
Is it possible to avoid the copy, given the unified memory on the TX2 (Tegra)? If so, how?
Is GPU pinned memory coherent, such that a DMA engine could deliver data into a pinned-memory buffer? If so, the application could perhaps skip the dma_alloc_coherent step, allocate pinned memory through the CUDA API instead, and point the DMA engine at the CUDA-allocated pinned memory.
Is it possible to label the Linux kernel driver's buffer as “GPU-accessible”, so that memory allocated by dma_alloc_coherent could be accessed by a CUDA kernel?
1. There have been some earlier forum topics about GPU access to a dma_alloc_coherent buffer. The conclusion was that if the buffer is cacheable, it should work with EGL mapping. However, we haven't received any feedback from those users about the result.
2. Pinned memory needs to be page-locked host memory. We don't have much experience with the dma_alloc_coherent buffer in this scenario, so it's recommended to give it a try directly; see the pinned-memory sketch after this list.
3. No. The access needs to go through EGL mapping. You will need to make sure the EGL mapping is working first; see the EGL sketch after this list.
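Regarding point 2, the application-side allocation might look like the sketch below. The FPGA DMA engines would still need a bus address for the pages, which requires driver support (pinning and address translation) that is not shown; the ioctl shown in the comment is hypothetical.

```c
#include <cuda_runtime.h>

#define BUF_BYTES (256UL << 20)

/* Allocate page-locked (pinned) host memory that is also mapped into
 * the device address space. */
unsigned char *alloc_pinned(void)
{
    unsigned char *pinned = NULL;
    cudaHostAlloc((void **)&pinned, BUF_BYTES, cudaHostAllocMapped);

    /* A hypothetical driver ioctl would then pin these pages and
     * program the FPGA DMA engines with their bus addresses, e.g.
     *     ioctl(fd, FPGA_SET_DMA_TARGET, pinned);
     * (not a real interface; driver support is required). */
    return pinned;
}
```

Regarding point 3, here is a rough, unofficial sketch of one EGL mapping path: the kernel driver exports the buffer as a dma-buf fd, EGL imports it as a single-plane image via the EGL_EXT_image_dma_buf_import extension, and CUDA registers the resulting EGLImage. The fourcc choice, dimensions, helper name, and the dma-buf export itself are assumptions, and all error handling is omitted.

```c
#include <cuda_egl_interop.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm/drm_fourcc.h>

/* dmabuf_fd: assumed to be exported by the kernel driver for the
 * dma_alloc_coherent buffer (export code not shown). */
void *map_dmabuf_into_cuda(int dmabuf_fd, EGLint width, EGLint height)
{
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(dpy, NULL, NULL);

    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");

    /* Describe the raw buffer as a single-plane 8-bit image. */
    EGLint attrs[] = {
        EGL_WIDTH,                     width,
        EGL_HEIGHT,                    height,
        EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_R8,
        EGL_DMA_BUF_PLANE0_FD_EXT,     dmabuf_fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  width,
        EGL_NONE
    };
    EGLImageKHR image = create_image(dpy, EGL_NO_CONTEXT,
                                     EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

    /* Register the EGLImage with CUDA and fetch a device-usable frame. */
    cudaGraphicsResource_t res;
    cudaGraphicsEGLRegisterImage(&res, image, cudaGraphicsRegisterFlagsNone);

    cudaEglFrame frame;
    cudaGraphicsResourceGetMappedEglFrame(&frame, res, 0, 0);

    /* For a pitch-linear frame this is a device pointer that a CUDA
     * kernel can read directly. */
    return frame.frame.pPitch[0].ptr;
}
```

If the registration succeeds, the returned pointer can be passed straight to a CUDA kernel, which is the zero-copy path the original question is after.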
Is there a complete EGL mapping example available? I have not used the EGL API.
cudaHostRegister looked promising, but it is not supported on devices with compute capability less than 7.2, and the Jetson TX2 is compute capability 6.2.
On ARM architectures, dma_alloc_coherent reserves “device”-type memory, which is typically defined as bufferable but non-cacheable, so the “cacheable” requirement from point 1 may not hold here.
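A minimal driver-side sketch of such an allocation and its user-space mapping (names are illustrative and error handling is omitted):

```c
#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/mm.h>

#define BUF_BYTES (256UL << 20)

static struct device *fpga_dev;  /* set during probe in a real driver */
static void *cpu_addr;           /* kernel virtual address of the buffer */
static dma_addr_t dma_handle;    /* bus address given to the FPGA DMA engines */

static int fpga_alloc_buf(void)
{
    /* On ARM64 without hardware I/O coherency, the returned mapping
     * is non-cacheable, per the note above. */
    cpu_addr = dma_alloc_coherent(fpga_dev, BUF_BYTES, &dma_handle, GFP_KERNEL);
    return cpu_addr ? 0 : -ENOMEM;
}

/* mmap() handler: gives the application its user-space view. */
static int fpga_mmap(struct file *filp, struct vm_area_struct *vma)
{
    return dma_mmap_coherent(fpga_dev, vma, cpu_addr, dma_handle,
                             vma->vm_end - vma->vm_start);
}
```

dma_mmap_coherent() installs the same non-cacheable attributes in the user mapping, which keeps it coherent with device DMA but is presumably what conflicts with the cacheable requirement mentioned above.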