DMABuf CUDA transfer with EGL Streams


In our Jetson application, we capture images from a MIPI camera (through the Argus library), encode them with NVENC, and process them with CUDA.

Argus provides samples where we can get a CUDA image from EGLStreams or from a DMA buffer (NvBuf), but I think not both.
When we try to copy from the NvBuf to CUDA, the copy ends up as a system memcpy (since the CUDA allocation is made with cuMemAlloc, on the device), and it is very slow.

We have looked into using NvEglImageFromFd to get the DMA buffer into CUDA easily, but since our Jetson runs in headless mode, that function fails.

Is there a recommended way to copy between CUDA and a DMA buffer that does not involve a CPU memcpy?
Or is there a way to map a DMA fd into CUDA as a resource and then use cuMemCpy?

Thank you,

Please refer to
Nvarguscamerasrc jetpack-4.3, No cameras available, nvbuf_utils: Could not get EGL display connection, requirements - #41 by WayneWWW

In headless mode you still need to keep certain drivers loaded, such as nvgpu. We suggest checking why NvEglImageFromFd() fails in your environment.
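
To narrow down where the EGL path fails, a step-by-step check along these lines may help. This is only a sketch, assuming JetPack 4.x with nvbuf_utils.h (where the function is spelled NvEGLImageFromFd) and the CUDA driver API EGL interop header. The function name map_dmabuf_to_cuda, the 1920x1080 size, and the NvBufferCreate call that stands in for a frame from Argus are illustrative assumptions, not something taken from this thread.

```cuda
// Sketch only: needs a Jetson with CUDA and the JetPack 4.x Multimedia API.
#include <cstdio>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <cuda.h>
#include <cudaEGL.h>
#include "nvbuf_utils.h"

// Hypothetical helper: maps an NvBuffer into CUDA through an EGLImage and
// reports which step fails. Expects a current CUDA context.
int map_dmabuf_to_cuda(int dmabuf_fd) {
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    if (dpy == EGL_NO_DISPLAY) {
        fprintf(stderr, "eglGetDisplay failed\n");
        return -1;
    }
    if (!eglInitialize(dpy, nullptr, nullptr)) {
        fprintf(stderr, "eglInitialize failed: 0x%x\n", eglGetError());
        return -1;
    }

    EGLImageKHR image = NvEGLImageFromFd(dpy, dmabuf_fd);
    if (image == EGL_NO_IMAGE_KHR) {
        fprintf(stderr, "NvEGLImageFromFd failed\n");
        return -1;
    }

    int status = -1;
    CUgraphicsResource resource = nullptr;
    CUresult rc = cuGraphicsEGLRegisterImage(
        &resource, image, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuGraphicsEGLRegisterImage failed: %d\n", (int)rc);
    } else {
        CUeglFrame frame;
        rc = cuGraphicsResourceGetMappedEglFrame(&frame, resource, 0, 0);
        if (rc == CUDA_SUCCESS) {
            // For pitch-linear frames, frame.frame.pPitch[i] are device
            // pointers that kernels or cuMemcpy calls can use directly.
            printf("mapped %ux%u, %u plane(s), %s layout\n",
                   frame.width, frame.height, frame.planeCount,
                   frame.frameType == CU_EGL_FRAME_TYPE_PITCH ? "pitch" : "array");
            status = 0;
        } else {
            fprintf(stderr, "GetMappedEglFrame failed: %d\n", (int)rc);
        }
        cuGraphicsUnregisterResource(resource);
    }
    NvDestroyEGLImage(dpy, image);
    return status;
}

int main() {
    CUdevice dev;
    CUcontext ctx;
    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS ||
        cuCtxCreate(&ctx, 0, dev) != CUDA_SUCCESS) {
        fprintf(stderr, "CUDA init failed\n");
        return 1;
    }
    // Stand-in for a buffer delivered by Argus (size and format are assumptions).
    int fd = -1;
    if (NvBufferCreate(&fd, 1920, 1080, NvBufferLayout_Pitch,
                       NvBufferColorFormat_YUV420) != 0) {
        fprintf(stderr, "NvBufferCreate failed\n");
        cuCtxDestroy(ctx);
        return 1;
    }
    int rc = map_dmabuf_to_cuda(fd);
    NvBufferDestroy(fd);
    cuCtxDestroy(ctx);
    return rc == 0 ? 0 : 1;
}
```

If the first message printed comes from eglGetDisplay or eglInitialize, the problem is likely in the EGL display setup rather than in the buffer itself.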

I am able to capture with Argus and copy into CUDA using NvBufMapMemory, and it all works. I would assume this means nvgpu must be working. No Xorg is started in the Jetson setup we have.
I have found a way (not ideal, but it seems to work) that leverages host pinned memory, the unified memory architecture in Tegra, and GPU coherence.

For anyone who finds this thread later: I had no success with NvEglImageFromFd, but on Tegra I made the CUDA allocation host pinned for this operation, and that allowed NvBuffer2Raw to use the DMA engine to fill the CUDA memory.
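
A minimal sketch of that pinned-memory approach, in case it is useful. It assumes the CUDA runtime API (the driver API has cuMemHostAlloc for the same purpose) and JetPack 4.x nvbuf_utils.h. The kernel invert_luma, the helper process_frame, the 1920x1080 YUV420 buffer, and the NvBufferCreate call standing in for an Argus frame are all hypothetical, and only plane 0 is copied.

```cuda
// Sketch only: needs a Jetson with CUDA and the JetPack 4.x Multimedia API.
#include <cstdio>
#include <cuda_runtime.h>
#include "nvbuf_utils.h"

// Hypothetical processing step: inverts every byte of the luma plane.
__global__ void invert_luma(unsigned char* y, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) y[i] = 255 - y[i];
}

// Copies plane 0 of the NvBuffer into pinned, mapped host memory with
// NvBuffer2Raw, then runs a kernel on the same memory via its device pointer.
int process_frame(int dmabuf_fd, unsigned int width, unsigned int height) {
    const size_t bytes = (size_t)width * height;  // assumes an 8-bit plane

    // Pinned host allocation that is also mapped into the GPU address space.
    // In a real pipeline this would be allocated once and reused per frame.
    unsigned char* host_ptr = nullptr;
    if (cudaHostAlloc((void**)&host_ptr, bytes, cudaHostAllocMapped) != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed\n");
        return -1;
    }

    int status = -1;
    unsigned char* dev_ptr = nullptr;
    if (cudaHostGetDevicePointer((void**)&dev_ptr, host_ptr, 0) != cudaSuccess) {
        fprintf(stderr, "cudaHostGetDevicePointer failed\n");
    } else if (NvBuffer2Raw(dmabuf_fd, 0, width, height, host_ptr) != 0) {
        fprintf(stderr, "NvBuffer2Raw failed\n");
    } else {
        const unsigned int threads = 256;
        const unsigned int blocks = (unsigned int)((bytes + threads - 1) / threads);
        invert_luma<<<blocks, threads>>>(dev_ptr, bytes);
        if (cudaDeviceSynchronize() == cudaSuccess) status = 0;
    }
    cudaFreeHost(host_ptr);
    return status;
}

int main() {
    // Stand-in for a buffer delivered by Argus (size and format are assumptions).
    int fd = -1;
    if (NvBufferCreate(&fd, 1920, 1080, NvBufferLayout_Pitch,
                       NvBufferColorFormat_YUV420) != 0) {
        fprintf(stderr, "NvBufferCreate failed\n");
        return 1;
    }
    int rc = process_frame(fd, 1920, 1080);
    NvBufferDestroy(fd);
    return rc == 0 ? 0 : 1;
}
```

The kernel reads the frame through dev_ptr without a separate cudaMemcpy, since on Tegra the CPU and GPU share the same physical memory. Whether NvBuffer2Raw actually uses the DMA engine here is as reported above, not something this sketch verifies.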