In our Jetson application, we capture images from a MIPI camera (through the Argus library), encode them using NVENC, and process them using CUDA.
The Argus samples show how to get a frame either as a CUDA image (through an EGLStream) or as a DMA buffer (NvBuffer), but, as far as I can tell, not both at the same time.
When we try to copy from the NvBuffer to CUDA, the copy goes through a CPU-side memcpy (because the CUDA buffer is allocated on the device with cuMemAlloc), which is very slow.
We have looked into using NvEGLImageFromFd to get the DMA buffer into CUDA easily, but because our Jetson runs headless, that function fails.
Is there a recommended way to copy between CUDA memory and a DMA buffer that does not involve a CPU memcpy?
Or is there a way to map a DMA buffer fd into CUDA as a resource and then use cuMemcpy?