Hi,
Currently I am working on a workflow where camera frames are streamed out, processed with CUDA, and then displayed on screen. The buffers are allocated as DMA buffers using NvBufSurface. Following the sample code of cuda_postprocess from 12_v4l2_camera_cuda and Handle_EGLImage from NvCudaProc, I can get the device pointer from the EGLImage wrapped around the dmabuf_fd. However, I observed that for every frame I receive, I need to (see the sketch after the list):
- Get the NvBufSurface from the fd
- Map the NvBufSurface to an EGLImage
- Register the EGLImage with CUDA as a graphics resource
- Get the mapped EGL frame (device pointer) from the resource
- Run the CUDA processing
- Unregister the resource
- Unmap the EGLImage
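
For reference, this is roughly the per-frame sequence I mean, modelled on Handle_EGLImage from NvCudaProc. The kernel launch is a placeholder for my own processing and error handling is trimmed:

```cpp
#include "nvbufsurface.h"
#include "cudaEGL.h"

// Roughly the per-frame path I am using now (error handling trimmed,
// the kernel launch in step 5 stands in for my own CUDA processing).
void process_dmabuf_per_frame(int dmabuf_fd)
{
    // 1. Get the NvBufSurface back from the dmabuf fd
    NvBufSurface *surf = nullptr;
    NvBufSurfaceFromFd(dmabuf_fd, reinterpret_cast<void **>(&surf));

    // 2. Map it to an EGLImage
    NvBufSurfaceMapEglImage(surf, 0);
    EGLImageKHR egl_image = surf->surfaceList[0].mappedAddr.eglImage;

    // 3. Register the EGLImage as a CUDA graphics resource
    CUgraphicsResource resource = nullptr;
    cuGraphicsEGLRegisterImage(&resource, egl_image,
                               CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);

    // 4. Get the mapped EGL frame out of the resource
    CUeglFrame egl_frame;
    cuGraphicsResourceGetMappedEglFrame(&egl_frame, resource, 0, 0);
    cuCtxSynchronize();

    // 5. CUDA processing on the device pointer
    //    (egl_frame.frame.pPitch[0] in the pitch-linear case)
    // launch_my_kernel(egl_frame);   // placeholder
    cuCtxSynchronize();

    // 6./7. Unregister and unmap again
    cuGraphicsUnregisterResource(resource);
    NvBufSurfaceUnMapEglImage(surf, 0);
}
```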
I have to repeat these steps for every frame, and I was wondering whether their results can be reused. For example, I could allocate the DMA buffers, map them to EGLImages, and register the corresponding CUDA resources once up front, store them as member variables, and during per-frame processing simply reuse the pre-mapped/registered handles; in the destructor I would then unregister, unmap, and deallocate those resources (a rough sketch of what I mean is below). I expect this would save a lot of latency. In my tests there is no issue at all: no memory leaks and no frame artefacts. However, I could not find any official reference for this kind of usage, so I was wondering whether there is any potential issue with these operations. Should I avoid it? Is there another way to avoid the repeated per-frame steps?
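
To make the question concrete, here is a sketch of the reuse pattern I am proposing. It assumes the dmabuf fds are allocated by me and stay valid for the lifetime of the object; the class name, the fd-keyed map, and launch_my_kernel are just placeholders of mine:

```cpp
#include <unordered_map>
#include "nvbufsurface.h"
#include "cudaEGL.h"

// Sketch of the proposed reuse: map/register each dmabuf fd once,
// keep the handles as members, and only launch the kernel per frame.
class CudaDmabufCache
{
public:
    ~CudaDmabufCache()
    {
        // Clean up everything once, instead of once per frame
        for (auto &e : cache_) {
            cuGraphicsUnregisterResource(e.second.resource);
            NvBufSurfaceUnMapEglImage(e.second.surf, 0);
        }
    }

    // Called once per fd, right after buffer allocation
    void register_fd(int dmabuf_fd)
    {
        Entry entry{};
        NvBufSurfaceFromFd(dmabuf_fd, reinterpret_cast<void **>(&entry.surf));
        NvBufSurfaceMapEglImage(entry.surf, 0);
        cuGraphicsEGLRegisterImage(&entry.resource,
                                   entry.surf->surfaceList[0].mappedAddr.eglImage,
                                   CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
        cuGraphicsResourceGetMappedEglFrame(&entry.egl_frame, entry.resource, 0, 0);
        cache_[dmabuf_fd] = entry;
    }

    // Called per frame: only the CUDA processing remains
    void process(int dmabuf_fd)
    {
        CUeglFrame &frame = cache_.at(dmabuf_fd).egl_frame;
        // launch_my_kernel(frame);   // placeholder for my CUDA processing
        cuCtxSynchronize();
    }

private:
    struct Entry {
        NvBufSurface      *surf = nullptr;
        CUgraphicsResource resource = nullptr;
        CUeglFrame         egl_frame{};
    };
    std::unordered_map<int, Entry> cache_;
};
```

The only point of the sketch is that the FromFd/MapEglImage/RegisterImage/GetMappedEglFrame steps run once per buffer instead of once per frame; everything else stays the same as in the sample.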
Thanks in advance!
Best,
Wenhai.