From the cudaBayerDemosaic sample, it seems the sample code creates GPU device memory buffers, uses them to hold the output of the CUDA kernel, and then sends them to the EGL display. On the consumer side of the EGLStream, it seems the frames coming out of the stream can be mapped and used directly by a CUDA kernel. Am I correct?
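To make sure I'm reading the sample correctly, this is roughly the consumer-side pattern I think it uses (a minimal sketch based on the CUDA EGL interop driver API; `eglStream` is a placeholder from the producer setup, and CUDA context creation and error handling are omitted):

```cpp
// Sketch of a CUDA consumer for an EGLStream (driver API, cudaEGL.h).
#include <cuda.h>
#include <cudaEGL.h>

void consumeOneFrame(EGLStreamKHR eglStream)
{
    CUeglStreamConnection conn;
    cuEGLStreamConsumerConnect(&conn, eglStream);

    CUgraphicsResource resource = nullptr;
    CUstream stream = nullptr;
    // Wait up to 16000 us for the producer to present a frame.
    cuEGLStreamConsumerAcquireFrame(&conn, &resource, &stream, 16000);

    CUeglFrame frame;
    cuGraphicsResourceGetMappedEglFrame(&frame, resource, 0, 0);

    // For a pitch-linear frame, frame.frame.pPitch[0] is a device pointer
    // that a CUDA kernel can read/write directly, with no extra copy.
    if (frame.frameType == CU_EGL_FRAME_TYPE_PITCH) {
        // myKernel<<<grid, block>>>((unsigned char *)frame.frame.pPitch[0], ...);
    }

    cuEGLStreamConsumerReleaseFrame(&conn, resource, &stream);
    cuEGLStreamConsumerDisconnect(&conn);
}
```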
But for the NativeBuffer, the default memory type seems to be NVBUF_MEM_SURFACE_ARRAY, which is neither CUDA device memory, CUDA pinned memory, nor CUDA unified memory. Can both the CPU and the GPU access this type of memory directly (maybe through some kind of mapping)?
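To clarify what kind of access I mean: for the CPU side I've seen the NvBufSurfaceMap / NvBufSurfaceSyncForCpu path, and for the GPU side mapping the surface to an EGLImage and registering it with CUDA. A sketch of the CPU path, assuming `surf` is an already-allocated NVBUF_MEM_SURFACE_ARRAY surface and with error checks omitted (untested, just to illustrate the question):

```cpp
// Sketch: CPU access to an NVBUF_MEM_SURFACE_ARRAY buffer via nvbufsurface.h.
#include <nvbufsurface.h>
#include <cstring>

void cpuTouchFirstPlane(NvBufSurface *surf)
{
    // Map plane 0 of buffer 0 into the CPU address space.
    NvBufSurfaceMap(surf, 0, 0, NVBUF_MAP_READ_WRITE);
    // Make sure the CPU sees the latest hardware/GPU writes.
    NvBufSurfaceSyncForCpu(surf, 0, 0);

    NvBufSurfaceParams *p = &surf->surfaceList[0];
    unsigned char *plane0 = (unsigned char *)p->mappedAddr.addr[0];
    // Example CPU write: clear the first row (one pitch worth of bytes).
    memset(plane0, 0, p->planeParams.pitch[0]);

    // Flush CPU writes before the hardware/GPU touches the buffer again.
    NvBufSurfaceSyncForDevice(surf, 0, 0);
    NvBufSurfaceUnMap(surf, 0, 0);
}
```

Is that the intended way, and is the EGLImage + cuGraphicsEGLRegisterImage route the expected way for CUDA kernels to reach the same buffer?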
Also, is it possible to allocate unified memory myself, convert it into an "NvBufSurface", and use it in a BufferStream?
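To be concrete, the closest thing I know of is letting the allocator hand back unified memory directly, something like the sketch below (assuming NvBufSurfaceCreate accepts NVBUF_MEM_CUDA_UNIFIED here; whether that memType is valid on my platform, and whether a pointer I allocated myself with cudaMallocManaged could be wrapped instead, is exactly what I'm unsure about):

```cpp
// Sketch: asking NvBufSurface to allocate CUDA unified memory itself.
// Error checks omitted; format/size values are placeholders.
#include <nvbufsurface.h>

NvBufSurface *allocUnifiedSurface(uint32_t width, uint32_t height)
{
    NvBufSurfaceCreateParams params = {};
    params.gpuId       = 0;
    params.width       = width;
    params.height      = height;
    params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
    params.layout      = NVBUF_LAYOUT_PITCH;
    params.memType     = NVBUF_MEM_CUDA_UNIFIED;  // instead of NVBUF_MEM_SURFACE_ARRAY

    NvBufSurface *surf = nullptr;
    NvBufSurfaceCreate(&surf, 1 /* batch size */, &params);
    return surf;
}
```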
The application allocates V4L2 user-space buffers (V4L2_MEMORY_USERPTR). In this case, the driver fills the user-space memory directly. If you allocate CUDA-mappable memory (with cudaHostAlloc), the CUDA device can access the V4L2-captured buffer without a memory copy.
So you should be able to preallocate a buffer (unified memory if you want) and feed it into the V4L2 pipeline.
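A rough sketch of that flow, using cudaHostAlloc as described above (cudaMallocManaged would be the unified-memory variant); format negotiation, VIDIOC_STREAMON, and error handling are omitted, and /dev/video0 and the buffer size are placeholders:

```cpp
// Sketch: letting V4L2 capture directly into CUDA-mappable host memory
// (V4L2_MEMORY_USERPTR + cudaHostAlloc), so no copy is needed before CUDA use.
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <cuda_runtime.h>

int main()
{
    int fd = open("/dev/video0", O_RDWR);

    // Ask the capture driver for user-pointer buffers.
    v4l2_requestbuffers req = {};
    req.count  = 4;
    req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_USERPTR;
    ioctl(fd, VIDIOC_REQBUFS, &req);

    const size_t bufSize = 1920 * 1080 * 2;  // placeholder: real size from VIDIOC_G_FMT

    for (unsigned i = 0; i < req.count; ++i) {
        // Pinned, mapped host memory: the driver writes through the CPU pointer,
        // and the GPU reads it through the corresponding device pointer.
        void *hostPtr = nullptr;
        cudaHostAlloc(&hostPtr, bufSize, cudaHostAllocMapped);

        void *devPtr = nullptr;
        cudaHostGetDevicePointer(&devPtr, hostPtr, 0);
        // devPtr can be handed straight to a CUDA kernel after VIDIOC_DQBUF.

        v4l2_buffer buf = {};
        buf.type      = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory    = V4L2_MEMORY_USERPTR;
        buf.index     = i;
        buf.m.userptr = (unsigned long)hostPtr;
        buf.length    = bufSize;
        ioctl(fd, VIDIOC_QBUF, &buf);
    }

    // ... VIDIOC_STREAMON, then VIDIOC_DQBUF returns a filled buffer whose
    // m.userptr matches one of the cudaHostAlloc'd pointers above.
    return 0;
}
```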