My system:
- Jetson Xavier NX
- Jetson Linux 35.2.1
For my application I have some frames in CUDA memory that I want to encode. Using the MMAPI video encoding sample, I was able to encode my video correctly.
However, according to the sample, before enqueuing the frames with v4l2_buf I have to transfer them to an NvBuffer* array, which is on the CPU, so technically I need a slow copy from GPU to CPU.
I also checked whether I could use NvBufSurface directly (without NvBuffer), but that also seems to be a CPU-accessible structure (NvBufSurfaceMappedAddr is described as “holds planewise pointers to a CPU mapped buffer”).
Now my question is: how should I transfer my data to v4l2 without touching CPU memory, relying only on device-to-device (DeviceToDevice) copies? My data is a YUV image kept in 3 separate uchar* arrays.
In summary, this is the MMAPI sample workflow:
file_read (cpu) → NvBuffer (cpu) → NvBufSurfaceSyncForDevice (cpu → gpu) → v4l2_buf (gpu)
But my initial data is already in GPU memory, so how should I change the pipeline above?
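To make the question concrete, here is a minimal sketch of the kind of copy I would like to end up with. It assumes planar YUV 4:2:0 purely for illustration, and `YuvPlanes`, `copyPlanesD2D`, and `demoCopy` are hypothetical names. The destination is an ordinary `cudaMallocPitch` buffer standing in for the encoder's buffer; getting device pointers for the real v4l2 buffer is exactly the part I am missing.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Three tightly packed planes in device memory (planar YUV 4:2:0 is an assumption).
struct YuvPlanes {
    unsigned char* data[3];  // Y, U, V device pointers
    size_t widthBytes[3];    // bytes per row of each plane
    size_t height[3];        // rows in each plane
};

// Device-to-device copy of each plane into a pitched destination.
// dst/dstPitch are placeholders for whatever device pointers the
// encoder buffer would expose; they are not an MMAPI structure.
cudaError_t copyPlanesD2D(const YuvPlanes& src, unsigned char* const dst[3],
                          const size_t dstPitch[3], cudaStream_t stream)
{
    for (int i = 0; i < 3; ++i) {
        cudaError_t err = cudaMemcpy2DAsync(
            dst[i], dstPitch[i],               // destination and its row pitch
            src.data[i], src.widthBytes[i],    // source is tightly packed
            src.widthBytes[i], src.height[i],  // bytes per row, number of rows
            cudaMemcpyDeviceToDevice, stream);
        if (err != cudaSuccess) return err;
    }
    return cudaStreamSynchronize(stream);
}

// Stand-in demo: fills source planes on the GPU, copies them device to
// device, and returns 0 if the pitched copy matches. The read-back to the
// host exists only to check the result.
int demoCopy(int width, int height)
{
    YuvPlanes src{};
    unsigned char* dst[3] = {nullptr, nullptr, nullptr};
    size_t dstPitch[3] = {0, 0, 0};
    int status = 0;

    for (int i = 0; i < 3 && status == 0; ++i) {
        src.widthBytes[i] = (i == 0) ? width : width / 2;
        src.height[i] = (i == 0) ? height : height / 2;
        const size_t bytes = src.widthBytes[i] * src.height[i];
        if (cudaMalloc(reinterpret_cast<void**>(&src.data[i]), bytes) != cudaSuccess ||
            cudaMemset(src.data[i], 16 + i, bytes) != cudaSuccess ||
            cudaMallocPitch(reinterpret_cast<void**>(&dst[i]), &dstPitch[i],
                            src.widthBytes[i], src.height[i]) != cudaSuccess)
            status = 1;
    }

    if (status == 0 && copyPlanesD2D(src, dst, dstPitch, 0) != cudaSuccess)
        status = 2;

    for (int i = 0; i < 3 && status == 0; ++i) {
        std::vector<unsigned char> host(src.widthBytes[i] * src.height[i]);
        if (cudaMemcpy2D(host.data(), src.widthBytes[i], dst[i], dstPitch[i],
                         src.widthBytes[i], src.height[i],
                         cudaMemcpyDeviceToHost) != cudaSuccess) {
            status = 3;
            break;
        }
        for (unsigned char v : host)
            if (v != 16 + i) status = 4;
    }

    for (int i = 0; i < 3; ++i) {
        cudaFree(src.data[i]);  // cudaFree(nullptr) is a no-op
        cudaFree(dst[i]);
    }
    return status;
}
```

Calling `demoCopy(64, 48)` from a `main` built with nvcc should return 0 on a machine with a CUDA device. What I do not know is how to obtain the equivalent of `dst` and `dstPitch` for the buffer that gets queued to the encoder.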
Thank you.