I’m currently working on a PoC that bridges the Jetson Nano hardware decoder with zero-copy processing in CUDA. There are samples that illustrate how to achieve this (for example, /usr/src/jetson_multimedia_api/samples/02_video_dec_cuda). In short, each decoded frame is wrapped in an EGL image via its DMA fd (NvEGLImageFromFd), the EGL image is registered as a CUDA resource (cuGraphicsEGLRegisterImage), and the resource is then mapped to a CUDA frame (cuGraphicsResourceGetMappedEglFrame).
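For reference, when the mapping succeeds on a pitch-linear NV12 surface, the resulting CUeglFrame typically exposes the luma plane in frame.pPitch[0], the interleaved CbCr plane in frame.pPitch[1], and the row pitch in pitch. The host-side C++ sketch below only mimics that addressing on ordinary memory; Nv12View and sample_nv12 are hypothetical names, and it assumes both planes share the same byte pitch. Nothing in it depends on whether the samples are BT.601 or BT.709.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the pitch-linear fields of a mapped NV12 CUeglFrame.
struct Nv12View {
    const std::uint8_t* y_plane;   // plays the role of frame.pPitch[0] (Y)
    const std::uint8_t* uv_plane;  // plays the role of frame.pPitch[1] (interleaved CbCr)
    std::size_t pitch;             // bytes per row; assumed identical for both planes
    std::size_t width;
    std::size_t height;
};

struct YCbCr {
    std::uint8_t y, cb, cr;
};

// Return the luma sample at (x, y) together with the CbCr pair shared by its 2x2 block.
YCbCr sample_nv12(const Nv12View& v, std::size_t x, std::size_t y) {
    YCbCr s;
    s.y = v.y_plane[y * v.pitch + x];
    // One chroma row covers two luma rows.
    const std::uint8_t* uv_row = v.uv_plane + (y / 2) * v.pitch;
    s.cb = uv_row[(x / 2) * 2];      // Cb (U) is stored first in NV12
    s.cr = uv_row[(x / 2) * 2 + 1];  // followed by Cr (V)
    return s;
}
```

The chroma index is (x / 2) * 2 because each CbCr byte pair is shared by a 2x2 block of luma samples.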
The samples work fine, but a problem arises when targeting NV12 with the BT.709 colorspace. First, NvVideoConverter doesn’t work with BT.709 video feeds. However, NvVideoConverter is deprecated in favor of NvBufferTransform (the transform API). NvBufferTransform itself works fine, but the CUDA API does not accept its output when the output colorspace is set to BT.709.
Minimal code to illustrate the problem:
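```cpp
// input_params is the NvBufferCreateParams used to allocate buffer_dma_fd
input_params.payloadType = NvBufferPayload_SurfArray;
input_params.colorFormat = NvBufferColorFormat_NV12; // NV12_709 here triggers the failure
// NvBufferTransform is writing to the buffer linked to buffer_dma_fd
egl_image = NvEGLImageFromFd(egl_display, buffer_dma_fd);
// With NvBufferColorFormat_NV12_709 this returns 801 (CUDA_ERROR_NOT_SUPPORTED)
CUresult status = cuGraphicsEGLRegisterImage(&image_resource, egl_image, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
cuGraphicsResourceGetMappedEglFrame(&egl_frame, image_resource, 0, 0);
```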
This code works with NvBufferColorFormat_NV12. But once NvBufferColorFormat_NV12 is replaced with NvBufferColorFormat_NV12_709, the cuGraphicsEGLRegisterImage call fails with error 801 (operation not supported). This is surprising: the hardware decoder and converter handle BT.709 without issues, and the error only appears in the CUDA registration call, where the original colorspace should not matter (the memory layout is identical to the working NV12/BT.601 case).
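To make the "identical memory layout" point concrete: BT.601 and BT.709 differ only in the luma weights (Kr, Kb) applied when YCbCr is converted to RGB, not in how the Y and CbCr bytes are stored. The plain host C++ sketch below (not part of any Jetson or CUDA API; LumaCoeffs, Rgb8 and ycbcr_to_rgb are hypothetical names) converts one limited-range sample with either matrix, assuming the standard Kr/Kb values from the two recommendations.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Luma weights that define a YCbCr matrix.
struct LumaCoeffs {
    double kr, kb;
};
constexpr LumaCoeffs kBT601{0.299, 0.114};
constexpr LumaCoeffs kBT709{0.2126, 0.0722};

struct Rgb8 {
    std::uint8_t r, g, b;
};

static std::uint8_t clamp_to_u8(double v) {
    return static_cast<std::uint8_t>(std::lround(std::min(std::max(v, 0.0), 255.0)));
}

// Limited-range YCbCr -> RGB: Y in [16, 235], Cb/Cr in [16, 240].
// The same three input bytes produce different colors depending on the matrix.
Rgb8 ycbcr_to_rgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr, LumaCoeffs k) {
    const double kg = 1.0 - k.kr - k.kb;
    const double yn = (y - 16) * (255.0 / 219.0);    // stretch luma to 0..255
    const double pb = (cb - 128) * (255.0 / 224.0);  // scaled chroma differences
    const double pr = (cr - 128) * (255.0 / 224.0);
    const double r = yn + 2.0 * (1.0 - k.kr) * pr;
    const double b = yn + 2.0 * (1.0 - k.kb) * pb;
    // Solve Y = Kr*R + Kg*G + Kb*B for G using the unclamped R and B.
    const double g = (yn - k.kr * r - k.kb * b) / kg;
    return {clamp_to_u8(r), clamp_to_u8(g), clamp_to_u8(b)};
}
```

Neutral grays (Cb = Cr = 128) come out the same under both matrices, while saturated colors shift: the limited-range sample (Y=81, Cb=90, Cr=240), which is close to pure red under BT.601, picks up a visible green component with BT.709 weights. The colorspace tag therefore only changes how the bytes are interpreted, not where they live.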
The complete code to reproduce the issue is available at https://github.com/sergeev917/jetson-nano-hwdec-bt709-repro. It is mostly a stripped-down version of the official samples. The most relevant file is sources/App.cpp; the other files are helpers taken from the official samples.
So, the question is: why does this code fail for the BT.709 colorspace, and why only later, in the CUDA API? Is it possible to work around this issue (the goal is to get an NV12 BT.709 CUDA frame from NVDEC)?