cuGraphicsEGLRegisterImage fails on NvBufferColorFormat_NV12_709 frames

Hello!

I’m currently working on a PoC that bridges the Jetson Nano hardware decoder with zero-copy processing in CUDA. There are samples that illustrate how to achieve this (for example, /usr/src/jetson_multimedia_api/samples/02_video_dec_cuda). In short, each decoded frame goes into an EGL image via its DMA fd (NvEGLImageFromFd), then the EGL image is registered as a CUDA resource (cuGraphicsEGLRegisterImage) and mapped into a CUDA frame (cuGraphicsResourceGetMappedEglFrame).

The samples work fine, but a problem arises when targeting NV12 with the BT.709 colorspace. First, NvVideoConverter doesn’t work with BT.709 video feeds. However, NvVideoConverter is deprecated in favor of NvBufferTransform (the transform API). NvBufferTransform itself works fine, but the CUDA API does not accept its output when the output colorspace is set to BT.709.

Minimal code to illustrate the problem:

NvBufferCreateParams input_params;
// …
input_params.payloadType = NvBufferPayload_SurfArray;
input_params.colorFormat = NvBufferColorFormat_NV12;
NvBufferCreateEx(&buffer_dma_fd, &input_params);
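// NvBufferTransform writes its output into the buffer behind buffer_dma_fd
// Wrap the DMA buffer in an EGL image
egl_image = NvEGLImageFromFd(egl_display, buffer_dma_fd);
// Register the EGL image as a CUDA graphics resource (this is the call that fails for NV12_709)
cuGraphicsEGLRegisterImage(&image_resource, egl_image, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
// Map the registered resource to a CUDA EGL frame
cuGraphicsResourceGetMappedEglFrame(&egl_frame, image_resource, 0, 0);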

This code works with NvBufferColorFormat_NV12. But once NvBufferColorFormat_NV12 is replaced with NvBufferColorFormat_NV12_709, the cuGraphicsEGLRegisterImage call fails with error 801 (CUDA_ERROR_NOT_SUPPORTED): operation not supported. This error is surprising, since the hardware decoder/converter works fine with BT.709 and the failure happens only in the CUDA register call, where the original colorspace shouldn’t matter (the memory layout is identical to the working NV12/BT.601 case).
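To illustrate the point about the memory layout, here is a minimal sketch of how NV12 plane offsets are computed. It assumes a pitch-linear buffer with the chroma plane directly following the luma plane, which may not match how the hardware actually lays out a surface array. The names ColorSpace, Nv12Layout and nv12_layout are hypothetical and not part of the Jetson Multimedia API or CUDA.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; none of these come from the
// Jetson Multimedia API or CUDA.
enum class ColorSpace { BT601, BT709 };

struct Nv12Layout {
    std::size_t y_offset;   // luma plane starts at the beginning of the buffer
    std::size_t y_size;     // one byte per pixel: pitch * height
    std::size_t uv_offset;  // interleaved CbCr plane follows the luma plane
    std::size_t uv_size;    // half as many rows as the luma plane, same pitch
    std::size_t total;      // bytes needed for the whole frame
};

// Assumes a pitch-linear buffer whose chroma plane directly follows the luma
// plane. The colorspace argument is deliberately unused: it never enters the
// layout computation.
inline Nv12Layout nv12_layout(std::size_t pitch, std::size_t height, ColorSpace) {
    assert(pitch % 2 == 0 && height % 2 == 0);  // 4:2:0 subsampling needs even dimensions
    Nv12Layout l{};
    l.y_offset = 0;
    l.y_size = pitch * height;
    l.uv_offset = l.y_size;
    l.uv_size = pitch * (height / 2);
    l.total = l.y_size + l.uv_size;
    return l;
}
```

Calling nv12_layout(1920, 1080, ColorSpace::BT601) and nv12_layout(1920, 1080, ColorSpace::BT709) yields the same offsets and sizes; under these assumptions the colorspace only affects how the sample values are interpreted, not where they live.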

The complete code to reproduce the issue more easily is available on GitHub: sergeev917/jetson-nano-hwdec-bt709-repro. The code is mostly a stripped-down version of the official samples. The most relevant file is sources/App.cpp; the other files are helpers from the official samples.

So, the question is: why does this code fail for the BT.709 colorspace, and only later, in the CUDA API? Is it possible to work around this issue (the goal is to get an NV12 BT.709 CUDA frame from nvdec)?

Hi,
We can observe the issue. Will investigate and update.

Hi,
For information, please share your release version ($ head -1 /etc/nv_tegra_release).

Hi,

$ head -1 /etc/nv_tegra_release
R32 (release), REVISION: 4.3, GCID: 21589087, BOARD: t210ref, EABI: aarch64, DATE: Fri Jun 26 04:38:25 UTC 2020

Hi,
Please replace the library below with the attached one and give it a try.

/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1

r32_43_TEST_libcuda.so.1.1.zip (3.6 MB)

Hi,

Please replace the attachment and give it a try.

This version of the CUDA library works fine and solves the problem for all current needs. Thank you!

Are there any estimates of how long it will take until the proper release (rough estimates are fine)?

Hi,
The latest release is r32.4.3. The fix will be in the upcoming release, r32.5.
On r32.4.3, you can use the attached lib.

Currently, the CUDA formats do not distinguish between BT.601, BT.709, and BT.2020. We are checking this and may define new formats in future CUDA releases.
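Since the mapped frame carries no colorspace information, one possible approach (an assumption, not something prescribed in this thread) is for the application to track the colorspace itself and select the conversion coefficients explicitly. The sketch below converts one limited-range 8-bit YCbCr sample to RGB using the standard BT.601 and BT.709 luma coefficients. The names YuvMatrix, Rgb8, to_u8 and yuv_to_rgb are hypothetical, and the code assumes C++17 for std::clamp.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical tag the application keeps alongside each frame.
enum class YuvMatrix { BT601, BT709 };

struct Rgb8 { std::uint8_t r, g, b; };

// Clamp a normalized value to [0, 1] and scale it to 8 bits.
inline std::uint8_t to_u8(double v) {
    return static_cast<std::uint8_t>(std::lround(std::clamp(v, 0.0, 1.0) * 255.0));
}

// Convert one limited-range (Y: 16..235, CbCr: 16..240) sample to full-range RGB.
inline Rgb8 yuv_to_rgb(std::uint8_t y, std::uint8_t u, std::uint8_t v, YuvMatrix m) {
    // Luma coefficients: the only place where the two standards differ here.
    const double kr = (m == YuvMatrix::BT709) ? 0.2126 : 0.299;
    const double kb = (m == YuvMatrix::BT709) ? 0.0722 : 0.114;
    const double kg = 1.0 - kr - kb;

    const double yn = (y - 16) / 219.0;   // normalized luma
    const double cb = (u - 128) / 224.0;  // normalized blue-difference chroma
    const double cr = (v - 128) / 224.0;  // normalized red-difference chroma

    const double r = yn + 2.0 * (1.0 - kr) * cr;
    const double b = yn + 2.0 * (1.0 - kb) * cb;
    const double g = (yn - kr * r - kb * b) / kg;
    return { to_u8(r), to_u8(g), to_u8(b) };
}
```

Neutral samples such as (235, 128, 128) come out the same under both matrices, while saturated colors do not, so decoding BT.709 content with the BT.601 matrix shifts colors even though the buffer contents are byte-for-byte the same.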