NvBufSurface confusion

Using Jetson AGX Xavier, DS 5.0.1.

I am extremely confused about the whole NvBufSurface layout. The documentation / header files are not very helpful for me.

Is there a high level overview somewhere for this? I see various fragments on the forum but never an explanation.

In my particular case, I am attached to the nvv4l2decoder src pad.

v4l2src ! nvv4l2decoder mjpeg=1 ! nvstreammux ! …

The input is 4096x2160 from CU135, format is UYVY.

In the src pad callback, from my prints, I get:
** width 4096 height 2160 pitch 4096 colorFormat 2 num_planes 3
** plane 0 w 4096 ht 2160 p 4096 o 0 psz 8912896 bpp 1
** plane 1 w 2048 ht 1080 p 2048 o 8912896 psz 2228224 bpp 1
** plane 2 w 2048 ht 1080 p 2048 o 11141120 psz 2228224 bpp 1

  1. nvv4l2decoder output will be NV12 (YUV420) right? That is what above colorFormat says.
  2. Why is plane 0 psize=8912896? 4096*2160 is 8847360 - that is 64K too large.
  3. Same with plane 1 & 2 , too large
  4. My first attempt was to simply perform a cudaMemcpy but the total dataSize does not make sense.
  5. If I want to extract only the Y plane, what is the process?
  6. I really do not need to convert to RGB since I just need the luma
  7. If I want to feed this GPU buffer to a custom cuda kernel, what is the process? How do I make an in GPU memory copy that doesn’t impact DS processing? I just want to perform some histogram calculation, not modify the pipeline.

Please refer to https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvvideo4linux2.html

nvv4l2decoder is HW accelerated decoder, so the buffer it uses is special.
colorFormat 2 means NVBUF_COLOR_FORMAT_YUV420. It is not the ordinary YUV420 layout but an NVIDIA HW-adapted format.

For your example:
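Plane 0 size 8912896 = 4096 x 2176: the HW pads the allocated height to an aligned value. (2160 is already a multiple of 16, so the 16 extra rows here correspond to rounding the luma height up to a multiple of 32.)
Planes 1 and 2 size 2228224 = 2048 x 1088: the chroma height 1080 is rounded up to the next multiple of 16.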
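
The arithmetic can be checked directly from the printed values. The following is a small sketch, not SDK code: `PlaneInfo` is a hypothetical struct that only mirrors the fields shown in the prints (it is not the SDK's plane-parameter type), and the numbers are copied from the output earlier in the thread. Dividing psize by pitch gives the number of rows actually allocated, and comparing that with the visible height gives the padding rows to skip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the per-plane fields printed in the thread.
// This is NOT an SDK type; it exists only for the arithmetic below.
struct PlaneInfo {
    std::uint32_t width;   // visible width in pixels
    std::uint32_t height;  // visible height in rows
    std::uint32_t pitch;   // bytes per allocated row
    std::uint32_t offset;  // byte offset of the plane inside the buffer
    std::uint32_t psize;   // bytes allocated for the plane
};

// Values copied from the "** plane N ..." prints in the question.
const PlaneInfo kPlanes[3] = {
    {4096, 2160, 4096, 0,        8912896},
    {2048, 1080, 2048, 8912896,  2228224},
    {2048, 1080, 2048, 11141120, 2228224},
};

// Rows actually allocated for a plane, padding included.
inline std::uint32_t allocated_rows(const PlaneInfo& p) {
    assert(p.pitch != 0);
    return p.psize / p.pitch;
}

// Extra rows after the visible image that carry no picture data.
inline std::uint32_t padding_rows(const PlaneInfo& p) {
    return allocated_rows(p) - p.height;
}

// True if every plane starts exactly where the previous one ends.
inline bool planes_contiguous(const PlaneInfo* planes, std::size_t count) {
    for (std::size_t i = 1; i < count; ++i) {
        if (planes[i].offset != planes[i - 1].offset + planes[i - 1].psize) {
            return false;
        }
    }
    return true;
}
```

For the buffer in this thread, `allocated_rows` gives 2176 for plane 0 and 1088 for planes 1 and 2, and the three planes sit back to back, which is why the total size is larger than 4096 x 2160 x 1.5.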

For a CUDA GPU copy, please refer to the forum topic "How to use NvBufSurfaceCopy to copy surface from CUDA device to CPU accessable memory".

That link says ‘The plugin accepts an encoded bitstream and uses the NVDEC hardware engine to decode the bitstream. The decoded output is in NV12 format.’ Where does it explain the format and padding? I cannot find it in the link you provided. Thanks.

It is an NVIDIA private format. There is no description of the format and padding. You can only refer to the nvbufsurface.h file in the SDK package.

So how do I copy/extract the luma Y plane data?

And your second link was about copying from GPU to CPU. I would like to keep memory in GPU (make a copy) and process via cuda routines.

You’ve already got the whole buffer. The address, pitch, width, height,… It can be processed by CUDA.
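
As a rough illustration of what "address, pitch, width, height" means for the Y plane, here is a host-side sketch. It assumes an 8-bit luma plane (bpp 1 in the prints) and a pointer the CPU can read; `copy_luma_packed` and `luma_histogram` are hypothetical helper names, not SDK functions. The point is only the indexing: read `width` bytes from each of the `height` visible rows, step by `pitch`, and never touch the padding rows. A CUDA kernel would index the plane the same way, and for a copy that stays on the device `cudaMemcpy2D` takes the same dst/dpitch/src/spitch/width/height arguments. Whether the surface address is directly usable by CUDA depends on the buffer's memory type, which this thread does not settle.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy the visible part of a pitched 8-bit plane into a tightly packed
// buffer (row length == width). Padding bytes and padding rows are skipped.
std::vector<std::uint8_t> copy_luma_packed(const std::uint8_t* src,
                                           std::size_t pitch,
                                           std::size_t width,
                                           std::size_t height) {
    assert(src != nullptr && pitch >= width);
    std::vector<std::uint8_t> dst(width * height);
    if (width == 0) {
        return dst;  // nothing to copy
    }
    for (std::size_t row = 0; row < height; ++row) {
        // Source rows are `pitch` bytes apart; destination rows are packed.
        std::memcpy(dst.data() + row * width, src + row * pitch, width);
    }
    return dst;
}

// 256-bin histogram over the visible pixels of a pitched 8-bit plane.
// The source buffer is only read, never modified.
std::array<std::uint32_t, 256> luma_histogram(const std::uint8_t* src,
                                              std::size_t pitch,
                                              std::size_t width,
                                              std::size_t height) {
    assert(src != nullptr && pitch >= width);
    std::array<std::uint32_t, 256> bins{};  // zero-initialised
    for (std::size_t row = 0; row < height; ++row) {
        const std::uint8_t* line = src + row * pitch;
        for (std::size_t col = 0; col < width; ++col) {
            ++bins[line[col]];
        }
    }
    return bins;
}
```

With the values printed in this thread, the call would look like `luma_histogram(y_ptr, 4096, 4096, 2160)`, where `y_ptr` is a hypothetical CPU-readable pointer to plane 0 (offset 0). Passing 2160 rather than 2176 keeps the 16 padding rows out of the result.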

The "Deepstream sample code snippet" topic is also a sample for both CPU and GPU.

Maybe you could try using CUDA_UNIFIED_MEMORY type? So there’s no need to copy things between CPU and GPU.
