Bad H264 images decoded when ulNumDecodeSurfaces is too small

Hi,

I’m writing a decoding application based on Video Codec SDK 8.2.15 with CUDA 9.0. The GPU is a GeForce GTX 1060 3GB and the driver version is 398.36.

I found that if I set ulNumDecodeSurfaces to 1 or 2 for the decoder, the decoded frames look incorrect, as if the previous frame was not cleared and the next frame was written directly into the same buffer.

(Only the Y channel is displayed.)

If I set ulNumDecodeSurfaces to a larger number such as 20, the image looks correct.

The code that initializes the decoder is below.

    // Create the decoder.
    CUVIDDECODECREATEINFO decode_create_info;
    memset(&decode_create_info, 0, sizeof(decode_create_info));

    decode_create_info.CodecType = cudaVideoCodec_H264;
    decode_create_info.ChromaFormat = cudaVideoChromaFormat_420;
    decode_create_info.OutputFormat = cudaVideoSurfaceFormat_NV12;
    decode_create_info.bitDepthMinus8 = 0;
    decode_create_info.DeinterlaceMode = cudaVideoDeinterlaceMode_Weave;
    decode_create_info.ulNumOutputSurfaces = 2;

    decode_create_info.ulCreationFlags = cudaVideoCreate_PreferCUVID;
    decode_create_info.ulNumDecodeSurfaces = NUM_DECODE_SURFACES; // 2, 20, or some other number?
    decode_create_info.vidLock = m_video_ctx_lock;
    decode_create_info.ulWidth = m_image_width;
    decode_create_info.ulHeight = m_image_height;
    decode_create_info.ulMaxWidth = m_image_width;
    decode_create_info.ulMaxHeight = m_image_height;
    decode_create_info.ulTargetWidth = m_image_width;
    decode_create_info.ulTargetHeight = m_image_height;

    cuCtxPushCurrent(m_cu_ctx);
    r = cuvidCreateDecoder(&m_video_decoder, &decode_create_info);
    cuCtxPopCurrent(NULL);

Has anybody else run into this issue? Thank you very much!

Hi zhoub,

Please refer to section 4.2 in NVDEC_VideoDecoder_API_ProgGuide.pdf, which explains the meaning of ulNumDecodeSurfaces. ulNumDecodeSurfaces should be greater than or equal to the DPB size used by the bitstream; if you specify a smaller value, you may encounter corruption. It is recommended to allocate a few more surfaces than the DPB size for better pipelining.
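
As a rough illustration (this is only a sketch based on the H.264 level limits from the spec, not code from the SDK or the programming guide), the worst-case DPB size for a given level and picture size can be estimated like this:

    // Rough estimate of the worst-case H.264 DPB size (in frames) for a given
    // level_idc and picture size, based on the MaxDpbMbs table in Annex A of
    // the H.264 spec. Illustrative only; the helper name is made up.
    static int EstimateH264DpbFrames(int level_idc, int width, int height)
    {
        int max_dpb_mbs;
        switch (level_idc) {
            case 30: max_dpb_mbs = 8100;   break; // Level 3.0
            case 31: max_dpb_mbs = 18000;  break; // Level 3.1
            case 40:
            case 41: max_dpb_mbs = 32768;  break; // Level 4.0 / 4.1
            case 50: max_dpb_mbs = 110400; break; // Level 5.0
            case 51: max_dpb_mbs = 184320; break; // Level 5.1
            default: return 16;                   // unknown level: be conservative
        }
        int mbs_per_frame = ((width + 15) / 16) * ((height + 15) / 16);
        int dpb_frames = max_dpb_mbs / mbs_per_frame;
        return dpb_frames < 16 ? dpb_frames : 16; // the DPB is capped at 16 frames
    }

    // Example: a 1920x1080 Level 4.1 stream has 120 x 68 = 8160 macroblocks per
    // frame, so the DPB can hold min(32768 / 8160, 16) = 4 frames.
    // ulNumDecodeSurfaces must be at least that, plus some headroom.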

This is what is done in our sample application: please refer to Samples\NvCodec\NvDecoder\NvDecoder.cpp and look for the function GetNumDecodeSurfaces(…). For H264 we have hardcoded the value to the maximum possible number of reference buffers + 4, which comes out to 20. We allocate the extra surfaces for better pipelining.
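
A minimal sketch of that kind of helper (the real implementation is in NvDecoder.cpp; the numbers below just restate the rule above, and codecs other than H264 are not covered here):

    #include <nvcuvid.h> // for the cudaVideoCodec enum

    // Sketch of a GetNumDecodeSurfaces-style helper. For H264 the maximum
    // possible number of reference frames is 16, and 4 extra surfaces are
    // added for pipelining, giving 20.
    static int GetNumDecodeSurfaces(cudaVideoCodec codec)
    {
        if (codec == cudaVideoCodec_H264 || codec == cudaVideoCodec_H264_SVC ||
            codec == cudaVideoCodec_H264_MVC) {
            return 16 + 4; // max H264 DPB size + extra surfaces for pipelining
        }
        return 8; // fallback for codecs not handled in this sketch
    }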

Thanks,
Ryan Park

Hi Ryan,

I really appreciate your quick response.

I’d like to ask whether this parameter affects decoding latency; I assume the effect is negligible, right? The GPU can keep decoding, which may be much faster than mapping and displaying the frames, so it seems best to allocate more decode buffers via ulNumDecodeSurfaces in order to keep the GPU as busy as possible.

Sorry, I just want to reduce latency as much as possible. Thank you very much!

Hi zhoub,

The parameter won’t affect the latency of decoding.

Thanks,
Ryan Park