The CPU handles demultiplexing and parsing of the received video stream, extracting the bitstream data. This bitstream data is then transferred to the GPU and maintained by the NVIDIA driver in four buffers for subsequent decoding. These buffers hold the frames awaiting decoding; the decoder reads from them, decodes, and outputs each decoded frame to a decode surface. Am I right?
Hi @user141873,
What you are describing is more or less correct, though I am not sure where you got the "four buffers for subsequent decoding" from.
The number of surfaces that NVDEC allocates depends on how the application wants to balance throughput against GPU memory load. Essentially these two parameters define the number of surfaces:
CUVIDPARSERPARAMS::ulMaxNumDecodeSurfaces
CUVIDDECODECREATEINFO::ulNumDecodeSurfaces
This is nicely described in the latest Programming Guide for the Video Codec SDK.
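If it helps, here is a rough sketch of how the two counts usually interact, loosely following the NvDecoder sample from the Video Codec SDK: the parser starts with a placeholder ulMaxNumDecodeSurfaces, and the sequence callback sizes the decoder from the stream's reported min_num_decode_surfaces and returns that count so the parser adopts it. The +3 headroom and the H.264 codec type here are just assumptions for illustration, not requirements:

```cpp
// Sketch only: wiring the two surface counts together, loosely following
// the NvDecoder sample. Error handling and the pfnDecodePicture /
// pfnDisplayPicture callbacks are omitted for brevity.
#include <nvcuvid.h>

// Called by the parser once the stream format is known. Returning a
// value > 1 overrides the parser's ulMaxNumDecodeSurfaces.
static int CUDAAPI HandleSequence(void* user, CUVIDEOFORMAT* fmt) {
    CUVIDDECODECREATEINFO ci = {};
    ci.CodecType       = fmt->codec;
    ci.ChromaFormat    = fmt->chroma_format;
    ci.bitDepthMinus8  = fmt->bit_depth_luma_minus8;
    ci.ulWidth         = fmt->coded_width;
    ci.ulHeight        = fmt->coded_height;
    ci.ulTargetWidth   = fmt->coded_width;
    ci.ulTargetHeight  = fmt->coded_height;
    ci.OutputFormat    = cudaVideoSurfaceFormat_NV12;
    ci.DeinterlaceMode = cudaVideoDeinterlaceMode_Weave;
    // Decoded-picture buffer: the stream's minimum plus some headroom
    // (assumption: +3). More surfaces -> better throughput, more GPU memory.
    ci.ulNumDecodeSurfaces = fmt->min_num_decode_surfaces + 3;
    ci.ulNumOutputSurfaces = 2;

    CUvideodecoder dec = nullptr;
    cuvidCreateDecoder(&dec, &ci);
    *static_cast<CUvideodecoder*>(user) = dec;       // hypothetical plumbing
    return static_cast<int>(ci.ulNumDecodeSurfaces); // parser adopts this
}

CUvideoparser CreateParser(CUvideodecoder* dec) {
    CUVIDPARSERPARAMS pp = {};
    pp.CodecType              = cudaVideoCodec_H264; // assumption: H.264 input
    pp.ulMaxNumDecodeSurfaces = 1;  // placeholder; fixed up by HandleSequence
    pp.pUserData              = dec;
    pp.pfnSequenceCallback    = HandleSequence;
    CUvideoparser parser = nullptr;
    cuvidCreateVideoParser(&parser, &pp);
    return parser;
}
```

The key point is that the value returned from the sequence callback overrides the parser's ulMaxNumDecodeSurfaces, so the initial placeholder does not need to be accurate.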
I hope that helps!
Thanks @MarkusHoHo, that helps a lot.
I got the information "four buffers for subsequent decoding" from section 4.3, "Writing an Efficient Decode Application", in the NVDEC Video Decoder API Programming Guide. The following is the paragraph:
The NVDEC driver internally maintains a queue of 4 frames for efficient pipelining of operations. Please note that this pipeline does not imply any decoding delay for decoding. The decoding starts as soon as the first frame is queued, but the application can continue queuing up input frames so long as space is available without stalling. Typically, by the time application has queued 2-3 frames, decoding of the first frame is complete and the pipeline continues. This pipeline ensures that the hardware decoder is utilized to the maximum extent possible.
Sorry, I may have described it inaccurately. To be more precise, there is a queue of 4 frames storing undecoded frames (i.e., bitstream data already transferred to the GPU), which is different from the decode surfaces that store decoded frames.
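For anyone finding this later, here is a minimal sketch of what that queuing looks like from the application side. DemuxNextPacket is a hypothetical stand-in for whatever demuxer the application uses, and the parser is assumed to be set up as in the reply above:

```cpp
// Sketch only: the feeding side of the pipeline. DemuxNextPacket is a
// hypothetical helper (not part of the SDK) that yields the next
// elementary-stream packet.
#include <nvcuvid.h>

bool DemuxNextPacket(const unsigned char** data, int* size); // hypothetical

void FeedLoop(CUvideoparser parser) {
    const unsigned char* data = nullptr;
    int size = 0;
    while (DemuxNextPacket(&data, &size)) {
        CUVIDSOURCEDATAPACKET pkt = {};
        pkt.payload      = data;
        pkt.payload_size = size;
        // The driver's internal 4-frame queue lets this call return
        // without waiting for the hardware decoder, so CPU demuxing
        // overlaps with GPU decoding.
        cuvidParseVideoData(parser, &pkt);
    }
    // Signal end of stream so the parser flushes the remaining frames.
    CUVIDSOURCEDATAPACKET eos = {};
    eos.flags = CUVID_PKT_ENDOFSTREAM;
    cuvidParseVideoData(parser, &eos);
}
```

Because the driver maintains that internal 4-frame queue, cuvidParseVideoData can keep accepting packets while earlier frames are still being decoded.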
Right, that is correct.