Video decoder frame latency between first frame submitted and first frame output

Hello everybody!

I’m new to this forum, so I hope I’ve posted my questions in the right place (if not, please let me know which section is the correct one).

I use HW decoders to decode both H.264 and MJPEG streams coming from video surveillance cameras.
My example “acquisition pipe” works the same way as the “cudaDecodeD3D9” example that comes with the SDK (typically in C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\3_Imaging\cudaDecodeD3D9), except for the VideoSource object. So we use the VideoParser and the VideoDecoder as the sample does.

Everything works perfectly with H.264 intra-only streams, H.264 non-intra streams (GOP > 1), and MJPEG streams.

The only thing I’ve noticed is that it takes several frames to receive the first HandlePictureDisplay callback, even if my stream’s GOP is equal to 1 (MJPEG or H.264 intra). The first frame passed to the parser and to the decoder is always a key frame (I-frame), even if GOP > 1.

For the first frame passed to the parser and to the decoder I set the packet flags to CUVID_PKT_TIMESTAMP | CUVID_PKT_DISCONTINUITY, for subsequent frames to CUVID_PKT_TIMESTAMP, and for the last frame to CUVID_PKT_TIMESTAMP | CUVID_PKT_ENDOFSTREAM. In this scenario I was not able to get the first HandlePictureDisplay callback before submitting further frames to the decoder.
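
The flag scheme described above can be sketched roughly as follows. This is only a minimal sketch: the enum values are local stand-ins assumed to match CUvideopacketflags in nvcuvid.h (a real application would include that header instead), and flagsForPacket is a hypothetical helper, not part of the SDK. Combining all three flags for a one-packet stream is also an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-ins, assumed to match the CUvideopacketflags values in nvcuvid.h.
enum PacketFlags : unsigned long {
    PKT_ENDOFSTREAM   = 0x01,
    PKT_TIMESTAMP     = 0x02,
    PKT_DISCONTINUITY = 0x04,
};

// Hypothetical helper: flags for packet number `index` in a stream of `count` packets.
unsigned long flagsForPacket(std::size_t index, std::size_t count) {
    assert(count > 0 && index < count);
    unsigned long flags = PKT_TIMESTAMP;             // every packet carries a timestamp
    if (index == 0)         flags |= PKT_DISCONTINUITY;  // first packet of the stream
    if (index + 1 == count) flags |= PKT_ENDOFSTREAM;    // last packet of the stream
    return flags;
}
```

The returned value would go into the flags field of the packet handed to cuvidParseVideoData().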

I also tried setting the CUVID_PKT_ENDOFSTREAM flag on the first frame. The first frame then comes out before the next one is submitted, but this only works in some circumstances (sorry, I was not able to identify them).

So, I assumed that an entire GOP has to be decoded before receiving the first HandlePictureDisplay.

  1. Is it correct to assume that?
  2. Do all NVIDIA HW decoders behave this way?
  3. Is there any way to make the first frame come out (HandlePictureDisplay) before submitting the second frame?

Thank you in advance, I hope someone can enlighten me on this subject.



I have the same “problem”. I have live streams as well. I have to put in 5 frames before I get the first out. Is there a way to lower this latency?

I actually use ffmpeg for decoding, which has NVDECODE integrated. If I use the ffmpeg software H.264 decoder, it returns the first frame right after I pass the first frame. If I use h264_cuvid, I have to pass 4 extra frames before it returns the first frame.

For testing, I added a one-second sleep between passing a frame (avcodec_send_packet(…)) and getting a decoded frame (frameFinished = avcodec_receive_frame(…)). The results are the same.

Thanks in advance.

Internally, the hardware is pipelined, and yields maximum performance when handling multiple frames at different stages of the decoding process. I expect that the hardware requires the pipeline to be filled before it will start returning anything, and that this is independent of GOP or I-frames.
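
To illustrate the idea, here is a toy model of a decoder that keeps a fixed number of frames in flight. It is not how the driver or hardware is actually implemented; the class, its depth parameter, and the flush behaviour are all assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model only: holds up to `depth` frames in flight and releases the
// oldest one when a new frame pushes the queue past that depth.
class ToyPipelinedDecoder {
public:
    explicit ToyPipelinedDecoder(std::size_t depth) : depth_(depth) {}

    // Submit one frame id. Returns true and writes the id of the frame
    // that becomes available, or returns false if nothing comes out yet.
    bool submit(int frameId, int& outId) {
        inFlight_.push_back(frameId);
        if (inFlight_.size() <= depth_) return false;
        outId = inFlight_.front();
        inFlight_.pop_front();
        return true;
    }

    // Models an end-of-stream flush: hands back every frame still queued.
    std::deque<int> flush() {
        std::deque<int> drained;
        drained.swap(inFlight_);
        return drained;
    }

private:
    std::size_t depth_;
    std::deque<int> inFlight_;
};

// Counts how many submissions are needed before the first frame comes out.
std::size_t framesUntilFirstOutput(std::size_t depth) {
    ToyPipelinedDecoder decoder(depth);
    int outId = -1;
    std::size_t submitted = 0;
    for (int id = 0;; ++id) {
        ++submitted;
        if (decoder.submit(id, outId)) {
            assert(outId == 0);  // the first frame out is the first frame in
            return submitted;
        }
    }
}
```

With a depth of 4, framesUntilFirstOutput(4) gives 5, which matches the "5 frames in before the first one out" reports in this thread. The depth of 4 is chosen to fit those reports and is not a documented figure.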

I would expect that setting ENDOFSTREAM would force it to decode the single frame (I don’t know what problems you saw, but they should be separate and solvable), but I’d also expect the performance to be bad - worse than realtime bad? I don’t know, but noticeable, to be sure.

Is there somebody from NVIDIA who can give a definitive answer on this? It would be much appreciated. Thanks in advance.

I have the same problem with SDK-9.0.22: the first 5 frames are buffered. Is there somebody who can explain this?

Thanks in advance.


We suggest the following steps to reduce latency during the decoding process:

  1. If you know a priori (and are sure of it) that the packet your application submits contains exactly 1 frame of data, you should set CUvideopacketflags::CUVID_PKT_ENDOFPICTURE on the packet passed to cuvidParseVideoData(). This flag signals the underlying driver to start decoding immediately.
  2. Can you try setting ulMaxDisplayDelay = 0?
  3. For the streams you are evaluating, is “num_reorder_frames” in the VUI set to zero? Syntax elements such as this one can force the parser to introduce latency.

We are hoping that #1 and #2 will solve the problem. If they don’t, please share the bitstream with us. For H.264, will you be using an I-frame-only stream?
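
Steps 1 and 2 could look roughly like the sketch below. The two structs are simplified stand-ins that mirror only the relevant fields of CUVIDPARSERPARAMS and CUVIDSOURCEDATAPACKET from nvcuvid.h, the flag values are assumed to match that header, and both helper functions are hypothetical. Real code would fill in the SDK structs directly and pass them to cuvidCreateVideoParser() and cuvidParseVideoData().

```cpp
#include <cassert>

// Assumed to match CUVID_PKT_TIMESTAMP and CUVID_PKT_ENDOFPICTURE in nvcuvid.h.
constexpr unsigned long kPktTimestamp    = 0x02;
constexpr unsigned long kPktEndOfPicture = 0x08;

// Stand-in for the one CUVIDPARSERPARAMS field discussed in step 2.
struct ParserParamsSketch {
    unsigned int ulMaxDisplayDelay;
};

// Stand-in mirroring the fields of CUVIDSOURCEDATAPACKET.
struct PacketSketch {
    unsigned long        flags;
    unsigned long        payload_size;
    const unsigned char* payload;
    long long            timestamp;
};

// Step 2: ask the parser not to queue frames for display reordering.
ParserParamsSketch makeLowLatencyParserParams() {
    ParserParamsSketch params{};
    params.ulMaxDisplayDelay = 0;
    return params;
}

// Step 1: mark a packet that is known to hold exactly one complete frame.
PacketSketch makeSingleFramePacket(const unsigned char* data,
                                   unsigned long size,
                                   long long pts) {
    assert(data != nullptr && size > 0);
    PacketSketch packet{};
    packet.flags        = kPktTimestamp | kPktEndOfPicture;
    packet.payload_size = size;
    packet.payload      = data;
    packet.timestamp    = pts;
    return packet;
}
```

Whether these settings remove the delay entirely may depend on the stream and the driver version.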


Hello Mandar!

  1. I tried both setting CUvideopacketflags::CUVID_PKT_ENDOFPICTURE in cuvidParseVideoData() and setting ulMaxDisplayDelay to 0, but there is still a 5-frame latency, as before.
  2. I’m not familiar with “num_reorder_frames”, but there is no latency with CPU decoding.

By the way, if I set CUvideopacketflags::CUVID_PKT_ENDOFSTREAM in cuvidParseVideoData(), the decoder outputs the frame immediately, but this only works with I-frames.

Can you share the failing stream with us so we can analyze it? Also, please help me understand what the use case is.


Hi mandar,

Sorry for the late reply. I’m working with a live stream received over RTP from an IP camera. I recorded a video from it, and the URL is below.


Hi mandar_godse. Do you have any update on this topic? I have a single H.264 frame that I want to decode. I applied the settings you suggested, but I still need to push a second packet (even an empty one, with NULL data and size 0) to get the first decoded frame.