Issue with m_bForce_zero_latency (force_zero_latency) option in NvDecoder.cpp

Video Codec SDK version: 11.1.5
For normal IPPP input files I see there is always a delay of 4 frames before the first frame is returned. I see there is a force_zero_latency option that can be enabled for zero-latency output.

The problem I face is that the timestamp values don't seem to be populated when this option is enabled. Can someone help me with this? Some modification to the NvDecoder.cpp file seems to be required to get a proper PTS: pDispInfo->timestamp always comes back as 0 when the zero-latency option is used.

When force_zero_latency is enabled, the usual decode process is bypassed: the HandlePictureDecode method creates and fills the CUVIDPARSERDISPINFO (pDispInfo) and then calls HandlePictureDisplay(&dispInfo) itself. HandlePictureDecode does not fill the timestamp value in pDispInfo, so you will always get a timestamp of 0.

You can store the value of the timestamp the decoder receives in the Decode method in a variable (a member of NvDecoder, for example) and pass that value along in HandlePictureDecode when it fills the pDispInfo. This should solve your problem, since the force_zero_latency flag forces the decoder to decode the latest frame (the last one passed in via the Decode method) as soon as it has a full frame.
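A minimal sketch of that change, assuming the NvDecoder.cpp layout from SDK 11.1.5; m_nLastPacketTimestamp is a hypothetical member name introduced here for illustration:

```cpp
// In NvDecoder.h: add a member to remember the last packet's timestamp.
// int64_t m_nLastPacketTimestamp = 0;   // hypothetical name

// In NvDecoder::Decode(), before calling cuvidParseVideoData():
m_nLastPacketTimestamp = nTimestamp;

// In NvDecoder::HandlePictureDecode(), the existing force_zero_latency
// branch, with one line added to carry the timestamp through:
if (m_bForce_zero_latency && ((!pPicParams->field_pic_flag) || (pPicParams->second_field)))
{
    CUVIDPARSERDISPINFO dispInfo;
    memset(&dispInfo, 0, sizeof(dispInfo));
    dispInfo.picture_index = pPicParams->CurrPicIdx;
    dispInfo.progressive_frame = !pPicParams->field_pic_flag;
    dispInfo.top_field_first = pPicParams->bottom_field_flag ^ 1;
    dispInfo.timestamp = m_nLastPacketTimestamp;   // <-- added line
    HandlePictureDisplay(&dispInfo);
}
```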

Hi @zivlutwak1,
This solution won't work for formats where future frames are decoded first (IBPP). Is there some parameter that can be used to map the decoded frame to its timestamp (or some way to recover the display order)? Also, how does the parser map the frame to the timestamp, and how does it get the display order?

Thanks

Low latency will not work for B frames anyway.
It will only work for video with I or P frames.
It is specifically written in the SDK documentation that using the low latency flags (there are 2) with B frames will result in unexpected behavior.
This is because a B frame cannot be decoded until you have decoded both the reference frames before and after it.

Hi @zivlutwak1,
I had checked the documentation. It does state that the low latency mode won't work for B frames. But for best performance (low latency and efficiency) the documentation suggests having a separate thread for HandlePictureDisplay, with a shared queue between the decode and display callbacks. The problem is that there is no timestamp available in HandlePictureDecode, nor any kind of display order I could use to store the timestamp globally and map it later.

From the documentation it seems that for IPPP the simple solution is to use low latency mode and store the timestamp, and for IBPP to use normal mode.

Still, the reason behind the initial frame gap (the first frame being an I frame) is not clear. It seems to be connected with min_num_decode_surfaces.

Thanks.

Hi @duttaneil16,
As you can see in the code, there are 2 flags for low latency. The original flag is used in the constructor of the NvDecoder: if its value is true, ulMaxDisplayDelay is set to 0, otherwise it is set to 1. Interestingly, if you check the definition of the CUVIDPARSERPARAMS struct, you will see that the recommended value for ulMaxDisplayDelay is between 2 and 4, and that this parameter "improves pipelining of decode with display".
As I understand it, using low latency and setting this parameter to 0 reduces the latency but also reduces the efficiency of the decoder: if you decode the same video (with no B frames) with and without the low latency flag, you will see the GPU utilization is different, and much lower when low latency is enabled.
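For reference, this is roughly how that first flag is wired up in the NvDecoder constructor (paraphrased from the SDK 11.1.5 sample code; error handling and the other struct fields are elided):

```cpp
// Inside the NvDecoder constructor (SDK sample code, abridged).
CUVIDPARSERPARAMS videoParserParameters = {};
videoParserParameters.CodecType = eCodec;
videoParserParameters.ulMaxNumDecodeSurfaces = 1;
// 0 gives the lowest latency; the header recommends 2-4 for best
// decode/display pipelining.
videoParserParameters.ulMaxDisplayDelay = bLowLatency ? 0 : 1;
videoParserParameters.pUserData = this;
videoParserParameters.pfnSequenceCallback = HandleVideoSequenceProc;
videoParserParameters.pfnDecodePicture = HandlePictureDecodeProc;
videoParserParameters.pfnDisplayPicture = HandlePictureDisplayProc;
NVDEC_API_CALL(cuvidCreateVideoParser(&m_hParser, &videoParserParameters));
```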

The 2nd low latency flag (force_zero_latency) actually bypasses the normal behavior of the decoder.
Usually the parser calls the decode picture callback when a video frame is ready to be decoded (decode order) and the display picture callback when the frame is ready to be displayed (display order).
In video with no B frames, decode order and display order are the same; when B frames are used, they are not. You can see the decoder holds an array called m_nPicNumInDecodeOrder, which is filled in the HandlePictureDecode callback.

As for the timestamp, I can only assume that the parser saves it internally for each of the packets it receives when cuvidParseVideoData is called in the Decode method, and that when it calls the display picture callback it takes the timestamp belonging to the frame and inserts it into the CUVIDPARSERDISPINFO struct it passes to the callback. But when force_zero_latency is enabled, the normal behavior is bypassed and HandlePictureDisplay is called directly from the HandlePictureDecode method, where the CUVIDPARSERDISPINFO struct is filled, so we also need to set the timestamp there.

As for using different threads for decoding and display: according to the documentation, in the HandlePictureDisplay callback, instead of doing what the NvDecoder does now, you should insert the picture index (or the entire dispInfo struct) into a queue. In another thread, this queue is monitored; when it is not empty, that thread dequeues an entry and processes the decoded frame (what HandlePictureDisplay does in the current NvDecoder).
You should read NVDEC Video Decoder API Programming Guide :: NVIDIA Video Codec SDK Documentation
for more information. This approach will also work when force_zero_latency is set to true.
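A rough sketch of that producer/consumer split, with the queue and thread names invented here for illustration (the SDK documentation describes the pattern but does not ship this exact code):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include "nvcuvid.h"

// Shared state between the parser thread and the consumer thread.
static std::queue<CUVIDPARSERDISPINFO> s_dispQueue;
static std::mutex s_mtx;
static std::condition_variable s_cv;
static bool s_done = false;

// Display callback: only enqueue, so the parser thread is never blocked
// by the frame post-processing.
static int CUDAAPI HandlePictureDisplayProc(void *, CUVIDPARSERDISPINFO *pDispInfo)
{
    {
        std::lock_guard<std::mutex> lock(s_mtx);
        s_dispQueue.push(*pDispInfo);
    }
    s_cv.notify_one();
    return 1;
}

// Consumer thread: does what NvDecoder::HandlePictureDisplay currently does
// inline (cuvidMapVideoFrame, copy out, cuvidUnmapVideoFrame).
static void ConsumerThread()
{
    for (;;)
    {
        std::unique_lock<std::mutex> lock(s_mtx);
        s_cv.wait(lock, [] { return !s_dispQueue.empty() || s_done; });
        if (s_dispQueue.empty())
            break;                      // s_done was set and queue is drained
        CUVIDPARSERDISPINFO dispInfo = s_dispQueue.front();
        s_dispQueue.pop();
        lock.unlock();
        // ... map, process and unmap dispInfo.picture_index here ...
    }
}
```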

Hello, would you be so kind as to tell me what the second flag (force_zero_latency) is? I am dealing with a similar issue and am aware of ulMaxDisplayDelay, but I can't find the second flag you are all talking about here. I must be blind or something; I looked here and here.
I managed to achieve a one-frame lag by not waiting for the display callback and fetching the frames from the decode callback in the parser, but I'd love to get rid of that too.

Hello,
The force_zero_latency flag is new; it was added to the SDK in version 11.1.5 (July 2021). This is why neither of the links you mention contains any information about it (they are from 2018, much older). If you download the latest version of the SDK, you will see this flag in the constructor of the NvDecoder class (NvDecoder.cpp and NvDecoder.h) used in the samples.
But I must tell you that even if you use this second flag in addition to the first one, it will not get rid of the one-frame lag you have now. The reason is that this lag is not caused by latency; it is because the parser is not sure you gave it a full frame, so it waits for the next input (the next call to the Decode method), and only when it detects that the next input starts a new frame does it send the previous frame for decoding.
The good news is that there is a way to get rid of this one-frame lag: add the value CUVID_PKT_ENDOFPICTURE to the flags parameter in the Decode method. This flag tells the parser that the input data contains exactly one frame (or one field), so the parser knows it does not need to wait for the next input and sends the data to the decoder immediately. Of course, you must ensure that the input data really is a single frame; the FFmpeg demuxer used in the demo application behaves correctly and delivers exactly one frame per packet. So if you add this flag to each call to the Decode method, it should remove the one-frame lag, and you will see HandlePictureDecode and HandlePictureDisplay called for each Decode call (starting from the first key frame, of course).
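A minimal sketch of the main decode loop with that flag added, assuming the FFmpegDemuxer and NvDecoder helper classes from the SDK samples (`demuxer` and `dec` are already constructed; constructor arguments vary between SDK versions and are elided):

```cpp
// Demux/decode loop from the SDK samples, with CUVID_PKT_ENDOFPICTURE added.
uint8_t *pVideo = nullptr;
int nVideoBytes = 0;
int64_t pts = 0;
do {
    demuxer.Demux(&pVideo, &nVideoBytes, &pts);
    // CUVID_PKT_ENDOFPICTURE: this packet is exactly one frame, so the parser
    // can forward it to the decoder without waiting for the next packet to
    // detect the frame boundary. CUVID_PKT_TIMESTAMP marks pts as valid.
    int nFrameReturned = dec.Decode(pVideo, nVideoBytes,
                                    CUVID_PKT_ENDOFPICTURE | CUVID_PKT_TIMESTAMP, pts);
    for (int i = 0; i < nFrameReturned; i++) {
        int64_t framePts = 0;
        uint8_t *pFrame = dec.GetFrame(&framePts);
        // ... consume pFrame / framePts ...
    }
} while (nVideoBytes);
```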

Many thanks!!! You have explained the flag and helped me solve the main issue as well! I had heard of CUVID_PKT_ENDOFPICTURE before, but I misunderstood its purpose: I thought it should only be used for streams with one packet.
The lag is gone now! Thanks again!