Video SDK decoder/encoder always has a 5-frame delay (DPB or some other frame buffer)

Hi NVIDIA,

I am using Video_Codec_SDK_8.2.16 for decoding and encoding in a streaming application with VR processing.

Every frame of delay matters for me, and I always see a 5-frame delay between the input frame number and the output frame number.

I have explained this with an example, my settings, and a log below.

Note:

a) "Video Decode PTS" — input frame number and system time

b) "After Decode system time" — output frame number and system time

Example

  1. The decoder input frame number is 23, but the processed output frame number is 18 (= 23 - 5).
    The frame-number difference is always a constant 5.

[1055012124] (msec) After Decode system time [18]
[1055012137] (msec) Video Decode PTS = 00:00:02.320 (2.320) [23]

I modified the settings, but the delay stays the same.

  1. videoParserParameters.ulMaxNumDecodeSurfaces = 1;

  2. I changed 20 to 1 in GetNumDecodeSurfaces(); still a 5-frame delay:

    static unsigned long GetNumDecodeSurfaces(cudaVideoCodec eCodec, unsigned int nWidth, unsigned int nHeight) {
        if (eCodec == cudaVideoCodec_H264) {
            // assume worst-case of 20 decode surfaces for H264
            return 1; // was: return 20;
        }
        // (other codecs unchanged)
    }


User-generated logs

Initialize  Decoder Handler

  start Decoder handler Begin

 DecoderHandler init
[1055010609] (msec) Video Decode PTS = 00:00:01.400 (1.400) [0]
[INFO ][17:11:42] Media format: MPEG-TS (MPEG-2 Transport Stream) (mpegts)
[1055010618] (msec) Video Decode PTS = 00:00:01.440 (1.440) [1]
[1055010621] (msec) Video Decode PTS = 00:00:01.480 (1.480) [2]
[h264 @ 0000022b83873e80] non-existing SPS 0 referenced in buffering period
[h264 @ 0000022b83873e80] SPS unavailable in decode_picture_timing
[h264 @ 0000022b83873e80] non-existing SPS 0 referenced in buffering period
[h264 @ 0000022b83873e80] SPS unavailable in decode_picture_timing
[1055010648] (msec) Video Decode PTS = 00:00:01.520 (1.520) [3]
[1055010663] (msec) Video Decode PTS = 00:00:01.560 (1.560) [4]
[1055010669] (msec) Video Decode PTS = 00:00:01.600 (1.600) [5]
[1055010676] (msec) Video Decode PTS = 00:00:01.640 (1.640) [6]
[1055010682] (msec) Video Decode PTS = 00:00:01.680 (1.680) [7]
[1055010694] (msec) Video Decode PTS = 00:00:01.720 (1.720) [8]
[1055010701] (msec) Video Decode PTS = 00:00:01.760 (1.760) [9]
[1055010701] (msec) Video Decode PTS = 00:00:01.800 (1.800) [10]
[1055010701] (msec) Video Decode PTS = 00:00:01.840 (1.840) [11]
[1055010701] (msec) Video Decode PTS = 00:00:01.880 (1.880) [12]
[1055010702] (msec) Video Decode PTS = 00:00:01.920 (1.920) [13]
[1055010702] (msec) Video Decode PTS = 00:00:01.960 (1.960) [14]
[1055010702] (msec) Video Decode PTS = 00:00:02.0 (2.000) [15]
[1055010703] (msec) Video Decode PTS = 00:00:02.40 (2.040) [16]
[1055010703] (msec) Video Decode PTS = 00:00:02.80 (2.080) [17]
[1055010703] (msec) Video Decode PTS = 00:00:02.120 (2.120) [18]
[1055010703] (msec) Video Decode PTS = 00:00:02.160 (2.160) [19]
[1055010705] (msec) Video Decode PTS = 00:00:02.200 (2.200) [20]
[1055010707] (msec) Video Decode PTS = 00:00:02.240 (2.240) [21]
[1055010708] (msec) Video Decode PTS = 00:00:02.280 (2.280) [22]

 Decoder Init called beging
GPU in use: GeForce GTX 1070
Decode with demuxing.

 InitHardwareVideoDec w=2000 h=1000

 in_channel_layout 3 ,in_sample_rate=48000 , in_sample_fmt=8,out_channel_layout=3,out_sample_rate=44100,out_sample_fmt=1

  start Decoder handler End

  start Decoder handler Begin
Session Initialization Time: 40 ms

 [1055011621] (msec) After Decode system time [0]
 frame no = 0
 [1055011651] (msec) After Decode system time [1]
 frame no = 1
 [1055011669] (msec) After Decode system time [2]
 frame no = 2
 [1055011712] (msec) After Decode system time [3]
 frame no = 3
 [1055011736] (msec) After Decode system time [4]
 frame no = 4
 [1055011776] (msec) After Decode system time [5]
 frame no = 5
 [1055011790] (msec) After Decode system time [6]
 frame no = 6
 [1055011820] (msec) After Decode system time [7]
 frame no = 7
 [1055011852] (msec) After Decode system time [8]
 frame no = 8
 [1055011873] (msec) After Decode system time [9]
 frame no = 9
 [1055011903] (msec) After Decode system time [10]
 frame no = 10
 [1055011940] (msec) After Decode system time [11]
 frame no = 11
 [1055011957] (msec) After Decode system time [12]
 frame no = 12
 [1055011983] (msec) After Decode system time [13]
 frame no = 13
 [1055012001] (msec) After Decode system time [14]
 frame no = 14
 [1055012044] (msec) After Decode system time [15]
 frame no = 15
 [1055012072] (msec) After Decode system time [16]
 frame no = 16
 [1055012108] (msec) After Decode system time [17]
 frame no = 17
 [1055012124] (msec) After Decode system time [18]
[1055012137] (msec) Video Decode PTS = 00:00:02.320 (2.320) [23]

 [1055012152] (msec) After Decode system time [19]
[1055012174] (msec) Video Decode PTS = 00:00:02.360 (2.360) [24]

 [1055012186] (msec) After Decode system time [20]
[1055012201] (msec) Video Decode PTS = 00:00:02.400 (2.400) [25]

 [1055012214] (msec) After Decode system time [21]
[1055012236] (msec) Video Decode PTS = 00:00:02.440 (2.440) [26]

 [1055012250] (msec) After Decode system time [22]
[1055012288] (msec) Video Decode PTS = 00:00:02.480 (2.480) [27]

Hi meRaza,

  1. Does your bitstream contain B frames? Do you see any difference in behavior when the bitstream has an IPP GOP structure?
  2. Please try setting CUVIDPARSERPARAMS::ulMaxDisplayDelay = 0 (see the sketch after this list).
  3. If the packet sent to the parser contains exactly one frame of bitstream data, you can set CUVID_PKT_ENDOFPICTURE.
  4. However, the latency you encounter will be at least the number of B frames in your bitstream.
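
A minimal sketch of items 2 and 3, assuming the parser setup from the SDK's NvDecoder sample (videoParserParameters, NVDEC_API_CALL and the callback names are the sample's; pData, nSize and timestamp stand in for your packet):

    CUVIDPARSERPARAMS videoParserParameters = {};
    videoParserParameters.CodecType = cudaVideoCodec_H264;
    videoParserParameters.ulMaxNumDecodeSurfaces = 1;     // refined later in the sequence callback
    videoParserParameters.ulMaxDisplayDelay = 0;          // no extra display-queue latency
    videoParserParameters.pUserData = this;
    videoParserParameters.pfnSequenceCallback = HandleVideoSequenceProc;
    videoParserParameters.pfnDecodePicture = HandlePictureDecodeProc;
    videoParserParameters.pfnDisplayPicture = HandlePictureDisplayProc;
    NVDEC_API_CALL(cuvidCreateVideoParser(&m_hParser, &videoParserParameters));

    // Only when each packet holds exactly one coded frame: tell the parser so it
    // does not wait for the next packet to detect the picture boundary.
    CUVIDSOURCEDATAPACKET packet = {};
    packet.payload = pData;
    packet.payload_size = nSize;
    packet.flags = CUVID_PKT_ENDOFPICTURE | CUVID_PKT_TIMESTAMP;
    packet.timestamp = timestamp;
    NVDEC_API_CALL(cuvidParseVideoData(m_hParser, &packet));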

Thanks,
Ryan Park

Hi rypark,
Thanks a lot for the info.
The delay is now reduced to 3 frames after adding two settings (on the decoder side; I still need to check the encoder side).
My input bitstream doesn't have B frames, and yes, my input arrives in packets of 1316 bytes.

  1. videoParserParameters.ulMaxDisplayDelay = bLowLatency ? 0 : 1; —> ADDED
  2. Setting CUVID_PKT_ENDOFPICTURE —> ADDED

As I understand it, for an IPPP… stream the maximum delay should be one frame, since a P frame only needs the previous P or I frame for prediction.
Can you help me bring it down to a 1-frame delay? Currently it is a 3-frame delay.

DEBUG INFO

[1376706336] (msec) After Decode system time [253]
[1376706394] (msec) Video Decode PTS = 00:00:11.640 (11.640) [256]

 [1376706407] (msec) After Decode system time [254]
[1376706455] (msec) Video Decode PTS = 00:00:11.680 (11.680) [257]

 [1376706471] (msec) After Decode system time [255]
[1376706515] (msec) Video Decode PTS = 00:00:11.720 (11.720) [258]

 [1376706527] (msec) After Decode system time [256]
[1376706604] (msec) Video Decode PTS = 00:00:11.760 (11.760) [259]

 [1376706613] (msec) After Decode system time [257]
[1376706669] (msec) Video Decode PTS = 00:00:11.800 (11.800) [260]

 [1376706679] (msec) After Decode system time [258]
[1376706732] (msec) Video Decode PTS = 00:00:11.840 (11.840) [261]

 [1376706743] (msec) After Decode system time [259]
[1376706805] (msec) Decode Error occurred for picture 260 error 9
Video Decode PTS = 00:00:11.880 (11.880) [262]

 [1376706821] (msec) After Decode system time [260]
[1376706896] (msec) Video Decode PTS = 00:00:11.920 (11.920) [263]

 [1376706915] (msec) After Decode system time [261]
[1376706981] (msec) Video Decode PTS = 00:00:11.960 (11.960) [264]

 [1376706994] (msec) After Decode system time [262]
[1376707033] (msec) Video Decode PTS = 00:00:12.0 (12.000) [265]

 [1376707044] (msec) After Decode system time [263]
[1376707116] (msec) Video Decode PTS = 00:00:12.40 (12.040) [266]

 [1376707126] (msec) After Decode system time [264]
[1376707178] (msec) Video Decode PTS = 00:00:12.80 (12.080) [267]

Note: I am creating the class object and calling the decode function like this, with the low-latency flag (the 7th argument) set to true:

m_pDec = new NvDecoder(cuContext, m_pFFmpegDemuxer->GetWidth(), m_pFFmpegDemuxer->GetHeight(), false,
		FFmpeg2NvCodecId(m_pFFmpegDemuxer->GetVideoCodec()), NULL, true, false, NULL, NULL);
m_pDec->Decode(m_pstDecVideo->pVideo, m_pstDecVideo->nVideoBytes, &m_pstDecVideo->ppFrame, &m_pstDecVideo->nFrameReturned,
		CUVID_PKT_ENDOFPICTURE);

Questions:

  1. What is the use of the CUstream stream argument in the function below? Note that I am not using the last three arguments (ppTimestamp, timestamp, stream).

bool NvDecoder::Decode(const uint8_t *pData, int nSize, uint8_t ***pppFrame, int *pnFrameReturned,
uint32_t flags, int64_t **ppTimestamp, int64_t timestamp, CUstream stream)

  2. I am reading a file and feeding 1316-byte packets to the demuxer → decoder (to simulate streaming input), so there should be no packet loss.

But I am still getting the error "Decode Error occurred for picture 260 error 9".


Hi,

  1. CUstream was added with a purpose. By default, all the CUDA kernels used inside the NVDECODE API run on the NULL stream; however, the user can create a stream and, by specifying it, tell the NVDECODE API to run its internal CUDA kernels on that stream. Using a CUstream gives better pipelining when you are also running CUDA kernels that do not need to serialize with the decoder's kernels. Please see the CUDA documentation on CUstream usage. (A small sketch follows this list.)
  2. Please note, as we mentioned earlier, CUVID_PKT_ENDOFPICTURE should be specified only if the packet contains exactly one frame of data. However, it seems you have fixed-size packets, so you should not use this flag.
  3. If the stream is low latency and doesn't have any B frames, there is no need to wait for the display callback. You can get frames directly from the decode callback.
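
A minimal sketch of item 1, assuming the CUDA driver API (cuStreamCreate/cuStreamDestroy) and the Decode() signature quoted above; the variable names are the ones from the earlier snippet, and the flag/timestamp values are placeholders:

    // Dedicated stream so the SDK's internal post-processing/copy kernels need not
    // serialize with the application's other CUDA work on the NULL stream.
    CUstream decodeStream = nullptr;
    cuStreamCreate(&decodeStream, CU_STREAM_NON_BLOCKING);

    m_pDec->Decode(m_pstDecVideo->pVideo, m_pstDecVideo->nVideoBytes,
                   &m_pstDecVideo->ppFrame, &m_pstDecVideo->nFrameReturned,
                   0 /*flags*/, nullptr /*ppTimestamp*/, 0 /*timestamp*/, decodeStream);

    // ... when decoding is finished ...
    cuStreamDestroy(decodeStream);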

Please make the below changes in the code.

Don't register the display callback with the parser:
videoParserParameters.pfnDisplayPicture = NULL;
Get frames directly from the decode callback NvDecoder::HandlePictureDecode(). Insert the lines below at the end of NvDecoder::HandlePictureDecode():
   
           // Build a display-info struct for the picture that was just decoded and
           // hand it straight to the display path, bypassing the parser's reorder queue.
           CUVIDPARSERDISPINFO dispInfo;
           memset(&dispInfo, 0, sizeof(dispInfo));
           dispInfo.picture_index = pPicParams->CurrPicIdx;
           dispInfo.progressive_frame = !pPicParams->field_pic_flag;
           dispInfo.top_field_first = pPicParams->bottom_field_flag ^ 1;
           HandlePictureDisplay(&dispInfo);

Thanks for the info, rypark, and apologies for the late reply.

Even after adding the above fix I still have a two-frame difference. (If I remove the CUVID_PKT_ENDOFPICTURE flag the delay increases by 1 frame, so I am retaining it.)

Note:

  1. The last number, in square brackets, is the frame number.
  2. "(msec) Video Decode PTS" — entry of a new frame.
  3. "(msec) After Decode system time" — after decoding the video frame.

Example:
Frame number 181 (the system-time values are negative, so smaller absolute values are later):
a) Frame [181] entered the looping readdata() function at system time xxx527.
b) Frame [181] exited the decoder (before the file write) at system time xxx454.
The difference is 73 ms for one frame to pass from decoder input to output.

My input stream is 720p at 24 fps, so one frame time is 1/24 s ≈ 42 ms, but the total
difference is 73 ms, which is approximately two frame times (~83 ms).

Can you help me resolve this?
Thanks in advance.

If required, I can share the code via email.

[-1344031553] (msec) After Decode system time [178]
[-1344031527] (msec) Video Decode PTS = 01:00:07.541 (3607.542) [181]

 [-1344031519] (msec) After Decode system time [179]
[-1344031484] (msec) Video Decode PTS = 01:00:07.583 (3607.583) [182]

 [-1344031478] (msec) After Decode system time [180]
[-1344031461] (msec) Video Decode PTS = 01:00:07.625 (3607.625) [183]

 [-1344031454] (msec) After Decode system time [181]
[-1344031430] (msec) Video Decode PTS = 01:00:07.666 (3607.667) [184]

 [-1344031422] (msec) After Decode system time [182]
[-1344031398] (msec) Video Decode PTS = 01:00:07.708 (3607.708) [185]

There are more people wondering about the answer to this question besides just meRaza. I want to know the answer too; I also need minimal latency in my encode/decode pipeline.

Hi meRaza,

Could you send your sample code?

You can email it to video-sdk-feed@nvidia.com

Thanks,
Ryan Park

Hi Ryan Park.

As discussed, I have shared the Google Drive link containing the NVIDIA decoder code with the test app "TestDecLib.cpp", two input test streams, and a readme.txt.

I am sending the mail from "mdamirraza@gmail.com".

Sure, if I find a solution I will definitely share it. Thanks for your interest.

Hi Ryan Park.

I can't send the mail; it seems the email ID has a typo.
I am sending the mail from mdamirraza@gmail.com.

Mail Delivery Subsystem mailer-daemon@googlemail.com

Address not found
Your message wasn’t delivered to video-sdk-feed@nvidia.com because the address couldn’t be found, or is unable to receive mail.
The response from the remote server was:

550 #5.1.0 Address rejected.

For anybody who is having a similar issue, the solution can be quite simple. The constructors of many, if not all, of the encoder sample classes have a default output delay of 3 frames. For example:

NvEncoderD3D11(ID3D11Device* pD3D11Device, uint32_t nWidth, uint32_t nHeight, NV_ENC_BUFFER_FORMAT eBufferFormat, 
        uint32_t nExtraOutputDelay = 3, bool bMotionEstimationOnly = false,  bool bOPInVideoMemory = false);
NvEncoderCuda(CUcontext cuContext, uint32_t nWidth, uint32_t nHeight, NV_ENC_BUFFER_FORMAT eBufferFormat,
        uint32_t nExtraOutputDelay = 3, bool bMotionEstimationOnly = false, bool bOPInVideoMemory = false);

When constructing the encoder, try setting nExtraOutputDelay to 0, as in the sketch below.
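
For instance, a hedged sketch with the CUDA-based sample encoder (cuContext, nWidth, nHeight and the buffer format are placeholders):

    // nExtraOutputDelay = 0: do not queue extra output buffers inside the encoder,
    // so each frame's packet is returned as soon as the hardware has produced it.
    NvEncoderCuda *pEnc = new NvEncoderCuda(cuContext, nWidth, nHeight,
                                            NV_ENC_BUFFER_FORMAT_NV12,
                                            0 /*nExtraOutputDelay*/);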

Another "hacky" way I've discovered is calling the EndEncode function after each EncodeFrame call. Make sure to pass the same packet vector to it. This gets rid of the latency but increases encoding time 2-3x, so be careful. A sketch follows.
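
A rough sketch of that flush-per-frame approach, assuming the EncodeFrame()/EndEncode() helpers of the SDK sample classes (vPacket is the sample's std::vector<std::vector<uint8_t>> output container):

    std::vector<std::vector<uint8_t>> vPacket;
    pEnc->EncodeFrame(vPacket);   // submit the current input frame
    pEnc->EndEncode(vPacket);     // flush immediately so this frame's packet comes back
                                  // now instead of nExtraOutputDelay frames later
    for (const std::vector<uint8_t> &packet : vPacket) {
        // write or stream packet.data(), packet.size()
    }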