Decoding problem when feeding CUvideodecoder manually


After integrating NVDEC into our video management system, I noticed that, almost always, after creating a new CUvideoparser it skips the first (intra) frame without feeding it into the decoder. Since this behavior is problematic for our pipeline, I wrote a custom H.264 parser and populated CUVIDPICPARAMS with its output.
This did solve the skipped-frame problem, but the decoded video now shows artifacts on objects in motion. I compared the contents of CUVIDPICPARAMS as filled by my parser and by CUvideoparser, and there is no apparent difference, except perhaps the picture index and frame index, which I assume have nothing to do with motion-prediction decoding.

I desperately need assistance with this issue (or maybe there's a way to make CUvideoparser not skip frames).

Best regards

I create a parser like this and I don’t lose the first frame.

// Create video parser
memset(&parserInitParams, 0, sizeof(parserInitParams));
parserInitParams.CodecType = Session.cuCodec;
if (use_D3D == 1)
	parserInitParams.ulMaxNumDecodeSurfaces = 16;
else
	parserInitParams.ulMaxNumDecodeSurfaces = MAX_FRM_CNT;
parserInitParams.ulErrorThreshold = 100;
parserInitParams.ulMaxDisplayDelay = 4;
parserInitParams.pUserData = &Session;
Session.This_p = (void *) this;
parserInitParams.pfnSequenceCallback = HandleVideoSequence;
parserInitParams.pfnDecodePicture = HandlePictureDecode;
parserInitParams.pfnDisplayPicture = HandlePictureDisplay;
result = cuvidCreateVideoParser(&Session.cuParser, &parserInitParams);

I inject data by NALUs. Each NALU goes like this, where buf points to the NALU and len is its length:

pkt.flags = 0;
pkt.payload_size = (unsigned long) len;
pkt.payload = buf;
pkt.timestamp = 0;  // not using timestamps
cuvidParseVideoData(Session.cuParser, &pkt);

You should make sure you are injecting correct data, including SPS/PPS.

Well, my code is almost a clone of yours, except that I use a different number of decode surfaces. I tried playing with that parameter as well, but the result is the same. It swallows the first I frame without invoking any callback, and only after the following P frame is submitted does it start spitting out data for the decoder (starting, obviously, with the I frame submitted in the previous iteration). Of course, the I-frame buffer includes the SPS/PPS as well. Very weird behavior, for which I cannot get any clarification from the NVIDIA guys.

That’s not weird at all. It’s standard re-ordering behavior.

Now you’re confusing me: you initially said the parser was losing the first frame, but now you are saying that you don’t get the I frame until you have submitted the P frame. That is normal.

Oh well, my bad - incorrect wording. I wonder then what’s the purpose of this behavior and is there any way to change it?

It’s a standard process to re-order frames for display. Consider the encode order:

I2 B0 B1 P5 B3 B4 P8 B6 B7 …

where the numbers are the display numbers, i.e., 0 is displayed first, etc.

The rule is that a decoded B frame is displayed immediately and the Is and Ps are delayed, i.e., decoding an I/P outputs the previous I/P. Then the decoder thinks like this:

decode I2, don’t output anything
decode B0, output it right away
decode B1, output it right away
decode P5, output I2
decode B3, output it right away
decode B4, output it right away
decode P8, output P5

Thereby the display order is achieved. Similar things happen for IPBBPBBPBB. The initial I won’t be output until the following P is decoded.

You can’t change this behavior but you don’t need to. Pace things on the output side, rather than the input side. I could explain that better if I knew exactly what you are trying to do.

Yes, it definitely makes sense in case where B frames are present, however the nature of security cameras is that they do not use B frames, at least by default. It’s usually a sequential IPPPIPPPI… stream, which makes me wonder why it still delays the first intra.

The decoder does not know that it will not encounter B frames.

Given that the decode rate is typically hundreds of frames per second, how would a 1-frame latency adversely affect your application?

Nothing wrong, of course, provided the pipeline design is robust enough. I wish I had that luxury, but unfortunately we don’t always inherit a perfect codebase, eh? The whole pipeline I’m working with expects the same frame that was submitted to show up at the end, so the fact that the decoder introduces a delay drives the whole system crazy. Thanks for the help anyway!

deleted – gotta be careful where we cast our pearls



Competent engineers have a COURTESY not to insult each other without knowing all the details, which you obviously do not. Again, I appreciate your help, but your last comment was idiotic in its arrogance.