Video Codec SDK: Decoding problem

I’m trying to decode 4K, 25 fps, H.264 streams coming from different IP cameras (via RTSP) using the NVIDIA Video Codec SDK.

With the following SW/HW technical specifications:

  • Ubuntu 18.04 LTS
  • FFmpeg version 3.4.8-0ubuntu0.2
  • GStreamer version 1.16.1
  • Cuda compilation tools, release 10.0, V10.0.130
  • NVIDIA Video Codec SDK 8.2.16
  • NVIDIA Driver Version: 460.32.03
  • NVIDIA GeForce GTX 1060 6GB
  • BOSCH MIC IP 7100i camera
  • AXIS Q6128-E camera

I decided to try the samples that come with the official NVIDIA Video Codec SDK 8.2.16.

The first problem:

In particular I have tried to execute the AppDec example by using this video as an input.
The video was recorded from the Bosch camera using GStreamer (I also tried FFmpeg, with no difference in the results).
After a minimal modification to AppDec to display the time interval between the decoding of each pair of consecutive frames, I noticed an unexpected periodic pattern that repeats every 10 frames (note that 10 is the distance between two keyframes).

However, if I instead use this other video as input, which was recorded from the Axis camera, the execution always produces the same time intervals between two consecutive decodes, as expected.
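For reference, the timing change can be sketched as follows. This is a minimal illustration, not the actual AppDec diff: the class name and its placement are mine; in AppDec you would call it once per decoded frame inside the decode loop.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>

// Minimal sketch of the per-frame interval measurement added to AppDec.
// Call MarkFrameDecoded() once after each decoded frame; starting from the
// second call it prints the time elapsed since the previous frame.
class FrameIntervalTimer {
public:
    void MarkFrameDecoded() {
        auto now = std::chrono::steady_clock::now();
        if (mHasPrev) {
            double ms = std::chrono::duration<double, std::milli>(now - mPrev).count();
            std::cout << "Frame " << mFrame << ": " << ms << " ms since previous decode\n";
        }
        mPrev = now;
        mHasPrev = true;
        ++mFrame;
    }
    int FrameCount() const { return mFrame; }

private:
    std::chrono::steady_clock::time_point mPrev;
    bool mHasPrev = false;
    int mFrame = 0;
};
```

Note that this measures wall-clock time between callback invocations, which includes demuxing and parser latency, not just GPU decode time.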

The second problem:

Another problem arises when I try to use the RTSP stream as an input. In this case the behavior is wrong for both the Bosch and the Axis camera.
For the Bosch camera, I get the same unexpected pattern, but with different time intervals because the camera streams at 25 fps.
For the Axis camera, the callback NvDecoder::HandlePictureDisplay(CUVIDPARSERDISPINFO *pDispInfo) is never called. As a result, I always get 0 decoded frames.


  • The time intervals (in milliseconds) between consecutive decodes are shown in the attachment.
  • I also tried using the latest version of the SDK (NVIDIA Video Codec SDK 11) with CUDA 11.2, but the results do not change and I have the same problems.
  • I’m currently developing on a GTX device, but I plan to deploy my application on a Lenovo ThinkSystem SR650 server with 2x NVIDIA Quadro RTX 4000 8GB PCIe Active GPUs. So it is very important to solve this problem first.

How can I address these problems? Could this be a bug in the Video Codec SDK?

In order to reproduce these issues, you can find the original code of the AppDec example here, with the only modification being a print of the time interval between two consecutive frame decodes.


Hello @sgira

My name is Roman, I’m a Video Codec SDK engineer at NVIDIA.

Before addressing the actual issues, I highly recommend updating to the latest Video Codec SDK release, since 8.2.16 is outdated. The latest release is major version 11.

After a minimal modification to the AppDec in order to display the time interval between the decoding of each pair of consecutive frames

I recommend placing NVTX markers in the NvDecoder::HandlePictureDecode and NvDecoder::HandlePictureDisplay methods and launching the application under the Nsight profiler. You will see the actual application timeline with NVTX markers, which shows the exact amount of time spent decoding every frame.

I noticed a periodical unexpected pattern that repeats each 10 frames

Could you please elaborate on that? If you could place NVTX markers in AppDec and upload the application timeline, that would be very helpful.

Another problem rises when I try to use the rtsp stream as an input

RTSP input is only somewhat supported by the Video Codec SDK samples, simply because the ffmpeg library is used to obtain an Annex B elementary video stream from a variety of inputs, and RTSP is usually supported out of the box by ffmpeg to some extent.

RTSP connectivity issues are far outside the scope of the Video Codec SDK samples, so please use an appropriate demuxer to work with RTSP cameras. The samples aim only to illustrate API usage.

I already tried using the last version of the SDK (NVIDIA Video Codec SDK 11) with CUDA 11.2, but the results do not change and I have the same problems.

I tried using NVTX markers as suggested.
In the attachments you can find the reports generated with Nsight and some screenshots that highlight the markers within the timeline.

I also uploaded a line chart comparing Bosch (reading from file), Bosch (reading from RTSP), and Axis (reading from file): it plots the elapsed time between two consecutive marker points placed within the HandlePictureDisplay function.
I took 100 consecutive marker samples from those generated by Nsight.
The X-axis represents the frame number.
The Y-axis represents the elapsed time between two consecutive marker points, in milliseconds.
As you can see, the unexpected results from the Bosch camera are clearly visible: there is high variability between consecutive marker points. Instead, I would expect an almost straight line (which would mean that HandlePictureDisplay is invoked with regularity).

Report nsight axis file.qdrep (415.9 KB)
Report nsight bosch file.qdrep (697.1 KB)
Report nsight bosch rtsp.qdrep (467.8 KB)

comparison_bosch_axis (elapsed time between consecutive markers)

Screenshot nsight axis reading from file

Screenshot nsight bosch reading from file

Screenshot nsight bosch reading from RTSP

Hello Roman. We were trying to decode H.264 from an Axis camera with OpenCV (contrib) using nvcuvid, and we see the same problem with GPU decoding.

Hi all,

I’ve been dealing with a similar problem for a while now and think I’ve made some progress. I’ll explain what I’ve learnt here as well as it might be useful to others moving forward.

RTSP is essentially a TCP-based control protocol for obtaining SDP information, which describes the streams and how to access them. Typically, RTP over UDP or TCP is then used to transmit the streams on another channel.

For streaming via RTSP, it is not uncommon for a server to include the SPS and PPS information in the SDP and then omit those NAL units from the RTP streams (some streams do include them periodically, though). SPS stands for “Sequence Parameter Set” and PPS for “Picture Parameter Set”; both are NAL unit types in the H.264 standard. Without this information, the NvDecoder essentially sits idle, discarding data until it sees them.

The FFmpegDemuxer provided in the Video Codec SDK does not appear to detect the format of the streams from the Axis camera I am using, either. I did some digging around in the FFmpeg code and found that the AVCodecParameters::extradata field is populated by the RTP H.264 handling code. I noticed that the FFmpegDemuxer class uses this data as output for the first frame when bMp4MPEG4 is true, but not for H.264 streams. I also found that for my stream the class wasn’t detecting an H.264 stream and setting bMp4H264 either (none of the flags were set at all, because the format long_name is “RTSP input”).

I modified the long_name check to include “RTSP input” so that bMp4H264 becomes true, and modified the Demux() function to skip the initial filtering on the first call and instead jump into the extradata processing section. After that I was getting frames decoded!

I still need to clean this up (I’ve only updated and tested it for H.264 streams), but here is my hacky solution:

        bMp4H264 = eVideoCodec == AV_CODEC_ID_H264 && (
                !strcmp(fmtc->iformat->long_name, "QuickTime / MOV")
                || !strcmp(fmtc->iformat->long_name, "FLV (Flash Video)")
                || !strcmp(fmtc->iformat->long_name, "Matroska / WebM")
                || !strcmp(fmtc->iformat->long_name, "RTSP input") // Added this
            );

And in Demux(), after the error checking that follows the av_read_frame while loop (my changes are marked with // Modified line):

        if ((bMp4H264 || bMp4HEVC) && (frameCount != 0)) { // Modified line
            if (pktFiltered.data) {
                av_packet_unref(&pktFiltered);
            }
            ck(av_bsf_send_packet(bsfc, &pkt));
            ck(av_bsf_receive_packet(bsfc, &pktFiltered));
            *ppVideo = pktFiltered.data;
            *pnVideoBytes = pktFiltered.size;
            if (pts)
                *pts = (int64_t) (pktFiltered.pts * userTimeScale * timeBase);
        } else {

            if ((bMp4MPEG4 || bMp4H264) && (frameCount == 0)) { // Modified line

                int extraDataSize = fmtc->streams[iVideoStream]->codecpar->extradata_size;

                if (extraDataSize > 0) {

                    // extradata contains start codes 00 00 01. Subtract its size
                    pDataWithHeader = (uint8_t *)av_malloc(extraDataSize + pkt.size - 3*sizeof(uint8_t));

                    if (!pDataWithHeader) {
                        LOG(ERROR) << "FFmpeg error: " << __FILE__ << " " << __LINE__;
                        return false;
                    }

                    memcpy(pDataWithHeader, fmtc->streams[iVideoStream]->codecpar->extradata, extraDataSize);
                    memcpy(pDataWithHeader+extraDataSize, pkt.data+3, pkt.size - 3*sizeof(uint8_t));

                    *ppVideo = pDataWithHeader;
                    *pnVideoBytes = extraDataSize + pkt.size - 3*sizeof(uint8_t);
                }
            } else {
                *ppVideo = pkt.data;
                *pnVideoBytes = pkt.size;
            }

            if (pts)
                *pts = (int64_t)(pkt.pts * userTimeScale * timeBase);
        }

I hope that helps someone!
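For anyone who wants to sanity-check the first-frame byte arithmetic above in isolation, here is a standalone sketch (my own naming; plain std::vector instead of av_malloc/AVPacket) of the same concatenation: prepend the extradata (SPS/PPS with start codes) to the first packet while skipping the packet’s leading 3-byte start code, exactly as the memcpy calls in the modified Demux() do:

```cpp
#include <cstdint>
#include <vector>

// Mimics FFmpegDemuxer's first-frame handling: output = extradata followed by
// the packet payload with its first 3 bytes (the 00 00 01 start code) dropped.
std::vector<uint8_t> PrependExtradata(const std::vector<uint8_t> &extradata,
                                      const std::vector<uint8_t> &pkt) {
    std::vector<uint8_t> out(extradata);
    if (pkt.size() > 3)
        out.insert(out.end(), pkt.begin() + 3, pkt.end());
    return out;
}
```

Note this assumes the packet really starts with a 3-byte start code; a packet using a 4-byte start code would keep a stray leading zero, which is one of the things worth cleaning up in the hack.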