Exception in cuvidParseVideoData when decoding 8-bit AV1

I have made a slight modification to the AppDec.cpp example: instead of using the ffmpeg demuxer to get frame-by-frame data from the input stream, it reads an entire file and passes the whole buffer to the decoder.

The file in question is a single GOP, 60 frames long. I have a set of these files in various formats. This setup simulates the systems I work with, which do not use ffmpeg or demuxing at all. They simply send each GOP to a decoder (Intel QuickSync).

For testing, I hard-code the codec parameter of the NvDecoder constructor (cudaVideoCodec_H264, etc.) to match the type of GOP I’m testing.

Let’s see what happens with H264 input:

[INFO ][15:49:27] Video Input Information
    Codec        : AVC/H.264
    Frame rate   : 60000/1000 = 60 fps
    Sequence     : Progressive
    Coded size   : [1280, 720]
    Display area : [0, 0, 1280, 720]
    Chroma       : YUV 420
    Bit depth    : 8

Video Decoding Params:
    Num Surfaces : 6
    Crop         : [0, 0, 0, 0]
    Resize       : 1280x720
    Deinterlace  : Weave

Total frame decoded: 60
Saved in file test.dat in NV12 format
Session Deinitialization Time: 23 ms

What about HEVC input?

[INFO ][15:51:11] Video Input Information
    Codec        : H.265/HEVC
    Frame rate   : 60/1 = 60 fps
    Sequence     : Progressive
    Coded size   : [1280, 736]
    Display area : [0, 0, 1280, 720]
    Chroma       : YUV 420
    Bit depth    : 8

Video Decoding Params:
    Num Surfaces : 8
    Crop         : [0, 0, 0, 0]
    Resize       : 1280x736
    Deinterlace  : Weave

Total frame decoded: 60
Saved in file test.dat in NV12 format
Session Deinitialization Time: 21 ms

And HEVC 10-bit input:

[INFO ][15:52:32] Video Input Information
    Codec        : H.265/HEVC
    Frame rate   : 60/1 = 60 fps
    Sequence     : Progressive
    Coded size   : [1280, 736]
    Display area : [0, 0, 1280, 720]
    Chroma       : YUV 420
    Bit depth    : 10

Video Decoding Params:
    Num Surfaces : 8
    Crop         : [0, 0, 0, 0]
    Resize       : 1280x736
    Deinterlace  : Weave

Total frame decoded: 60
Saved in file test.dat in P016 format
Session Deinitialization Time: 32 ms

Now, AV1 8-bit input:

[NvDecoder.cpp, line 811, NVDEC_API_CALL(cuvidParseVideoData(m_hParser, &packet));]
Exception thrown at 0x00007FFD52D8B50B (nvcuvid.dll) in nvidia sample.exe: 0xC0000005: Access violation reading location 0x0000000000000001.

I don’t expect a software engineer earning a quarter million a year at NVIDIA to test AppDec the way I have to find out what’s going on. Someone lower level, perhaps. I have put a set of GOPs on my Google Drive in case someone at NVIDIA (the tea lady, maybe) takes an interest in finding out what the problem is. Note that the GOPs were encoded with my own NVENC implementation, so it’s possible I’ve misconfigured the encoder somehow.

Note also that ffmpeg is able to decode the AV1 GOPs and output frames from them.

Some additional info: with H264, HandlePictureDecode and HandlePictureDisplay are both called for each of the 60 frames. With AV1, HandlePictureDecode is called for all 60 frames, but HandlePictureDisplay is never called. The exception is thrown at the end of decoding, and no frame is ever displayed.

It just occurred to me that one way to spot a difference is to debug both cases (ffmpeg demuxing vs. passing in an entire GOP) and compare the start bytes/codes. If they differ, ffmpeg knows something the documentation isn’t telling us.

The start of the GOP is the same as the start of the first frame from the ffmpeg demuxer, so that theory can be thrown away. I think there’s a genuine bug in the parser when cuvidParseVideoData is called on a multi-frame GOP like this.

GOP:     12 00 0a 14 04 00 00 00 04 00 00 00 f3 40 00 0c de a6 7f d9 e0
Demuxed: 12 00 0a 14 04 00 00 00 04 00 00 00 f3 40 00 0c de a6 7f d9 e0
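For reference, the shared prefix can be decoded by hand using the AV1 OBU header layout (AV1 spec, section 5.3): `12 00` is a temporal delimiter OBU (obu_type 2) with obu_size 0, and `0a 14` is a sequence header OBU (obu_type 1) with obu_size 20. The helper below is a minimal sketch of that decoding, not NVIDIA or ffmpeg code; `ObuInfo`, `readLeb128` and `parseObuHeader` are hypothetical names, and it assumes the low overhead bitstream format, where every OBU carries an obu_size field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fields of one AV1 OBU header (AV1 spec, section 5.3.2), plus the decoded
// obu_size and the index where the payload starts.
struct ObuInfo {
    int type;              // obu_type: 1 = sequence header, 2 = temporal delimiter, 6 = frame, ...
    bool hasExtension;     // obu_extension_flag
    bool hasSizeField;     // obu_has_size_field
    uint64_t payloadSize;  // obu_size (0 if hasSizeField is false)
    size_t payloadOffset;  // index of the first payload byte in the buffer
};

// Reads an unsigned LEB128 value (AV1 spec, section 4.10.5), advancing pos.
// Returns false if the buffer ends first or the value exceeds 8 bytes.
static bool readLeb128(const uint8_t* data, size_t size, size_t& pos, uint64_t& value) {
    value = 0;
    for (int i = 0; i < 8; ++i) {
        if (pos >= size) return false;
        const uint8_t byte = data[pos++];
        value |= static_cast<uint64_t>(byte & 0x7f) << (7 * i);
        if (!(byte & 0x80)) return true;  // high bit clear: last byte
    }
    return false;
}

// Parses the OBU header starting at pos. The payload itself need not be
// present, so this also works on a truncated prefix like the one above.
static bool parseObuHeader(const uint8_t* data, size_t size, size_t pos, ObuInfo& obu) {
    if (pos >= size) return false;
    const uint8_t h = data[pos++];
    if (h & 0x80) return false;  // obu_forbidden_bit must be 0
    obu.type = (h >> 3) & 0x0f;
    obu.hasExtension = ((h >> 2) & 1) != 0;
    obu.hasSizeField = ((h >> 1) & 1) != 0;
    if (obu.hasExtension) {
        if (pos >= size) return false;
        ++pos;  // skip the temporal_id/spatial_id extension byte
    }
    obu.payloadSize = 0;
    if (obu.hasSizeField && !readLeb128(data, size, pos, obu.payloadSize)) return false;
    obu.payloadOffset = pos;
    return true;
}
```

Applied at offset 0 and then at offset 2 of the prefix above, it reports obu_type 2 with size 0, followed by obu_type 1 with size 20 and a payload starting at offset 4. That matches what a demuxer would see at the start of the first temporal unit, which is consistent with the observation that both byte streams begin identically.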

I solved this problem by cribbing some code from Intel’s SVT-AV1 reference project. I extracted the functions, types, etc. needed to get the temporal unit headers and per-frame payload data, which I pass to cuvidParseVideoData one frame at a time (I can’t use ffmpeg for this, as I cannot ship it due to licensing). It seems to work very nicely.
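The splitting step can be sketched roughly as follows. This is not the SVT-AV1 code; it is a minimal sketch that assumes the GOP is in the AV1 low overhead bitstream format (every OBU has obu_has_size_field set) and that each temporal unit, i.e. one displayed frame, begins with a temporal delimiter OBU, as the `12 00` at the start of the GOP suggests. `TemporalUnit`, `splitTemporalUnits` and `kObuTemporalDelimiter` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte range of one temporal unit inside the GOP buffer.
struct TemporalUnit {
    size_t offset;
    size_t size;
};

constexpr int kObuTemporalDelimiter = 2;  // obu_type value from the AV1 spec

// Unsigned LEB128 reader (AV1 spec, section 4.10.5); advances pos.
static bool readLeb128(const uint8_t* data, size_t size, size_t& pos, uint64_t& value) {
    value = 0;
    for (int i = 0; i < 8; ++i) {
        if (pos >= size) return false;
        const uint8_t byte = data[pos++];
        value |= static_cast<uint64_t>(byte & 0x7f) << (7 * i);
        if (!(byte & 0x80)) return true;
    }
    return false;
}

// Walks the OBUs in the buffer and starts a new temporal unit at each
// temporal delimiter. Returns an empty vector if the buffer is malformed.
std::vector<TemporalUnit> splitTemporalUnits(const uint8_t* data, size_t size) {
    std::vector<TemporalUnit> units;
    size_t pos = 0;
    while (pos < size) {
        const size_t obuStart = pos;
        const uint8_t header = data[pos++];
        if (header & 0x80) return {};               // obu_forbidden_bit set
        const int type = (header >> 3) & 0x0f;
        const bool hasExtension = ((header >> 2) & 1) != 0;
        const bool hasSizeField = ((header >> 1) & 1) != 0;
        if (!hasSizeField) return {};               // cannot skip an OBU without obu_size
        if (hasExtension) ++pos;                    // temporal_id/spatial_id byte
        uint64_t obuSize = 0;
        if (!readLeb128(data, size, pos, obuSize) || obuSize > size - pos) return {};
        pos += static_cast<size_t>(obuSize);        // skip the OBU payload

        if (type == kObuTemporalDelimiter || units.empty()) {
            units.push_back({obuStart, 0});
        }
        units.back().size = pos - units.back().offset;  // extend the current unit
    }
    return units;
}
```

Each resulting range could then be handed to cuvidParseVideoData as its own packet (a CUVIDSOURCEDATAPACKET with `payload = data + offset` and `payload_size = size`), mirroring how AppDec passes one demuxed packet at a time. Splitting on temporal delimiters keeps the sequence header attached to the first unit, which is where a demuxer would also place it.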

It would be good to avoid this additional code and complexity by having cuvidParseVideoData handle multi-frame input in all cases, but beggars can’t be choosers!