Minimal NVDECODE experiment: cuvidMapVideoFrame() fails with "mapping of buffer object failed"

I’m looking into the NVDECODE API, comparing my code with the cudaDecodeGL example. I’ve created an H.264 parser which reads Annex B streams and gives me the separate NAL units, which I feed into cuvidParseVideoData(). While comparing the data between my code and the cudaDecodeGL example code, I noticed that you don’t actually have to extract the NAL units before feeding them into cuvidParseVideoData() (contrary to what some other threads describe).

The sequence, decode and display callbacks I configured on the CUvideoparser are all being called. In the decode callback I call cuvidDecodePicture(), and in my display callback I try to map the current picture. The first two calls to cuvidMapVideoFrame() return CUDA_SUCCESS, but every call after that fails. Every call to cuvidUnmapVideoFrame() fails as well.

I might be doing something trivially wrong (wrong dimensions, not setting extra data, maybe not making the context current?), but because only the mapping fails while cuvidDecodePicture() keeps returning CUDA_SUCCESS, it might be a bigger issue.

I’ve pasted an experimental version of my code here: Diving into the NVDECODE API (GitHub)

Any thoughts on what might cause the mapping to fail would be really appreciated.

Update 1
When I remove the queueing code from the cudaDecodeGL example, I get the same behavior as in my code, which makes me wonder whether the problem is related to the queueing. Nothing in the documentation suggests that you can’t map a frame inside the display callback. I posted the changes I made to VideoParser.cpp here:

Update 2
I found the bug: I was passing my CUdevice into cuvidMapVideoFrame() instead of the CUvideodecoder handle it expects as its first argument, see