Weird performance hit with cuvidMapVideoFrame()


I’ve developed a decoding solution for H.264 network streaming with D3D11 rendering. Everything seemed to work fine at first, until I noticed a frustrating difference in frame rate between my component and the D3D11Decode sample. While the sample gives me a nice ~40 fps when decoding 4K video, my app struggles at 3-4 fps. So I drilled down and timed every call in the pipeline, and it appears that the call to cuvidMapVideoFrame takes ~235-250 ms! That makes no sense to me, since my code is based on the sample. I’m completely stuck, since I couldn’t find any similar issues on the internet. Please help!
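
For anyone who wants to reproduce the measurement, here is a minimal sketch of one way to time a single call on the CPU side. It is not the exact code from my app: `timeCallMs` is a hypothetical helper name, not part of the CUDA Video Decoder API, and it only measures wall-clock time spent inside the call, so any waiting the call does internally is included in the number.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (an assumption for illustration, not an NVIDIA API):
// invokes `call` once and returns the elapsed wall-clock time in milliseconds.
template <typename Callable>
double timeCallMs(Callable&& call) {
    // steady_clock is monotonic, so the result is not affected by
    // system clock adjustments during the measurement.
    const auto start = std::chrono::steady_clock::now();
    std::forward<Callable>(call)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would look something like `double ms = timeCallMs([&] { result = cuvidMapVideoFrame(decoder, picIndex, &devPtr, &pitch, &procParams); });`, where `decoder`, `picIndex`, `devPtr`, `pitch`, `procParams` and `result` are placeholders for whatever the surrounding pipeline already has in scope.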

Need to see your code.