Why can't cuvidDecodePicture() accept bitstream data larger than 2000000 bytes?

I found that when I pass more than 2000000 bytes of bitstream data to cuvidDecodePicture(), the function returns CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES. Why can't this function accept bitstream data larger than 2000000 bytes? An I-frame larger than 2000000 bytes is very common in an H.264 bitstream at 4K resolution, so how can I feed such frame data to the decoder?

This is outside my area of expertise, but the creation of NVCUVID predated the introduction of 4K capabilities in GPUs by a couple of years. According to what appears to be the latest documentation, it only incorporates support for full HD at 1080p (see page 5 of http://docs.nvidia.com/cuda/samples/3_Imaging/cudaDecodeGL/doc/nvcuvid.pdf).

But I also found that if I use the NVCUVID decoder through Microsoft’s DXVA interface, a sample larger than 2000000 bytes is decoded normally. Why?

Same situation here, using the cudaDecodeGL sample with a 4K video (Big Buck Bunny at 4K). When the CUVIDPICPARAMS has nBitstreamDataLen > 2000000, the call fails with that same error.

More info: I am using CUDA 6.5. I had to modify the sample: in the VideoDecoder constructor in VideoDecoder.cpp, I raised the "Limit decode memory" cap to 48M pixels.
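For anyone wondering why that cap matters at 4K, here is a rough sketch of the kind of clamp involved. It is not the sample's actual code: the helper name clampDecodeSurfaces, the starting count of 20 surfaces, and the 16M-pixel default used in the note below are assumptions for illustration. The idea is just that the number of decode surfaces is reduced until surfaces times coded width times coded height fits within a pixel budget.

```cpp
#include <cstdint>

// Hypothetical helper (not part of NVCUVID or the cudaDecodeGL sample):
// reduce the requested number of decode surfaces until the total pixel
// count fits within pixelBudget, but never go below one surface.
inline unsigned clampDecodeSurfaces(unsigned requested,
                                    unsigned codedWidth,
                                    unsigned codedHeight,
                                    std::uint64_t pixelBudget)
{
    // Use 64-bit arithmetic so large frame sizes cannot overflow.
    const std::uint64_t pixelsPerSurface =
        static_cast<std::uint64_t>(codedWidth) * codedHeight;

    unsigned surfaces = requested;
    while (surfaces > 1 && surfaces * pixelsPerSurface > pixelBudget) {
        --surfaces;
    }
    return surfaces;
}
```

With a 3840x2160 stream (8294400 pixels per surface), a 16M-pixel budget would leave room for only 2 surfaces, while a 48M-pixel budget allows 6. That would explain why the budget has to be raised for 4K content, though it does not by itself explain the 2000000-byte bitstream failure.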

I use a GeForce GTX 660. According to this list: https://en.wikipedia.org/wiki/Nvidia_PureVideo#Table_of_GPUs_containing_a_PureVideo_SIP_block

It should support 4K decoding. I’m on Windows and I intend to use OpenGL rather than DirectX. Is there a problem related to OpenGL and 4K?