cuvidCreateDecoder fails with CUDA_ERROR_OUT_OF_MEMORY?

I am using a Quadro M2000 card for hardware decoding. However, creating the ninth decode session fails with the parameters below.
memset(&oVideoDecodeCreateInfo_, 0, sizeof(CUVIDDECODECREATEINFO));

// Create video decoder
oVideoDecodeCreateInfo_.CodecType           = cudaVideoCodec_H264;
oVideoDecodeCreateInfo_.ulWidth             = 1280;
oVideoDecodeCreateInfo_.ulHeight            = 720;
oVideoDecodeCreateInfo_.ChromaFormat        = cudaVideoChromaFormat_420;
oVideoDecodeCreateInfo_.ulNumDecodeSurfaces = 16;

oVideoDecodeCreateInfo_.OutputFormat        = cudaVideoSurfaceFormat_NV12;
oVideoDecodeCreateInfo_.DeinterlaceMode     = cudaVideoDeinterlaceMode_Adaptive;

// No scaling
oVideoDecodeCreateInfo_.ulTargetWidth       = 1280;
oVideoDecodeCreateInfo_.ulTargetHeight      = 720;
oVideoDecodeCreateInfo_.ulNumOutputSurfaces = 1;

CUresult ret;
CUvideodecoder decoder = NULL;
ret = cuvidCreateDecoder(&decoder, &oVideoDecodeCreateInfo_);

ret is CUDA_ERROR_OUT_OF_MEMORY. Reducing oVideoDecodeCreateInfo_.ulNumDecodeSurfaces from 16 to 4 has no effect, so I wonder whether 8 decode sessions is the limit of the Quadro M2000 card. I searched for help but found no answer to this.
Can anyone help?
Thanks in advance.
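
For a sense of scale on the ulNumDecodeSurfaces experiment above, here is a rough estimate of how much memory the decode surfaces themselves need. It assumes unpadded 8-bit 4:2:0 (NV12-sized) surfaces, so it is only a lower bound: the driver's real allocations add pitch alignment and internal buffers. The helper names are hypothetical.

```c
#include <stddef.h>

/* Unpadded size in bytes of one 8-bit NV12 frame: a full-resolution luma
   plane plus an interleaved chroma plane of half that size (assumption:
   no pitch alignment or driver-side padding). */
static size_t nv12_frame_bytes(size_t width, size_t height)
{
    return width * height + (width * height) / 2;
}

/* Lower-bound estimate of the memory taken by the decode surfaces of
   one session. */
static size_t decode_surface_bytes(size_t width, size_t height,
                                   size_t num_surfaces)
{
    return nv12_frame_bytes(width, height) * num_surfaces;
}
```

By this estimate, decode_surface_bytes(1280, 720, 16) is 22118400 bytes (about 21 MiB) and decode_surface_bytes(1280, 720, 4) is 5529600 bytes, so going from 16 to 4 surfaces saves only about 16 MiB per session. That is consistent with the change making no difference if the memory is being consumed elsewhere.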

Fixed it. Too many CUDA contexts caused this problem. Just create one CUDA context and make it current in each decoder thread.

Do you push it to the decoder thread via cuCtxPushCurrent or via cuCtxSetCurrent?
Do you use CUVIDDECODECREATEINFO.vidLock, cuvidCtxLockCreate, and cuvidCtxLock/cuvidCtxUnlock to synchronize the context between decoding threads, or is it enough to call cuCtxSetCurrent in each thread?

My current implementation just calls cuCtxSetCurrent in each decoding thread and doesn't use CUVIDDECODECREATEINFO.vidLock, cuvidCtxLockCreate, or cuvidCtxLock/cuvidCtxUnlock. Is that OK? There are no recommendations about this in the NVDEC docs. I found cuCtxSetCurrent by accident; it works, but I cannot find any confirmation that it is the correct approach.

Maybe some contexts are not being destroyed. Check your code; if so, call cuCtxDestroy(cuContext) on each one.