cuvidCreateDecoder returns error CUDA_ERROR_OUT_OF_MEMORY

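#include <cstdio>
#include <cuda.h>     // CUDA driver API (cuInit, cuCtxCreate)
#include <nvcuvid.h>  // NVDEC API (cuvidCreateDecoder, cuvidDestroyDecoder)

int TestDecoder() {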
  CUVIDDECODECREATEINFO decparam = {};
  decparam.CodecType             = cudaVideoCodec_HEVC;
  decparam.ulWidth               = 1920;
  decparam.ulHeight              = 1080;
  decparam.ulNumDecodeSurfaces   = 16;
  decparam.ulTargetWidth         = 1920;
  decparam.ulTargetHeight        = 1080;
  decparam.ulNumOutputSurfaces   = 1;
  decparam.ChromaFormat          = cudaVideoChromaFormat_420;    // cudaVideoChromaFormat_XXX (only 4:2:0 is currently supported)
  decparam.ulCreationFlags       = cudaVideoCreate_PreferCUVID;  // Decoder creation flags (cudaVideoCreateFlags_XXX)
  decparam.display_area.left     = 0;
  decparam.display_area.top      = 0;
  decparam.display_area.right    = 1920;
  decparam.display_area.bottom   = 1080;
  decparam.target_rect.left      = 0;
  decparam.target_rect.top       = 0;
  decparam.target_rect.right     = 1920;
  decparam.target_rect.bottom    = 1080;
  decparam.bitDepthMinus8        = 0;
  decparam.OutputFormat          = cudaVideoSurfaceFormat_NV12;
  decparam.DeinterlaceMode       = cudaVideoDeinterlaceMode_Weave;
  decparam.vidLock               = nullptr;

  CUresult cures;
  CUvideodecoder hDecoder = nullptr;

  if(CUDA_SUCCESS != (cures = cuvidCreateDecoder(&hDecoder, &decparam)))
    return fprintf(stderr, "cuvidCreateDecoder error %d\n", cures), cures;

  if(hDecoder && CUDA_SUCCESS != (cures = cuvidDestroyDecoder(hDecoder)))
    return fprintf(stderr, "cuvidDestroyDecoder error %d\n", cures), cures;

  return 0;
}

int main()
{
  if (auto cures = cuInit(0))
    return fprintf(stderr, "cuInit error %d\n", cures), cures;

  CUcontext ctx;
  if (auto cures = cuCtxCreate(&ctx, CU_CTX_SCHED_BLOCKING_SYNC, 0))
    return fprintf(stderr, "cuCtxCreate error %d\n", cures), cures;

  // Create and destroy a decoder over and over until one of the calls fails.
  // An unsigned counter wraps around safely instead of overflowing.
  for (unsigned i = 0;; ++i)
  {
    if (auto res = TestDecoder())
      return res;

    fprintf(stderr, "\r%c", "\\-/|"[i & 3]);  // progress spinner
  }
}

(1) call cuvidCreateDecoder
(2) call cuvidDestroyDecoder
If you repeat steps 1 and 2 in a loop, the process's “Virtual Size” (as shown in Process Explorer) keeps growing, and after a while cuvidCreateDecoder returns CUDA_ERROR_OUT_OF_MEMORY.
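To watch the growth without Process Explorer, the loop could sample the process's virtual address-space usage itself. This is a sketch, not part of the original repro: `CurrentVirtualSize` and `SampleVirtualSize` are hypothetical helper names. On Windows it assumes that total minus available user-mode virtual address space from `GlobalMemoryStatusEx` roughly matches Process Explorer's "Virtual Size"; on Linux it falls back to `/proc/self/statm`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#else
#include <fstream>
#include <unistd.h>
#endif

// Virtual address space currently used by this process, in bytes (0 on failure).
std::uint64_t CurrentVirtualSize() {
#ifdef _WIN32
  MEMORYSTATUSEX status = {};
  status.dwLength = sizeof(status);
  if (!GlobalMemoryStatusEx(&status))
    return 0;
  // User-mode address space minus the part that is neither reserved nor committed.
  return status.ullTotalVirtual - status.ullAvailVirtual;
#else
  // Linux: the first field of /proc/self/statm is the total program size in pages.
  std::ifstream statm("/proc/self/statm");
  std::uint64_t pages = 0;
  const long page_size = sysconf(_SC_PAGESIZE);
  if (!(statm >> pages) || page_size <= 0)
    return 0;
  return pages * static_cast<std::uint64_t>(page_size);
#endif
}

// Calls step() up to `iterations` times and records the virtual size on
// iterations 0, every, 2*every, ... Stops early as soon as step() returns
// nonzero, the same convention TestDecoder uses for errors.
template <typename F>
std::vector<std::uint64_t> SampleVirtualSize(F&& step, int iterations, int every) {
  assert(every > 0);
  std::vector<std::uint64_t> samples;
  for (int i = 0; i < iterations; ++i) {
    if (step() != 0)
      break;
    if (i % every == 0)
      samples.push_back(CurrentVirtualSize());
  }
  return samples;
}
```

For example, `SampleVirtualSize(TestDecoder, 100000, 1000)` would take one reading every 1000 create/destroy cycles and stop at the first failure.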

OS: Windows 10 x64 (22H2 Build 19045.2486)
GPU: NVIDIA RTX 3070 (driver version: 528.24)

Is this a bug in the NVIDIA driver?

Hi there @Vitaly_Shemet and welcome to the NVIDIA developer forums!

This is not really a realistic or typical use case, so I wouldn’t rule out the possibility that it is simply a case of de-allocation being done asynchronously and thus causing unintended memory leaks.
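One way to probe the asynchronous de-allocation theory is to look at the shape of the memory curve: deferred release would be expected to level off after a warm-up period, while a genuine leak keeps growing at a roughly constant rate. The heuristic below is only a sketch; `ClassifyGrowth`, `GrowthTrend`, and the warm-up and tolerance parameters are hypothetical, and the result is suggestive rather than proof either way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class GrowthTrend { Insufficient, Plateau, SteadyGrowth };

// Ignores the first `warmup` samples, then computes the average growth per
// sampling interval over the remaining samples and compares it to
// `tolerance_bytes`. Samples are assumed to be taken at regular intervals.
GrowthTrend ClassifyGrowth(const std::vector<std::uint64_t>& samples,
                           std::size_t warmup, double tolerance_bytes) {
  assert(tolerance_bytes >= 0.0);
  if (samples.size() < warmup + 2)
    return GrowthTrend::Insufficient;  // need at least one interval after warm-up
  const double first = static_cast<double>(samples[warmup]);
  const double last = static_cast<double>(samples.back());
  const double intervals = static_cast<double>(samples.size() - 1 - warmup);
  const double growth_per_interval = (last - first) / intervals;
  return growth_per_interval > tolerance_bytes ? GrowthTrend::SteadyGrowth
                                               : GrowthTrend::Plateau;
}
```

A plateau is consistent with memory being released late; steady growth over many thousands of create/destroy cycles points more toward a real leak.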

Also, statically allocating the descriptor structs inside the function can cause trouble with how cuVid handles the decoder reference.

Of course, it could be an oversight in CUDA memory handling, but I would first check whether this also happens in a normal use case.

This is a minimal example to demonstrate the problem. In the real application, video files (H264/H265/others) are opened, decoded and closed, and after some time a CUDA_ERROR_OUT_OF_MEMORY error occurs. I tried to keep the code as simple as possible to show the problem.
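Since the real application opens and closes many files, one possible mitigation (an assumption, not verified to avoid the error) is to keep decoders alive and reuse them for files with the same format, instead of calling cuvidDestroyDecoder and cuvidCreateDecoder for every file. The sketch below is deliberately generic: `DecoderKey`, `DecoderCache`, and the factory are hypothetical. In a real program `Decoder` would be a small RAII wrapper whose constructor calls `cuvidCreateDecoder` and whose destructor calls `cuvidDestroyDecoder`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical key describing the stream format a decoder was created for.
struct DecoderKey {
  int codec;  // e.g. a cudaVideoCodec value
  unsigned width;
  unsigned height;
  int bit_depth_minus8;
  bool operator<(const DecoderKey& o) const {
    return std::tie(codec, width, height, bit_depth_minus8) <
           std::tie(o.codec, o.width, o.height, o.bit_depth_minus8);
  }
};

// Keeps one decoder per format alive and hands it out again instead of
// destroying and re-creating it for every file.
template <typename Decoder>
class DecoderCache {
 public:
  using Factory = std::function<std::unique_ptr<Decoder>(const DecoderKey&)>;

  explicit DecoderCache(Factory factory) : factory_(std::move(factory)) {
    assert(factory_);
  }

  // Returns the cached decoder for `key`, creating it on first use.
  // Returns nullptr (and caches nothing) if the factory fails.
  Decoder* Acquire(const DecoderKey& key) {
    auto it = cache_.find(key);
    if (it != cache_.end())
      return it->second.get();
    std::unique_ptr<Decoder> dec = factory_(key);
    Decoder* raw = dec.get();
    if (raw)
      cache_.emplace(key, std::move(dec));
    return raw;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  Factory factory_;
  std::map<DecoderKey, std::unique_ptr<Decoder>> cache_;
};
```

A cached decoder must only serve one stream at a time, and a real key would need every creation parameter that matters (chroma format, surface counts, output format, and so on); the four fields here are a simplification.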