Creating NVENC encoders on multi-GPU cards fails with out of memory

GeForce GTX 690 cards have two GPUs (0 and 1). The following sequence fails with NV_ENC_ERR_OUT_OF_MEMORY:

  1. Create and run an encoder on device 0
  2. Create and run an encoder on device 1
  3. Create and run an encoder on device 0 <-- This fails in nvEncInitializeEncoder with NV_ENC_ERR_OUT_OF_MEMORY

The video can be very small (64x64), so it is not really out of memory. I don't even have to encode anything: just creating and deleting the encoder is enough to trigger it. Also, if I don't switch between GPUs (always staying on device 0 or on device 1), the out-of-memory error doesn't occur.

I have created an encoder that has an AutoSelect feature, which uses the GPU that is least used. After a few conversions, the encoder starts to fail and it won’t work again until I terminate the current process.

I am running this on Windows 10, using the latest driver (355.60).

Here is a short version of a function showing a sequence that reproduces the error. The error checking is stripped out to make the function easy to read:

// compress video using a GPU device
CUresult CompressVideo(int deviceID)
{
   CUcontext   pDevice;
   CUdevice    device;

   CUresult cuResult = cuDeviceGet(&device, deviceID);

   cuResult = cuCtxCreate(&pDevice, 0, device);

   CUcontext cuContextCurr;
   cuResult = cuCtxPopCurrent(&cuContextCurr);

   CNvHWEncoder *pNvHWEncoder = new CNvHWEncoder;

   /* On the third call, this fails with NV_ENC_ERR_OUT_OF_MEMORY = 10 */
   NVENCSTATUS nvStatus = pNvHWEncoder->Initialize(pDevice, NV_ENC_DEVICE_TYPE_CUDA);
   if (nvStatus != NV_ENC_SUCCESS)
   {
      printf("\npNvHWEncoder->Initialize failed %d", nvStatus);
      delete pNvHWEncoder;
      cuCtxDestroy(pDevice);
      return CUDA_ERROR_UNKNOWN; /* NVENCSTATUS is not a CUresult, so map it */
   }

   EncodeConfig encodeConfig = {0};
   InitConfig(&encodeConfig); /* nothing special here, just the defaults: Width,Height = 64, encodeConfig.codec = NV_ENC_H264 */

   nvStatus = pNvHWEncoder->CreateEncoder(&encodeConfig);

/* test code to flush the encoder - the error occurs with or without it

   EncodeOutputBuffer stEOSOutputBfr = {0};
   stEOSOutputBfr.bEOSFlag = TRUE;
   nvStatus = pNvHWEncoder->NvEncRegisterAsyncEvent(&stEOSOutputBfr.hOutputEvent);

   nvStatus = pNvHWEncoder->NvEncFlushEncoderQueue(stEOSOutputBfr.hOutputEvent);

   WaitForSingleObject(stEOSOutputBfr.hOutputEvent, INFINITE);
*/

   delete pNvHWEncoder;
   pNvHWEncoder = NULL;

   return cuCtxDestroy(pDevice);
}

I attached to this post a ZIP file containing a Visual Studio 2008 project that reproduces the error. Just build and run the program and you should see the error. Make sure you have a GeForce GTX 690 card or a similar one with two GPUs. The project simply does something like this:

CompressVideo(0); // works
CompressVideo(1); // works
CompressVideo(0); // this fails with NV_ENC_ERR_OUT_OF_MEMORY

Note that the NVENC 5.0 documentation specifies that “The client should call NvEncDestroyEncodeSession to close the encoding session”. But there is no NvEncDestroyEncodeSession; instead, the function to close the session is called nvEncDestroyEncoder.

I forgot to mention that I run the card with “maximize 3D performance”, which uses both GPUs. I also have PhysX set to auto-select.

The current SDK package allows up to two simultaneous encode sessions per system for low-end Quadro and GeForce cards. If the system contains any low-end hardware (even in conjunction with other high-end hardware), only two encoding sessions will be permitted.

–NVENC_DA-06209-001_v06.pdf in the NVENC docs

Hello silviu22, you may find the answer there.

I think the problem is somewhere else, because each conversion closes the previous encoding session.
It looks like you can re-use CUDA device 1 multiple times, but you cannot re-use CUDA device 0: the second time you try to create an encoder on it, the call fails with out of memory.

Because this works:
CompressVideo(0); /* works */
CompressVideo(1); /* works */
CompressVideo(1); /* works */
CompressVideo(1); /* works */
CompressVideo(1); /* works */

But this fails:
CompressVideo(0); /* works */
CompressVideo(1); /* works */
CompressVideo(0); /* this fails */

I can confirm this issue. I receive the same error message right after starting the second session: OpenEncodeSessionEx failed: out of memory (10), but only when running FFmpeg on Windows 10.

Here is the log file content:

[nvenc @ 000001a143ea1480] 1 CUDA capable devices found
[nvenc @ 000001a143ea1480] [ GPU #0 - < GeForce GTX 970 > has Compute SM 5.2, NVENC Available ]
[nvenc @ 000001a143ea1480] Nvenc initialized successfully
[nvenc @ 000001a144bcdc00] 1 CUDA capable devices found
[nvenc @ 000001a144bcdc00] [ GPU #0 - < GeForce GTX 970 > has Compute SM 5.2, NVENC Available ]
[nvenc @ 000001a144bcdc00] Nvenc initialized successfully
[nvenc @ 000001a144bcdc00] OpenEncodeSessionEx failed: out of memory (10)
[nvenc @ 000001a144bcdc00] Nvenc unloaded

When running the exact same program with the same parameters on Windows 7, it works perfectly fine.

My NVIDIA driver is version 378.92 (16 March 2017) on both operating systems.

The message “OpenEncodeSessionEx failed: out of memory (10)” comes from the NVENC SDK or the NVIDIA driver; I checked the FFmpeg source code to be sure.

This problem was identified by Silviu22 on 08/21/2015, and it is now 04/08/2017. We hope this memory leak will be fixed as soon as possible, in the next NVIDIA driver.

I performed more tests, and it seems NVENC doesn't shut its instance down properly when it is created through a DLL, even though Windows 10 properly unloads the library. In other words, the driver doesn't close the NVENC instance automatically when the DLL's caller is killed by the main program.

To work around this bug, the main program has to launch a separate EXE process to access NVENC; otherwise the NVENC instance is never released.

It would be better if NVIDIA could provide an NvEncodeAPICloseInstance method to manually release an instance created with NvEncodeAPICreateInstance from a DLL, or provide a direct API for C#, or properly debug NVENC instance handling on Windows 10.

NVENC ran better under Windows 7.

Bumping a pretty old thread here, but I hit the same bug and actually fixed it. dynlink_{cuda,nvcuvid}.cpp are the culprits.

Specifically, the functions cuInit and cuvidInit, and how the samples use them: they both call a form of LoadLibrary but never call FreeLibrary in any destructor, so things end up imploding at some point.

I simply modified cuInit and cuvidInit to take a HMODULE* so I could call FreeLibrary in my upper layer’s destructor on the respective pointers.