Creating NVENC encoders on multi-GPU cards fails with out of memory

GeForce GTX 690 cards have two GPUs (0 and 1). The following sequence fails with NV_ENC_ERR_OUT_OF_MEMORY:

  1. Create and run an encoder on device 0
  2. Create and run an encoder on device 1
  3. Create and run an encoder on device 0 ← This fails in nvEncInitializeEncoder with NV_ENC_ERR_OUT_OF_MEMORY

The video can be very small (64x64), so it is not really an out-of-memory condition. Also note that I don't even have to encode anything; I can just create and delete the encoder. And if I don't switch between GPUs (always staying on device 0 or always on device 1), the out-of-memory error does not occur.

I have created an encoder with an AutoSelect feature, which uses whichever GPU is least used. After a few conversions, the encoder starts to fail and won't work again until I terminate the current process.
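For illustration, a least-used-GPU selector could look something like the sketch below. This is not my exact code, just the idea, assuming the NVML library for the utilization query (nvmlInit must already have succeeded; error paths simplified):

// Hypothetical sketch of a least-used-GPU AutoSelect using NVML.
#include <nvml.h>

int PickLeastUsedGpu(unsigned int deviceCount)
{
   int best = 0;
   unsigned int bestLoad = 101; /* GPU utilization is reported as 0..100 */
   for (unsigned int i = 0; i < deviceCount; i++)
   {
      nvmlDevice_t dev;
      nvmlUtilization_t util;
      if (nvmlDeviceGetHandleByIndex(i, &dev) == NVML_SUCCESS &&
          nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS &&
          util.gpu < bestLoad)
      {
         bestLoad = util.gpu;
         best = (int)i;
      }
   }
   return best;
}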

I am running this on Windows 10, using the latest driver (355.60).

Here is a short version of a function showing a sequence that reproduces the error. The error checking is stripped out to make the function easy to read:

// compress video using a GPU device
CUresult CompressVideo(int deviceID)
{
   CUcontext   pDevice;
   CUdevice    device;

   CUresult cuResult = cuDeviceGet(&device, deviceID);

   cuResult = cuCtxCreate(&pDevice, 0, device);

   /* pop the new context off this thread; NVENC pushes it when needed */
   CUcontext cuContextCurr;
   cuResult = cuCtxPopCurrent(&cuContextCurr);

   CNvHWEncoder *pNvHWEncoder = new CNvHWEncoder;

   /* On third call, this fails with NV_ENC_ERR_OUT_OF_MEMORY = 10 */
   NVENCSTATUS nvStatus = pNvHWEncoder->Initialize(pDevice, NV_ENC_DEVICE_TYPE_CUDA);
   if (nvStatus != NV_ENC_SUCCESS)
   {
      printf("\npNvHWEncoder->Initialize failed %d", nvStatus);
      return (CUresult)nvStatus; /* an NVENCSTATUS, cast to the CUresult return type */
   }

   EncodeConfig encodeConfig = {0};
   InitConfig(&encodeConfig); /* nothing special here, just the defaults: width/height = 64, encodeConfig.codec = NV_ENC_H264 */

   nvStatus = pNvHWEncoder->CreateEncoder(&encodeConfig);

/* test code to flush the encoder - the error occurs with or without it

   EncodeOutputBuffer stEOSOutputBfr = {0};
   stEOSOutputBfr.bEOSFlag = TRUE;
   nvStatus = pNvHWEncoder->NvEncRegisterAsyncEvent(&stEOSOutputBfr.hOutputEvent);

   nvStatus = pNvHWEncoder->NvEncFlushEncoderQueue(stEOSOutputBfr.hOutputEvent);

   WaitForSingleObject(stEOSOutputBfr.hOutputEvent, INFINITE);

   pNvHWEncoder->NvEncUnregisterAsyncEvent(stEOSOutputBfr.hOutputEvent);
   nvCloseFile(stEOSOutputBfr.hOutputEvent);
*/

   pNvHWEncoder->NvEncDestroyEncoder();

   delete pNvHWEncoder;
   pNvHWEncoder = NULL;

   return cuCtxDestroy(pDevice);
}

I attached a ZIP file containing a Visual Studio 2008 project showing the error to this post. Just build and run the program and you should see the error. Make sure you have a GeForce GTX 690 card or a similar one with two GPUs. The project simply does something like this:

cuInit(0);
CompressVideo(0);
CompressVideo(1);
CompressVideo(0); // this fails with NV_ENC_ERR_OUT_OF_MEMORY

Note that the NVENC 5.0 documentation specifies that “The client should call NvEncDestroyEncodeSession to close the encoding session”. But there is no NvEncDestroyEncodeSession; instead, the function to close the session is called nvEncDestroyEncoder.
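For reference, here is a minimal sketch of the session lifecycle through the NVENC function table (based on the NVENC SDK headers; error checking omitted), showing that nvEncDestroyEncoder is indeed the call that closes the session:

// Sketch: open and close an NVENC session via the function table.
// Assumes the NVENC SDK header and an existing CUDA context cuContext.
#include "nvEncodeAPI.h"

NV_ENCODE_API_FUNCTION_LIST fnList = { NV_ENCODE_API_FUNCTION_LIST_VER };
NvEncodeAPICreateInstance(&fnList); // fills in the driver entry points

NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS params = { NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER };
params.device     = (void*)cuContext;
params.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
params.apiVersion = NVENCAPI_VERSION;

void *hEncoder = NULL;
fnList.nvEncOpenEncodeSessionEx(&params, &hEncoder);

/* ...initialize the encoder, encode frames... */

fnList.nvEncDestroyEncoder(hEncoder); // this is the "close session" call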

Forgot to mention that I run the card with “maximize 3D performance”, which uses both GPUs. I also have PhysX set to auto-select.
OutOfMem.zip (124 KB)

The current SDK package allows up to two simultaneous encode sessions per system for low-end Quadro and GeForce cards. If the system contains any low-end hardware (even in conjunction with other high-end hardware), only two encoding sessions will be permitted.

–NVENC_DA-06209-001_v06.pdf, in the NVENC docs

Hello silviu22, you may find the answer here.

I think the problem is somewhere else, because each conversion closes the previous encoding session before the next one is created, so only one session is ever open at a time.
It looks like you can re-use device 1 multiple times, but you cannot re-use device 0: the second time you try to create an encoder on it, the call fails with out of memory.

Because this works:
CompressVideo(0); /* works */
CompressVideo(1); /* works */
CompressVideo(1); /* works */
CompressVideo(1); /* works */
CompressVideo(1); /* works */

But this fails:
CompressVideo(0); /* works */
CompressVideo(1); /* works */
CompressVideo(0); /* this fails */

I can confirm this issue. I receive the same error message right after starting the second session: OpenEncodeSessionEx failed: out of memory (10). But only when I run FFmpeg on Windows 10.

Here is the log file content:

[nvenc @ 000001a143ea1480] 1 CUDA capable devices found
[nvenc @ 000001a143ea1480] [ GPU #0 - < GeForce GTX 970 > has Compute SM 5.2, NVENC Available ]
[nvenc @ 000001a143ea1480] Nvenc initialized successfully
[nvenc @ 000001a144bcdc00] 1 CUDA capable devices found
[nvenc @ 000001a144bcdc00] [ GPU #0 - < GeForce GTX 970 > has Compute SM 5.2, NVENC Available ]
[nvenc @ 000001a144bcdc00] Nvenc initialized successfully
[nvenc @ 000001a144bcdc00] OpenEncodeSessionEx failed: out of memory (10)
[nvenc @ 000001a144bcdc00] Nvenc unloaded

When I run the exact same program with the same parameters on Windows 7, it works perfectly fine.

My NVIDIA driver is version 378.92 (16 March 2017) on both operating systems.

The message “OpenEncodeSessionEx failed: out of memory (10)” comes from the NVENC SDK or the NVIDIA driver; I checked the FFmpeg source code to be sure.

This problem was identified by Silviu22 on 08/21/2015, and it is now 04/08/2017. We hope this memory leak will be fixed as soon as possible, in the next NVIDIA driver.

I performed more tests, and it seems NVENC does not shut down its instance properly when it is created through a DLL. Windows 10 unloads the library itself correctly, but the NVENC instance is not closed automatically when the DLL caller is killed by the main program.

To work around this bug, the main program has to launch a separate EXE process to access NVENC; when that process exits, the NVENC instance is released with it.
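A minimal sketch of this workaround, assuming a hypothetical helper executable nvenc_worker.exe that performs one encode and then exits:

// Process-isolation workaround: run the NVENC work in a child process
// so that all NVENC state is released when the child exits.
// "nvenc_worker.exe" is a hypothetical helper, not part of the SDK.
#include <windows.h>

BOOL RunEncodeInChildProcess(int deviceID)
{
   char cmdLine[64];
   wsprintfA(cmdLine, "nvenc_worker.exe %d", deviceID);

   STARTUPINFOA si = { sizeof(si) };
   PROCESS_INFORMATION pi = { 0 };
   if (!CreateProcessA(NULL, cmdLine, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
      return FALSE;

   WaitForSingleObject(pi.hProcess, INFINITE); /* NVENC state dies with the child */
   DWORD exitCode = 1;
   GetExitCodeProcess(pi.hProcess, &exitCode);
   CloseHandle(pi.hThread);
   CloseHandle(pi.hProcess);
   return exitCode == 0;
}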

It would be better if NVIDIA developed a method such as NvEncodeAPICloseInstance to manually release an instance created with NvEncodeAPICreateInstance through a DLL, or provided a direct API for C#, or properly debugged the NVENC instance handling on Windows 10.

NVENC ran better under Windows 7.

Bumping a pretty old thread here, but I hit the same bug and actually fixed it. dynlink_{cuda,nvcuvid}.cpp are the culprits.

Specifically, the problem is in the functions cuInit and cuvidInit and how the samples use them: both call a form of LoadLibrary but never call FreeLibrary in any destructor, so the leaked module handles eventually make things implode.

I simply modified cuInit and cuvidInit to take an HMODULE* so that my upper layer's destructor can call FreeLibrary on the respective handles.
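Roughly, the change looks like this (a sketch only; the exact dynlink signatures differ between SDK versions):

// Modified dynamic-loading pattern: the init function hands the module
// handle back so the caller's destructor can FreeLibrary it later.
#include <windows.h>
#include <cuda.h>

typedef CUresult (CUDAAPI *tcuInit)(unsigned int);

CUresult myCuInit(unsigned int flags, HMODULE *pModule)
{
   HMODULE hDrv = LoadLibraryA("nvcuda.dll");
   if (hDrv == NULL)
      return CUDA_ERROR_NOT_INITIALIZED;
   *pModule = hDrv; /* caller keeps the handle */

   tcuInit pfnInit = (tcuInit)GetProcAddress(hDrv, "cuInit");
   if (pfnInit == NULL)
      return CUDA_ERROR_NOT_INITIALIZED;
   /* ...resolve the remaining cu* entry points the same way... */
   return pfnInit(flags);
}

// In the upper layer's destructor:
//    if (m_hCuda) { FreeLibrary(m_hCuda); m_hCuda = NULL; }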