We have an application that couples a CUDA simulation with geospatial rendering of the results in OpenGL. Due to the size of the data, we often hit the video memory limit, which is 2 GB on the GTX 680 cards we currently use. The major problem is that we have almost no way to react to this situation.
The system runs out of memory when resizing or creating a texture or buffer object, which we do fairly frequently. When video memory runs out, one of four things happens:
- The OpenGL call reports GL_OUT_OF_MEMORY, which we receive synchronously through the GL_KHR_debug callback. In this case we can throw an exception and handle the situation gracefully (see the sketch after this list).
- The driver pops up a message box (“Request for more GPU memory than is available”) and the application crashes hard, with no way to safely exit, save pending changes, etc.
- Nothing happens at first: the allocation fails silently and the context dies. The next time any object of the context is used, arbitrary OpenGL errors are reported, such as “framebuffer incomplete” or textures having size 0.
- The application freezes at the next OpenGL synchronization point, such as glFinish or glMapBuffer.
The really bad thing is that case 1, the only one we can handle, almost never occurs. Unfortunately, the cases above are sorted by increasing probability, and cases 2 and 4, a hard crash and a freeze respectively, are an absolute no-go for an application.
Why is the handling of out-of-memory errors so inconsistent with the Nvidia driver? The driver must not overrule OpenGL's error reporting and instantly kill the entire application. And, considering cases 3 and 4, the GL_OUT_OF_MEMORY error should be reported at some point. What good is the error-reporting callback if it dies with, or before, the context?
We use recent drivers (361.75) on Windows 8.1 Professional. The OpenGL context we request is 4.4 Core Profile, if that makes any difference.