Hi,
I developed a library that uses OpenCL. The code in this library can also be called from multiple threads, as long as a separate instance of the algorithm implementation class is created for every single thread.
The problem starts once I allocate too much memory. Unfortunately this can happen at any given time, not just when I allocate OpenCL buffers. One example is the following error:
CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_READ_BUFFER on GeForce GTX 750 Ti (Device 0).
Meaning it’s possible that the library was able to allocate the OpenCL buffers in the initialization part of the algorithm, but once I execute another operation (e.g. enqueuing a kernel) it fails (and returns 18446744073709551612 as the error code).
The major problem is that I can control the amount of memory I allocate in my own process, but not outside of it, and as far as I know there is no way to check how much unused memory is available. In theory, Adobe Lightroom could be running at the same time, using up some memory for its own algorithms, and I would have less memory available than expected. Or somebody could start an OpenCL- or OpenGL-based application while my algorithm is already running and allocate some memory → again I would suddenly not have enough memory to execute the next kernel within the algorithm, or to execute any other OpenCL function that requires additional memory.
I tested the same thing on AMD and Intel hardware, where this was not an issue: execution is simply delayed until memory becomes available.
How can this problem be solved with NVIDIA hardware?
Best Regards
Michael
Currently I only found a partial solution:
- I link against cudart and use it (cudaMemGetInfo) to retrieve the amount of currently available device memory. Depending on the result, I either start my algorithm or return an appropriate error code from my API. (The only other way to get this information would be to create a hidden window, initialize an OpenGL context and retrieve it using glGetIntegerv(0x9049, …), i.e. GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX.)
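A minimal sketch of that check (cudaMemGetInfo is the real cudart call; `required_bytes` and the threshold policy are my own placeholders):

```cpp
#include <cuda_runtime_api.h>
#include <cstddef>
#include <cstdio>

// Returns true if the device currently reports at least `required_bytes`
// of free memory. Note the inherent race: another process can allocate
// memory between this check and our own allocations.
bool enoughDeviceMemory(std::size_t required_bytes) {
    std::size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess)
        return false; // be conservative if the query itself fails
    return free_bytes >= required_bytes;
}

int main() {
    std::size_t needed = 512ull * 1024 * 1024; // e.g. 512 MiB (placeholder)
    std::printf("enough memory: %d\n", enoughDeviceMemory(needed) ? 1 : 0);
    return 0;
}
```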
The problem is that this does not cover all situations. If somebody starts another OpenGL or OpenCL application while my algorithm is running, there might again not be enough memory, meaning CL_MEM_OBJECT_ALLOCATION_FAILURE can occur at any time when I call the next function of the OpenCL API.
Any suggestions on how to handle this? Or is this something that should be implemented differently in the driver (meaning delays, paging data off to host memory, …)?
Best Regards
Michael
Is this on windows, or linux, or both?
My project atm only supports Windows.
I wrote a small test tool to reproduce the same error. It reproduces the problem on both Windows 10 and Fedora 26. On Windows I used the driver delivered with the CUDA 8 SDK, and on Fedora the driver packaged in the negativo17 repository. The system is an i7-6700K with 16 GB of memory and the already mentioned GTX 750 Ti.
The test code: https://pastebin.com/raw/TjjLE7zn
And CMakeLists.txt: https://pastebin.com/raw/rcxfmBuf
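In case the links are unreachable: the tool is roughly equivalent to the following sketch (written against the cl.hpp C++ bindings; the chunk size and kernel body are my simplifications, not the exact pastebin contents). It allocates buffers covering the whole device memory, then touches each one with a trivial kernel:

```cpp
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    try {
        // Pick the first device for brevity; the real tool lets you choose.
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);
        std::vector<cl::Device> devices;
        platforms.at(0).getDevices(CL_DEVICE_TYPE_ALL, &devices);
        cl::Device dev = devices.at(0);

        cl::Context ctx(dev);
        cl::CommandQueue queue(ctx, dev);

        cl_ulong mem = dev.getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>();
        std::cout << "Device Memory: " << mem << "\n";

        // Allocate buffers whose total size covers the device memory.
        const cl_ulong chunk = 128ull * 1024 * 1024; // 128 MiB per buffer
        std::vector<cl::Buffer> buffers;
        for (cl_ulong done = 0; done + chunk <= mem; done += chunk)
            buffers.emplace_back(ctx, CL_MEM_READ_WRITE, chunk);

        // Trivial kernel that touches its buffer so it must be resident.
        const char* src =
            "__kernel void touch(__global uchar* p)"
            "{ p[get_global_id(0)] = 1; }";
        cl::Program prog(ctx, src, /*build=*/true);
        cl::Kernel kernel(prog, "touch");

        for (std::size_t i = 0; i < buffers.size(); ++i) {
            std::cout << "Attempting to execute kernel with buffer "
                      << i << "\n";
            kernel.setArg(0, buffers[i]);
            queue.enqueueNDRangeKernel(kernel, cl::NullRange,
                                       cl::NDRange(256));
            // On NVIDIA this is where CL_MEM_OBJECT_ALLOCATION_FAILURE
            // shows up once the device memory is exhausted.
            queue.finish();
        }
        std::cout << "EXEC finished\n";
    } catch (const cl::Error& e) {
        std::cerr << "Exception caught: " << e.what()
                  << ", error code: " << e.err() << "\n";
        return 1;
    }
    return 0;
}
```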
On Windows I can select either the CPU (Intel OpenCL implementation on x86) or the NVIDIA GPU to execute it.
When I select the NVIDIA GPU, this is the result:
C:\User\USER\Documents\opencl\build>Release\opencl_memalloc.exe
Device: 0
Platform: Intel(R) OpenCL
Name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Device: 1
Platform: NVIDIA CUDA
Name: GeForce GTX 750 Ti
Select a devices: 1
Device Memory: 2147483648
Allocated 2147483648bytes.
ATtempting to execute kernel with buffer 0
ATtempting to execute kernel with buffer 1
CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 750 Ti (Device 0).
Exception caught: kernel exec, error code: -4
And when I select the CPU everything seems to be fine:
C:\Users\USER\Documents\opencl\build>Release\opencl_memalloc.exe
Device: 0
Platform: Intel(R) OpenCL
Name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Device: 1
Platform: NVIDIA CUDA
Name: GeForce GTX 750 Ti
Select a devices: 0
Device Memory: 17117061120
Allocated 17179869184bytes.
ATtempting to execute kernel with buffer 0
ATtempting to execute kernel with buffer 1
...
ATtempting to execute kernel with buffer 14
ATtempting to execute kernel with buffer 15
EXEC finished
Meaning in the second case it is not an issue to allocate more memory than the device has. The same was true when I tried the scenario on a Radeon R7 360 and the Intel HD 530 integrated GPU.
Best Regards
Michael
I received a GTX 1050 Ti today.
The behaviour is the same.
D:\temp\opencl\build\Release
λ .\opencl_memalloc.exe
Device: 0
Platform: Intel(R) OpenCL
Name: Intel(R) HD Graphics 530
Device: 1
Platform: Intel(R) OpenCL
Name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Device: 2
Platform: AMD Accelerated Parallel Processing
Name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Device: 3
Platform: NVIDIA CUDA
Name: GeForce GTX 1050 Ti
Select a devices: 3
Device Memory: 4294967296
Allocated 4294967296bytes.
ATtempting to execute kernel with buffer 0
ATtempting to execute kernel with buffer 1
ATtempting to execute kernel with buffer 2
ATtempting to execute kernel with buffer 3
CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 1050 Ti (Device 0).
Exception caught: kernel exec, error code: -4
(The only difference is that the error code returned by the function is now more reasonable: -4 instead of 18446744073709551612. The problem itself still exists.)
Regards
Michael
I filed a bug via the NVIDIA bug report utility. The conclusion was that this behaviour is intentional and does not violate the specification.
How I solved this atm:
- When I start the algorithm initialization, I check the available memory using NVAPI.
- Directly after creating the buffers, I use them in a simple dummy kernel → to ensure that they are actually allocated on the device.
- After every kernel enqueue, I directly call queue.finish().
- The algorithm initialization function uses a mutex to ensure that no initialization phase runs in parallel (and might incorrectly read out the available memory).
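The second and third points can be sketched like this (cl.hpp bindings; `createAndTouch` is my own helper name, not part of any API):

```cpp
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
#include <cstddef>

// Trivial kernel used only to force a fresh buffer to become resident.
static const char* kDummySrc =
    "__kernel void touch(__global uchar* p) { p[get_global_id(0)] = 0; }";

// Create a buffer and immediately touch it with the dummy kernel,
// finishing the queue so that an allocation failure surfaces right here,
// where we can still return a clean error code from the initialization
// path, instead of later in the middle of the algorithm.
bool createAndTouch(cl::Context& ctx, cl::CommandQueue& queue,
                    std::size_t bytes, cl::Buffer& out) {
    try {
        out = cl::Buffer(ctx, CL_MEM_READ_WRITE, bytes);
        cl::Program prog(ctx, kDummySrc, /*build=*/true);
        cl::Kernel touch(prog, "touch");
        touch.setArg(0, out);
        queue.enqueueNDRangeKernel(touch, cl::NullRange, cl::NDRange(1));
        queue.finish(); // force the allocation to happen now
        return true;
    } catch (const cl::Error&) {
        return false; // map to an API error instead of failing mid-run
    }
}
```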
Not perfect, but under the circumstances the only working solution.
Regards
Hi MichaelE1000, I get the same bug on an NVIDIA GT 610 when running your sample on Windows 10 x64:
Device: 0
Platform: NVIDIA CUDA
Name: GeForce GT 610
Device: 1
Platform: AMD Accelerated Parallel Processing
Name: Oland
Device: 2
Platform: AMD Accelerated Parallel Processing
Name: Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz
Select a devices: 0
Device Memory: 1073741824
Allocated 1073741824bytes.
ATtempting to execute kernel with buffer 0
CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GT 610 (Device 0).
Exception caught: kernel exec, error code: -4
It’s normal to have this issue; according to the NVIDIA developers, this is intended behaviour.
Another thing I recently noticed:
Try enqueuing multiple kernels that use the same buffer. I have a slight suspicion that the NVIDIA driver simply sums up the space required by all operations in the queue and does not check whether the buffers involved are actually the same. If I did not flush after every single operation, I sometimes had multiple kernels enqueued and got this error much earlier, even though there was enough memory, since the kernels were using the same buffer.
Besides that, have a look at my previous post; that’s how I solved the issue (or rather, worked around it).