OpenCL API of clCreateBuffer() does not work as expected in an abnormal case

yys_3399 · February 20, 2019, 2:23am

If the total size of the buffers allocated through repeated calls to the OpenCL API clCreateBuffer() is much larger than the CL_DEVICE_MAX_MEM_ALLOC_SIZE value reported by clGetDeviceInfo(), I still get a valid buffer and no error in errcode_ret, which is not what I expected.

Can anyone help clarify this behavior? Thanks in advance.

Details:
There is only one device in my setup, a 1080 Ti running OpenCL 1.2.
CL_DEVICE_MAX_MEM_ALLOC_SIZE is reported as 0xae918000.
I then create buffers with clCreateBuffer(), 0x20000000 bytes each time.
The call succeeds more than 100,000 times, each time returning a valid buffer, which I verified with clGetMemObjectInfo().

Robert_Crovella · February 20, 2019, 4:08am

Buffer allocations in OpenCL can be deferred:

https://devtalk.nvidia.com/default/topic/493543/best-practice-for-memory-managment-in-opencl/

I think what you are seeing is expected behavior. If you want to witness an out of memory condition, you have to actually use those buffers.

Here’s a simple test case demonstrating this:

$ cat t6.cpp
#include <CL/opencl.h>
#include <stdio.h>
const int nblk = 256;
int main(int argc, char *argv[])
{
  cl_platform_id platform;
  cl_device_id device;
  cl_context context;
  cl_mem mem1[nblk];
  cl_int err;
  cl_command_queue queue1;
  cl_event event1[nblk];

  err = clGetPlatformIDs(1, &platform, NULL);
  if (err != CL_SUCCESS) {printf("%d: %d\n", __LINE__, err); return -1;}
  err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL);
  if (err != CL_SUCCESS) {printf("%d: %d\n", __LINE__, err); return -1;}
  context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
  if (err != CL_SUCCESS) {printf("%d: %d\n", __LINE__, err); return -1;}
  queue1 = clCreateCommandQueue(context, device, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &err);
  if (err != CL_SUCCESS) {printf("%d: %d\n", __LINE__, err); return -1;}
  size_t mem_size = 0x20000000;  // 512MB
  unsigned char pattern = 0;
  int i = 0;
  while ((i < nblk)&&(err == CL_SUCCESS)){
    mem1[i] = clCreateBuffer(context, CL_MEM_READ_WRITE, mem_size, NULL, &err);
#ifdef USE_FILL
    if (i > 0)
      err += clEnqueueFillBuffer(queue1, mem1[i], &pattern, 1, 0, mem_size, 1, event1+i-1, event1+i);
    else
      err += clEnqueueFillBuffer(queue1, mem1[i], &pattern, 1, 0, mem_size, 0, NULL, event1+i);
#endif
    i++;}
  if (err != CL_SUCCESS)
    printf("ocl error: %d at iteration: %d\n", err, i);
  else
    printf("%d loops finished with no error\n", i);
}
$ g++ t6.cpp -I/usr/local/cuda/include -lOpenCL -o t6
$ ./t6
256 loops finished with no error
$ g++ t6.cpp -I/usr/local/cuda/include -lOpenCL -o t6 -DUSE_FILL
$ ./t6
ocl error: -4 at iteration: 63
$

This was run on a Tesla V100 32GB GPU, Linux, driver 410.48.

We see that if we don’t use the buffers, 256 loops complete successfully, where each buffer is 512MB in size.

If we actually use the buffers as we allocate them, the loop fails on the 63rd iteration (-4 = CL_MEM_OBJECT_ALLOCATION_FAILURE). This makes sense because 62 successful iterations at 512MB/iteration is 31GB, which is reasonable for the 32GB GPU.
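As a side note, the arithmetic behind these numbers can be checked at compile time. The sketch below is only an illustration: the helper names (`fits_single_alloc`, `total_bytes`) and constants are made up for this post, and it uses only the values quoted in this thread. It also relies on the fact that CL_DEVICE_MAX_MEM_ALLOC_SIZE is a limit on the size of a single memory object, not on the sum of all allocations, so each 0x20000000-byte request is individually within the limit.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1ull << 20;
constexpr std::uint64_t GiB = 1ull << 30;

// Values quoted in the thread (names are hypothetical, chosen for this sketch).
constexpr std::uint64_t kBufBytes       = 0x20000000;  // size passed to clCreateBuffer() each time
constexpr std::uint64_t kMaxAlloc1080Ti = 0xae918000;  // CL_DEVICE_MAX_MEM_ALLOC_SIZE reported on the 1080 Ti

// True if one request is within the per-allocation limit.
constexpr bool fits_single_alloc(std::uint64_t request, std::uint64_t max_alloc) {
    return request <= max_alloc;
}

// Total bytes needed to back n equally sized buffers.
constexpr std::uint64_t total_bytes(std::uint64_t n, std::uint64_t buf_bytes) {
    return n * buf_bytes;
}

// Each buffer is 512 MiB, well under the roughly 2793 MiB single-allocation limit.
static_assert(kBufBytes == 512 * MiB, "buffer size is 512 MiB");
static_assert(kMaxAlloc1080Ti / MiB == 2793, "limit is about 2793 MiB");
static_assert(fits_single_alloc(kBufBytes, kMaxAlloc1080Ti), "one request fits");

// 62 filled buffers occupy exactly 31 GiB; a 63rd would need 31.5 GiB.
static_assert(total_bytes(62, kBufBytes) == 31 * GiB, "62 buffers = 31 GiB");
static_assert(total_bytes(63, kBufBytes) == 32256 * MiB, "63 buffers = 31.5 GiB");
```

The 63rd buffer would still be under the nominal 32GB, so the failure at that point presumably reflects memory that is not available to the application (that part is an assumption, not something measured here).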

yys_3399 · February 20, 2019, 5:32am

Got it, thanks a lot, @Robert_Crovella.
