OpenCL CL_OUT_OF_RESOURCES on event.wait()

Hi all,

I already posted this in “DirectX / OpenGL & other APIs”, but I now see that this forum is a better fit.

I’m trying to set up a Mandelbrot computation in OpenCL.

The CL C code compiles correctly and execution begins without any issues.

But when I call event.wait(), a CL_OUT_OF_RESOURCES error occurs. I can’t imagine why.

My host code (N = 512 and M = 512):

// Initialize OpenCL

    cl_int error;

    cl::vector< cl::Platform > platformList;

    cl::Platform::get(&platformList);

    checkErr(platformList.size()!=0 ? CL_SUCCESS : -1, "cl::Platform::get");

    std::cout << "Platform number is: " << platformList.size() << std::endl;

std::string platformVendor;

    platformList[0].getInfo((cl_platform_info)CL_PLATFORM_VENDOR, &platformVendor);

    std::cout << "Platform is by: " << platformVendor << "\n";

    cl_context_properties cprops[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties)(platformList[0])(), 0};

	

    cl::Context context(CL_DEVICE_TYPE_GPU/*CL_DEVICE_TYPE_CPU*/,	cprops,	NULL, NULL,	&error);

    checkErr(error, "Conext::Context()");

	/** Allocate Buffers **/

	/** integers ************************************************************************************************/

		cl::Buffer maxCL( context, CL_MEM_READ_ONLY/*CL_MEM_READ_WRITE*/ | CL_MEM_USE_HOST_PTR, sizeof(int), &max, &error);

		checkErr(error, "Buffer::outCL(max)");

	/** floats **************************************************************************************************/

		cl::Buffer mCL( context, CL_MEM_READ_ONLY/*CL_MEM_READ_WRITE*/ | CL_MEM_USE_HOST_PTR, sizeof(float), &m, &error);

		checkErr(error, "Buffer::outCL(m)");

		cl::Buffer nCL( context, CL_MEM_READ_ONLY/*CL_MEM_READ_WRITE*/ | CL_MEM_USE_HOST_PTR, sizeof(float), &n, &error);

		checkErr(error, "Buffer::outCL(n)");

		cl::Buffer cxCL( context, CL_MEM_READ_ONLY/*CL_MEM_READ_WRITE*/ | CL_MEM_USE_HOST_PTR, sizeof(float), &cx, &error);

		checkErr(error, "Buffer::outCL(cx)");

		cl::Buffer cyCL( context, CL_MEM_READ_ONLY/*CL_MEM_READ_WRITE*/ | CL_MEM_USE_HOST_PTR, sizeof(float), &cy, &error);

		checkErr(error, "Buffer::outCL(cy)");

		cl::Buffer widthCL( context, CL_MEM_READ_ONLY/*CL_MEM_READ_WRITE*/ | CL_MEM_USE_HOST_PTR, sizeof(float), &width, &error);

		checkErr(error, "Buffer::outCL(width)");

		cl::Buffer heightCL( context, CL_MEM_READ_ONLY/*CL_MEM_READ_WRITE*/ | CL_MEM_USE_HOST_PTR, sizeof(float), &height, &error);

		checkErr(error, "Buffer::outCL(height)");

	/** image ***************************************************************************************************/

		cl::Buffer imageCL( context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, M*N*sizeof(GLubyte), image, &error);

		checkErr(error, "Buffer::outCL(image)");	/*****/

	

    // get OpenCL devices	

    cl::vector<cl::Device> devices;

    devices = context.getInfo<CL_CONTEXT_DEVICES>();

    checkErr(devices.size() > 0 ? CL_SUCCESS : -1, "devices.size() > 0");

	

    // read OpenCL C file

	

    #ifdef __APPLE__

	std::ifstream file("../../mandelbrot.cl");

    #else

	std::ifstream file("mandelbrot.cl");

    #endif	

    std::ios_base::iostate state = file.rdstate();

    checkErr(file.is_open() ? CL_SUCCESS:-1, "read mandelbrot.cl");

    // save file to string

    std::string prog(std::istreambuf_iterator<char>(file), (std::istreambuf_iterator<char>()));

	

    // make OpenCL C file to OpenCL source

    cl::Program::Sources source(1, std::make_pair(prog.c_str(), prog.length()+1));

	

    // build OpenCL source

    cl::Program program(context, source);

    error = program.build(devices,"");

    checkErr(error, "cl::Program program.build()");

	

    // define entry point

    cl::Kernel kernel(program, "mandelCalcCL", &error);

    checkErr(error, "Kernel::Kernel()");

	

    // set arguments

    error = kernel.setArg(0, maxCL);

    checkErr(error, "Kernel::setArg(max)");

    error = kernel.setArg(1, mCL);

    checkErr(error, "Kernel::setArg(m)");

    error = kernel.setArg(2, nCL);

    checkErr(error, "Kernel::setArg(n)");

    error = kernel.setArg(3, cxCL);

    checkErr(error, "Kernel::setArg(cx)");

    error = kernel.setArg(4, cyCL);

    checkErr(error, "Kernel::setArg(cy)");

    error = kernel.setArg(5, widthCL);

    checkErr(error, "Kernel::setArg(width)");

    error = kernel.setArg(6, heightCL);

    checkErr(error, "Kernel::setArg(height)");

    error = kernel.setArg(7, imageCL);

    checkErr(error, "Kernel::setArg(image)");

	

    // initialize queue

    cl::CommandQueue queue(context, devices[0], 0, &error);

    checkErr(error, "CommandQueue::CommandQueue()");

// create event

    cl::Event event;

    error = queue.enqueueNDRangeKernel(kernel, cl::NullRange,	cl::NDRange(N*M*sizeof(GLubyte)), cl::NDRange(1/*, 1*/), NULL, &event);

    checkErr(error, "ComamndQueue::enqueueNDRangeKernel()");

	

    // wait until computation is finished

    error = event.wait();

    checkErr(error, "event.wait()");

// read image buffer

    error = queue.enqueueReadBuffer(imageCL, CL_TRUE, 0, N*M*sizeof(GLubyte), &image);

    checkErr(error, "ComamndQueue::enqueueReadBuffer(image)");

Maybe you can help me figure out this problem.

Greets

Henrik

PS:

I’m running Ubuntu 10.04 x64

on a Q9450 (4 x 2.4 GHz) with 3962 MB RAM and a GeForce 8600 GT (256 MB RAM, 32 CUDA cores)

CUDA is the latest version available

I already tried a very small picture with very few iterations, but the error is still the same. That suggests to me that I am doing something wrong in handling the variables.

CL_OUT_OF_RESOURCES when waiting for a kernel to finish can mean that the kernel tried to read outside of allocated memory. Double-check that your buffers are the size that you assume them to be, and that the kernel is accessing addresses within the allocated size.
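
One way to test that idea before touching the GPU is to replay the write pattern on the host with bounds-checked access. The sketch below is only an illustration, not the kernel from this thread: it assumes a hypothetical kernel in which each work-item writes one byte at its own global index, and the name `writesStayInBounds` is made up for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side stand-in for a kernel's write pattern (assumption:
// each of `globalSize` work-items writes one byte at its own index).
// std::vector::at() throws std::out_of_range on a bad index, which makes an
// oversized global range visible immediately on the CPU.
bool writesStayInBounds(std::size_t bufferBytes, std::size_t globalSize)
{
    std::vector<unsigned char> buffer(bufferBytes, 0);
    try {
        for (std::size_t gid = 0; gid < globalSize; ++gid)
            buffer.at(gid) = 255; // throws if gid >= bufferBytes
    } catch (const std::out_of_range&) {
        return false;
    }
    return true;
}
```

With a 512 x 512 byte buffer, a global size of 512*512 passes this check, while a global size four times larger does not.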

I changed the image to

/** image ***************************************************************************************************/

    unsigned char image2[M*N];

    int h = 0;

    for(int i = 0; i < M; i++)

	for(int j = 0; j < N; j++)

	{

	    image2[h] = image[i][j];

	    h++;

	}

    cl::Buffer imageCL( context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, M*N*sizeof(unsigned char), image2, &error);

    checkErr(error, "Buffer::outCL(image)");

/*****/

Now the buffer can only be exactly as big as the array.

And the loops stay within the bounds.

Here is the CL C code:

#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

/*------------------------------------------------*/

/*   Calculate Iterations for Mandelbrot Set      */

/*------------------------------------------------*/

__kernel void mandelCalcCL( __global int * max,

			    __global float * m, __global float * n,	__global float * cx, __global float * cy, __global float * width, __global float * height,

			    __global unsigned char * image)

{

    int k;

    float x, y, v;

    float c0[2];

    float c[2];

    float d[2];

    int index = 0;	

	

    for(int j=0; j < (*m); j++) // y direction

    {

	for (int i=0; i < (*n); i++) // x direction 

	{

	    // starting point

			

	    x= i * ((*width)/(*n)) + (*cx)-(*width)/2;

	    y= j * ((*height)/(*m)) + (*cy)-(*height)/2;

			

	    c[0]=0;

	    c[1]=0;

		

	    c0[0]=x;

	    c0[1]=y;

			

	    // complex iteration

	    for(k=0; k<(*max); k++) // calculations for every point

	    {

		d[0]=c[0]*c[0]-c[1]*c[1];

		d[1]=c[0]*c[1]+c[1]*c[0];

				

		c[0]=d[0]+c0[0];

		c[1]=d[1]+c0[1];

				

		v = c[0]*c[0]+c[1]*c[1];

		if(v>4.0)

		    break; // assume not in set if mag > 4

	    }

			

	    if(k >= (*max)) 

	        k = 0;

	    image[index++] = 255 * k;

	}

    }

}

Well, I’m not using the C++ OpenCL binding, but the NDRange seems strange to me. Why is there a multiplication by sizeof(GLubyte)? Shouldn’t it be as simple as cl::NDRange(N*M)? Why isn’t there any kernel code that depends on get_global_id(0)? What exactly are you doing in parallel?
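
To make the point about get_global_id(0) concrete, here is a rough sketch in plain C++ of what a per-pixel version of the calculation could look like. It is an assumption about how the work might be split, not the poster's code: `gid` stands in for the value get_global_id(0) would return with a global size of n*m, and the names `mandelPixel` and `renderSerially` are invented for the example. Storing the raw iteration count in each byte is also just a choice made here.

```cpp
#include <cstddef>
#include <vector>

// Iteration count for a single pixel. `gid` plays the role of
// get_global_id(0) in a kernel launched with a global size of n*m.
int mandelPixel(std::size_t gid, int n, int m, float cx, float cy,
                float width, float height, int maxIter)
{
    const int i = static_cast<int>(gid % n); // x direction
    const int j = static_cast<int>(gid / n); // y direction

    // Starting point in the complex plane, as in the posted kernel.
    const float x0 = i * (width / n) + cx - width / 2;
    const float y0 = j * (height / m) + cy - height / 2;

    float re = 0.0f, im = 0.0f;
    int k = 0;
    for (; k < maxIter; ++k) {
        const float nextRe = re * re - im * im + x0;
        im = 2.0f * re * im + y0;
        re = nextRe;
        if (re * re + im * im > 4.0f)
            break; // assume not in the set
    }
    return k >= maxIter ? 0 : k;
}

// Serial emulation of the launch: one loop iteration per work-item,
// each writing exactly one element of the image.
std::vector<unsigned char> renderSerially(int n, int m, float cx, float cy,
                                          float width, float height, int maxIter)
{
    std::vector<unsigned char> image(static_cast<std::size_t>(n) * m);
    for (std::size_t gid = 0; gid < image.size(); ++gid)
        image[gid] = static_cast<unsigned char>(
            mandelPixel(gid, n, m, cx, cy, width, height, maxIter));
    return image;
}
```

In a real kernel the outer loop disappears: each work-item would compute its own pixel and write only image[gid], so no work-item touches memory outside its own element.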

Currently I’m not doing anything in parallel.

And I tried it with just N*M - same issue.

OK, then you can try the AMD CPU implementation: run it under gdb on the CPU and see where it goes wrong.

Also check if sizeof(GLubyte) is the same as sizeof(cl_uchar) on your platform and declare image before making the buffer.
cl_uchar* image = new cl_uchar[N*M];

By the way, why are you mixing floats and integers in the for loops? That’s a bit weird. You can set simple kernel arguments (single variables) without a buffer.

I’m using an Intel Q9450. How do I use the AMD CPU implementation? (OS is Ubuntu 10.04)

I didn’t write the implementation. It’s a sample from the net. In normal C++ it runs perfectly. The OpenCL version is a test for my company.

How do I pass single variables without a buffer?

(I’m completely new to OpenCL)

It’s simple :-)

Host code:

error = kernel.setArg(0, max);

GPU code:

__kernel void mandelCalcCL( int max, ...)

Have you already tried defining and allocating the image buffer that way, and are you still getting the same error?

Setting the single parameters without a buffer works.

What do you mean by the last question?

Where should I allocate the image buffer?

In the CL C code or in the host application?
