CommandQueue bug? 256kb limit in 1D buffer?

anlmat · July 20, 2010, 5:25pm

Hello there!

I have been having issues with 1D buffers larger than 256 KB. When the buffer is smaller I get no exceptions and the calculations seem correct. But when I set the buffer size bigger than 256 KB, the program below outputs the following:

Platform number is: 1

Platform is by: NVIDIA Corporation

std::vector<int> inVec has size 2048000

terminate called after throwing an instance of ‘cl::Error’

what(): clFinish

Aborted

It is the statement “queue.finish()” in the source code that throws the exception. I have also tried using “event.wait()” with a similar result. Am I doing something terribly wrong in the code? Or is it a bug in the drivers? In OpenCL? Or perhaps in the C++ header (I looked in there but it looks OK to me)? I would greatly appreciate any help or pointers on how to fix this.

My system:

Arch Linux i686, NVIDIA driver 256.35, OpenCL 1.0, GTX 460.

The complete code is included below.

Thanks!

Regards,

anlmat

kernels:

[codebox]__kernel void simple_add(__global int *in)
{
    size_t tid = get_global_id(0);
    in[tid] = in[tid]+2;
}

__kernel void simple_sub(__global int *in)
{
    size_t tid = get_global_id(0);
    in[tid] = in[tid]-1;
}[/codebox]

main.cpp:

[codebox]#define __CL_ENABLE_EXCEPTIONS

#include <CL/cl.hpp>

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Print a message and exit if an OpenCL call did not return CL_SUCCESS.
inline void
checkErr(cl_int err, const char * name)
{
    if (err != CL_SUCCESS) {
        std::cerr << "ERROR: " << name
                  << " (" << err << ")" << std::endl;
        exit(EXIT_FAILURE);
    }
}

int
main(void)
{
    cl_int err;

    // Pick the first available platform and report its vendor.
    std::vector< cl::Platform > platformList;
    cl::Platform::get(&platformList);
    checkErr(platformList.size()!=0 ? CL_SUCCESS : -1, "cl::Platform::get");
    std::cerr << "Platform number is: " << platformList.size() << std::endl;

    std::string platformVendor;
    platformList[0].getInfo((cl_platform_info)CL_PLATFORM_VENDOR, &platformVendor);
    std::cerr << "Platform is by: " << platformVendor << "\n";

    // Create a GPU context on that platform.
    cl_context_properties cprops[3] =
        {CL_CONTEXT_PLATFORM, (cl_context_properties)(platformList[0])(), 0};
    cl::Context context(CL_DEVICE_TYPE_GPU, cprops, NULL, NULL, &err);
    checkErr(err, "Context::Context()");

    // Host data: 512000 ints, each initialised to 50.
    std::vector<int> inVec(512000, 50);
    size_t vec_size = inVec.size()*sizeof(int);   // size in bytes
    std::cout << "std::vector<int> inVec has size " << vec_size << std::endl;

    cl::Buffer devBuf(context, CL_MEM_READ_WRITE, vec_size, &inVec[0], &err);
    checkErr(err, "Buffer::Buffer()");

    std::vector<cl::Device> devices;
    devices = context.getInfo<CL_CONTEXT_DEVICES>();
    checkErr(devices.size() > 0 ? CL_SUCCESS : -1, "devices.size() > 0");

    // Load and build the kernel source.
    std::ifstream file("…/opencl_test/test_kernels.cl");
    checkErr(file.is_open() ? CL_SUCCESS : -1, "test_kernels.cl");
    std::string prog(std::istreambuf_iterator<char>(file),
                     (std::istreambuf_iterator<char>()));
    cl::Program::Sources source(1, std::make_pair(prog.c_str(), prog.length()+1));
    cl::Program program(context, source);
    err = program.build(devices, "");
    checkErr(err, "Program::build()");

    cl::Kernel simple_add_kernel(program, "simple_add", &err);
    cl::Kernel simple_sub_kernel(program, "simple_sub", &err);

    cl::CommandQueue queue(context, devices[0], 0, &err);
    cl::Event event;

    // Copy the data to the device, run both kernels, and read the result back.
    err = queue.enqueueWriteBuffer(devBuf, CL_TRUE, 0, vec_size, &inVec[0]);
    err = simple_add_kernel.setArg(0, devBuf);
    err = simple_sub_kernel.setArg(0, devBuf);

    err = queue.enqueueNDRangeKernel(simple_add_kernel, cl::NullRange, cl::NDRange(vec_size), cl::NDRange(32), NULL, &event);
    err = queue.enqueueNDRangeKernel(simple_sub_kernel, cl::NullRange, cl::NDRange(vec_size), cl::NDRange(32), NULL, &event);
    checkErr(err, "CommandQueue::enqueueNDRangeKernel()");

    queue.finish();

    err = queue.enqueueReadBuffer(devBuf, CL_TRUE, 0, vec_size, &inVec[0]);
    checkErr(err, "CommandQueue::enqueueReadBuffer()");

    std::cout << "\n" << inVec[0];

    return EXIT_SUCCESS;
}
[/codebox]

HolyGeneralK · July 20, 2010, 9:08pm

I seem to be running into the same issue, although my buffer size seems a little more limited than yours (mine breaks at about 186 KB). My work-group and work-item sizes are all well within what the kernel and the device both report as their upper limits.

anlmat · July 20, 2010, 9:17pm

Hello,

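I just found the bug in the code. I had declared vec_size = inVec.size()*sizeof(int) and used it as the global size in the “queue.enqueueNDRangeKernel” calls near the end of the source code. “queue.enqueueNDRangeKernel” does not want the size in bytes but the number of work-items, i.e. the number of elements in the vector (inVec.size()). With the byte count, four times too many work-items were launched and most of them accessed memory past the end of the buffer.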
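To make the arithmetic concrete, here is a minimal host-only sketch (no OpenCL calls, so it runs anywhere). The helper names `bufferBytes`, `globalWorkItems` and `outOfBoundsItems` are made up for illustration and are not part of the OpenCL API. In the real program the fix would presumably be to pass `cl::NDRange(inVec.size())` instead of `cl::NDRange(vec_size)` as the global size, while still using `vec_size` for the buffer and the read/write calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size of the device buffer in bytes. This is the value that belongs in
// cl::Buffer, enqueueWriteBuffer and enqueueReadBuffer.
std::size_t bufferBytes(const std::vector<int>& v) {
    return v.size() * sizeof(int);
}

// Number of work-items to launch: one per element, NOT one per byte.
// This is the value that belongs in the global cl::NDRange.
std::size_t globalWorkItems(const std::vector<int>& v) {
    return v.size();
}

// Hypothetical helper: how many work-item ids would index past the end of
// the buffer if `globalSize` work-items each touch in[get_global_id(0)].
std::size_t outOfBoundsItems(std::size_t globalSize, std::size_t elementCount) {
    return globalSize > elementCount ? globalSize - elementCount : 0;
}
```

For the 512000-element vector in the original post, calling `outOfBoundsItems(bufferBytes(inVec), inVec.size())` gives the number of work-items that ran off the end of the buffer with the buggy launch, and `outOfBoundsItems(globalWorkItems(inVec), inVec.size())` gives zero for the corrected one.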

Is this the same problem that you are having?

Regards,

anlmat

PS.

Is there a way to change the topic to [SOLVED]?
