CommandQueue bug? 256kb limit in 1D buffer?

Hello there!

I am having trouble with 1D buffers larger than 256 KB. With a smaller buffer I get no exceptions and the results look correct, but once the buffer exceeds 256 KB the program below prints the following:


Platform number is: 1

Platform is by: NVIDIA Corporation

std::vector<int> inVec has size 2048000

terminate called after throwing an instance of ‘cl::Error’

what(): clFinish

Aborted


The exception is thrown by the statement “queue.finish()” in the source code. I have also tried “event.wait()”, with a similar result. Am I doing something terribly wrong in the code? Or is it a bug in the drivers, in OpenCL, or perhaps in the C++ header (I looked through it, but it seems fine to me)? I would greatly appreciate any help or pointers on how to fix this.

My system:

Arch Linux i686, NVIDIA driver 256.35, OpenCL 1.0, GTX 460.

The complete code is included below.

Thanks!

Regards,

anlmat

kernels:

[codebox]__kernel void simple_add(__global int *in)

{

size_t tid = get_global_id(0);

in[tid] = in[tid]+2;

}

__kernel void simple_sub(__global int *in)

{

size_t tid = get_global_id(0);

in[tid] = in[tid]-1;

}[/codebox]

main.cpp:

[codebox]#define __CL_ENABLE_EXCEPTIONS

#include <utility>

#include <CL/cl.hpp>

#include <cstdlib>

#include <fstream>

#include <iostream>

#include <iterator>

#include <string>

#include <vector>

inline void

checkErr(cl_int err, const char * name)

{

if (err != CL_SUCCESS) {

    std::cerr << "ERROR: " << name

             << " (" << err << ")" << std::endl;

    exit(EXIT_FAILURE);

}

}

int

main(void)

{

cl_int err;

std::vector< cl::Platform > platformList;

cl::Platform::get(&platformList);

checkErr(platformList.size()!=0 ? CL_SUCCESS : -1, "cl::Platform::get");

std::cerr << "Platform number is: " << platformList.size() << std::endl;

std::string platformVendor;

platformList[0].getInfo((cl_platform_info)CL_PLATFORM_VENDOR

, &platformVendor);

std::cerr << "Platform is by: " << platformVendor << "\n";

cl_context_properties cprops[3] =

    {CL_CONTEXT_PLATFORM, (cl_context_properties)(platformList[0])(), 0};

cl::Context context(CL_DEVICE_TYPE_GPU,cprops,NULL,NULL,&err);

checkErr(err, "Context::Context()");

std::vector<int> inVec(512000,50);

size_t vec_size = inVec.size()*sizeof(int);

std::cout << "std::vector<int> inVec has size " << vec_size << std::endl;

cl::Buffer devBuf(context, CL_MEM_READ_WRITE, vec_size, &inVec[0], &err);

checkErr(err, "Buffer::Buffer()");

std::vector<cl::Device> devices;

devices = context.getInfo<CL_CONTEXT_DEVICES>();

checkErr(devices.size() > 0 ? CL_SUCCESS : -1, "devices.size() > 0");

std::ifstream file("…/opencl_test/test_kernels.cl");

checkErr(file.is_open() ? CL_SUCCESS:-1, "test_kernels.cl");

std::string prog(std::istreambuf_iterator<char>(file),

                 (std::istreambuf_iterator<char>()));

cl::Program::Sources source(1, std::make_pair(prog.c_str(), prog.length()+1));

cl::Program program(context, source);

err = program.build(devices,"");

checkErr(err, "Program::build()");

cl::Kernel simple_add_kernel(program, "simple_add", &err);

cl::Kernel simple_sub_kernel(program, "simple_sub", &err);

cl::CommandQueue queue(context, devices[0], 0, &err);

cl::Event event;

err = queue.enqueueWriteBuffer(devBuf, CL_TRUE, 0, vec_size, &inVec[0]);

err = simple_add_kernel.setArg(0, devBuf);

err = simple_sub_kernel.setArg(0, devBuf);

err = queue.enqueueNDRangeKernel(simple_add_kernel, cl::NullRange, cl::NDRange(vec_size), cl::NDRange(32), NULL, &event);

err = queue.enqueueNDRangeKernel(simple_sub_kernel, cl::NullRange, cl::NDRange(vec_size), cl::NDRange(32), NULL, &event);

checkErr(err, "CommandQueue::enqueueNDRangeKernel()");

queue.finish();

err = queue.enqueueReadBuffer(devBuf, CL_TRUE, 0, vec_size, &inVec[0]);

checkErr(err, "CommandQueue::enqueueReadBuffer()");

std::cout << "\n" << inVec[0];

return EXIT_SUCCESS;

}

[/codebox]

I seem to be running into the same issue, although my limit is a little lower than yours (mine breaks at about 186 KB). My work-group sizes and work-item counts are all well within the upper limits reported by both the kernel and the device.

Hello,

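I just found the bug in the code. I had declared vec_size = inVec.size()*sizeof(int) and passed it as the global work size in the “queue.enqueueNDRangeKernel” calls near the end of the source code. “queue.enqueueNDRangeKernel” does not want the size in bytes but the number of work items, which here is the number of elements in the vector. With vec_size = 2048000 and only 512000 ints in the buffer, the kernels were launched with four times as many work items as there are elements, so most of them wrote past the end of the buffer.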

Is this the same problem that you are having?

Regards,

anlmat

PS.

Is there a way to change the topic to [SOLVED]?