CL_INVALID_WORK_GROUP_SIZE with clEnqueueNDRangeKernel

Is this a known bug? Here is my usage:

cl_uint dim = 2;                            /* work_dim parameter is a cl_uint */
size_t globalWorkSizeData[2] = {16, 16};
size_t localWorkSizeData[2] = {8, 8};
cl_int err = clEnqueueNDRangeKernel(clQueue, clKernel, dim,
                                    NULL,                /* global_work_offset */
                                    globalWorkSizeData,  /* already size_t, no cast needed */
                                    localWorkSizeData,
                                    0, NULL, NULL);      /* no wait list, no event */

The code only works if localWorkSizeData is NULL.

Thanks.

I am experiencing exactly the same problem: whenever I pass a non-NULL work-group size, the call fails with CL_INVALID_WORK_GROUP_SIZE. It is hard to believe this is a bug, though, since almost every NVIDIA OpenCL programmer would have run into it.

Any explanations?

I should add that the maxWorkGroupSize returned by clGetDeviceInfo() is 512 and the kernelWorkGroupSize returned by clGetKernelWorkGroupInfo() is 128. The error occurs even for work-group sizes smaller than 128.
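The constraints mentioned here can be checked on the host before enqueuing. Below is a minimal sketch of a hypothetical helper (local_size_ok is not part of the OpenCL API). The limits are passed in as plain parameters so it runs without a device; on real hardware they would come from clGetDeviceInfo with CL_DEVICE_MAX_WORK_ITEM_SIZES and clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE. It assumes OpenCL 1.x rules, where a supplied local size must divide the global size exactly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `local` is an acceptable work-group
   shape for `global` under the given limits (OpenCL 1.x rules assumed). */
bool local_size_ok(unsigned dim,
                   const size_t *global,
                   const size_t *local,
                   const size_t *max_item_sizes, /* CL_DEVICE_MAX_WORK_ITEM_SIZES */
                   size_t kernel_wg_size)        /* CL_KERNEL_WORK_GROUP_SIZE */
{
    size_t total = 1;
    for (unsigned i = 0; i < dim; ++i) {
        /* Exceeding the per-dimension limit gives CL_INVALID_WORK_ITEM_SIZE. */
        if (local[i] == 0 || local[i] > max_item_sizes[i])
            return false;
        /* A global size not divisible by the local size gives
           CL_INVALID_WORK_GROUP_SIZE. */
        if (global[i] % local[i] != 0)
            return false;
        total *= local[i];
    }
    /* The product of the local sizes must not exceed the kernel limit,
       otherwise CL_INVALID_WORK_GROUP_SIZE. */
    return total <= kernel_wg_size;
}
```

With the values reported in this thread (global {16, 16}, local {8, 8}, kernel limit 128) the check passes: 8 * 8 = 64 is below 128 and 16 is divisible by 8, which suggests the sizes themselves are not what the driver is rejecting.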

Passing anything but NULL as the local work size gives error -54.

Are there any known solutions? It has been a few months…

Error -54 is CL_INVALID_WORK_GROUP_SIZE.
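To avoid decoding raw numbers by hand, a small lookup can translate the codes that come up in this thread. This is a sketch with a hypothetical function name; the numeric values match the standard cl.h header, but a real program would include <CL/cl.h> and switch on the CL_* macros rather than on literals.

```c
/* Hypothetical lookup for the OpenCL error codes discussed in this thread.
   Values follow cl.h; prefer the CL_* macros when <CL/cl.h> is available. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -53: return "CL_INVALID_WORK_DIMENSION";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -55: return "CL_INVALID_WORK_ITEM_SIZE";
    case -56: return "CL_INVALID_GLOBAL_OFFSET";
    default:  return "unknown OpenCL error";
    }
}
```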

If I set the work-group size to NULL, I get error -5 (CL_OUT_OF_RESOURCES) :(
According to the best practices guide, shouldn't a NULL work-group size make OpenCL determine the work-group size automatically?

Which version are you using? I am using 3.0_beta on 64-bit Windows, and if I set the size to anything but NULL I get a CL_INVALID_WORK_GROUP_SIZE error.

Mine is driver 190.29 on Ubuntu 9.10 with CUDA 2.3.

Using the profiler, I realized that passing NULL for the work-group size gave me a 1x1x1 block, which completely ruined my kernel's occupancy :(

Finally, I figured out why I was getting the CL_INVALID_WORK_GROUP_SIZE error. My NVIDIA card was installed in a 64-bit Ubuntu system, and on 64-bit systems sizeof(size_t) is 8 [1], which differs from sizeof(int), which is 4. When I passed my global/block sizes, I used (size_t *)(&an_int), which produces unpredictable values for the kernel dimensions. The code worked on the ATI hardware because that machine ran a 32-bit system.
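One way to avoid the pointer cast is to keep the sizes in size_t from the start, or, if they already live in int variables, to convert them value by value. The sketch below uses a hypothetical helper name and assumes the int values are non-negative.

```c
#include <stddef.h>

/* Copy int sizes into a size_t array by value. Casting the pointer instead,
   as in (size_t *)&an_int, makes the callee read 8 bytes per element on
   64-bit systems where int is only 4 bytes, so it picks up neighbouring
   ints or stray memory. Assumes every input value is non-negative. */
void ints_to_sizes(const int *in, size_t *out, unsigned n)
{
    for (unsigned i = 0; i < n; ++i)
        out[i] = (size_t)in[i];  /* value conversion, not reinterpretation */
}
```

The resulting size_t array can then be passed to clEnqueueNDRangeKernel directly; no cast is needed because its type already matches the const size_t * parameters.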

So, check your code and see if you have the same issue.

[1] http://www.toymaker.info/Games/html/64_bit.html

Same error for me, but from a different cause: the globalWorkSize wasn't divisible by the localWorkSize, whereas I naively thought the driver would adjust the sizes from the input properly.
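A common workaround for this is to round the global size up to the next multiple of the local size and have the kernel skip the extra work-items. A minimal sketch, with a hypothetical helper name, assuming a non-zero local size:

```c
#include <stddef.h>

/* Round `global` up to the next multiple of `local` so it divides evenly,
   which OpenCL 1.x requires when a local size is supplied. Assumes
   local > 0. The kernel must then ignore the padding work-items, e.g.
   with `if (get_global_id(0) >= n) return;` where n is the real size. */
size_t round_up_global(size_t global, size_t local)
{
    size_t rem = global % local;
    return rem == 0 ? global : global + (local - rem);
}
```

For example, a problem of 1000 elements with a local size of 64 would be launched with a global size of 1024, and the 24 extra work-items would return early.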

What is the solution? I'm confused.

Should I declare

size_t localWorkSize[2];

and pass it as (unsigned int*)localWorkSize?

That does not compile.

I also ran into this problem, but in the end I found the cause was a wrong memory size. So don't focus only on the group size: the error may come from the memory transfer between CPU and GPU, and in my case the transfer used the wrong size. Just a tip; hopefully it helps.
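The post doesn't say exactly which size was wrong. One frequent mistake of this kind is passing an element count where clCreateBuffer, clEnqueueWriteBuffer, or clEnqueueReadBuffer expect a size in bytes. A minimal sketch of a hypothetical helper that computes the byte size and refuses to wrap on overflow:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute count * elem_size in bytes for OpenCL buffer calls, which take
   sizes in bytes rather than element counts. Returns false instead of
   silently wrapping if the product does not fit in size_t. */
bool buffer_bytes(size_t count, size_t elem_size, size_t *out)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false;  /* product would overflow size_t */
    *out = count * elem_size;
    return true;
}
```

Usage would look like buffer_bytes(n, sizeof(float), &bytes), with bytes then passed as the size argument of the buffer call.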