Atomics bugs

Hi guys,

Is anybody else having trouble creating programs that use atomic operations? The OpenCL announcement/download page here:

http://developer.nvidia.com/object/opencl-download.html

seems to indicate that atomics support is already there…

A simple kernel that uses atomics crashes my test program with a VisualC Runtime Error in clBuildProgram, with the following message:

[codebox]

Constant expressions not handled yet

i32 ptrtoint (i32 addrspace(3)* @shr_1_t to i32)

[/codebox]

I distilled a larger program I had to a simplified, unit test-like kernel that still exhibits the problem:

[codebox]

#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics: enable

__kernel void hello(__global char * out) {

__local int t;

size_t tid = get_global_id(0);

t = 0;

atom_add(&t, 32);

}

[/codebox]

Here’s the source code to the main program:

[codebox]

int main(void) {

cl_context gpu_context;

cl_int error_code;

cl_program program;

size_t kernelLength;

char *kernelSource;

gpu_context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL,

                                    &error_code);

kernelSource = oclLoadProgSource("hello_kernel.cl", 

								  "",

                                  &kernelLength);

program = clCreateProgramWithSource(gpu_context,

                                    1,

                                    (const char **)&kernelSource,

                                    &kernelLength, &error_code);

error_code = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

}

[/codebox]

When I comment out the atomic add, the program runs fine (and the larger version that actually did some stuff did too).

My environment is Vista x32, NVidia Driver 190.89, 2x GTX 260, Intel i7.

Thoughts?

Yup I see the same thing on my Quadro FX 5800, if I query CL_DEVICE_EXTENSIONS with clGetDeviceInfo I get this:
cl_khr_byte_addressable_store cl_nvcompiler_options cl_nv_device_attribute_query cl_khr_global_int32_base_atomicsl_khr_global_int32_extended_
atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

But if I try and use atom_add on a local I get the same error as you irregardless of whether I enable those extensions via a pragma or not. Atomic adds on global pointer work fine, however.

I’m also unable to get any case with an atomic on a shared memory variable to work. I suspect I’m not reading the syntax correctly.
Anyone know of an example?

Okay, so I realized that the syntax above, although correct as per OpenCL spec, is unlike the syntax used in OpenCL SDK examples.

So I changed my code to be as follows:

[codebox]

#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable

__kernel void hello(__global char * out, __local int *t) {

*t = 0;

atom_add(t, 32);

}

[/codebox]

In the .cpp code, I follow what is done in the examples:

[codebox]

errorcode = clSetKernelArg(helloKernel, 1, sizeof(cl_int), NULL);

[/codebox]

Now, the code is structured exactly as the code in the examples.

However, when building the .cl program compiler gives the following error:

State space incorrect for instruction ‘atom’

Can you suggest a workaround? Is there any guidance as to when the next version of OpenCL SDK will be released that might have local-space atomic instructions working?

Thanks!

Stan

Running win7 64, driver version 191.07 on Geforce GTX 275, cl_khr_local_int32_base_atomics and cl_khr_local_int32_extended_atomics are reported as present.
Haven’t used them yet, only the global variant. In that case it proved imperative to specify the address space when using a pointercast. Still, I didn’t succeed using the useless 4th element of a __global float4 for atomics.
Further, I tried to use bits by oring and xoring/anding, but when I go beyond bytes, my code will occasionally/regularly crash the driver. At one point I had both global base atomics and global extended atomics enabled, which apparently gives trouble, so use only one, assuming that extended also enables base.
But, to reply to your question, signs are that local atomics are around. Only I am confused what to understand under local in this case. It could be that automatic variables, for one thing, are not local in the sense of being addressable by pointers as required in atomic operations.
Under Cuda, local memory is a slow but cached kind of memory. Under OpenCL it appears to be either constant memory or shared memory, local meaning particular to the workgroup. This is my interpretation, anyone, correct me if I’m wrong.
If I am rougly correct, “local atomics” could be a bit misleading.
Jan

“Local atomics” means atomic operations on __local data.

Under OpenCL, __local refers to memory shared among the threads in a work group, and it should be pretty fast, nearly as fast as registers.

Stan