Atomic functions problem

Hello all!

I’m having a weird problem while trying to use the atomicAdd() function. This is my simple kernel:

__global__ void k1(int *out, unsigned int *index) {
    if (threadIdx.x > 3) {
        int resultIndex = atomicAdd(index, 1);
        out[resultIndex] = 5;
    }
}

This is just a test: I want to implement a filter that writes its results to an output buffer. The thing is that after calling this kernel from the host and copying the output buffer back into host memory, every value I print is 0. But if I take out the atomicAdd() line (and write to the threadIdx.x position in the array instead), it does print 5…

What’s going on here?

It may be something simple like the *index pointer being a host pointer by accident, or accidentally using a block size of 1 wide and 64 high instead of the converse.
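To illustrate the block-shape mix-up mentioned above, here is a minimal sketch. The kernel `countPassing` and the helper `passingThreads` are hypothetical names, not from the original post; the kernel applies the same `threadIdx.x > 3` test as k1 and counts how many threads pass it. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: counts the threads that pass the same filter as k1.
__global__ void countPassing(unsigned int *count) {
    if (threadIdx.x > 3) {
        atomicAdd(count, 1u);
    }
}

// Hypothetical helper: launches a single block of the given shape and
// returns how many of its threads passed the filter.
unsigned int passingThreads(dim3 block) {
    unsigned int *dCount = nullptr;
    unsigned int hCount = 0;

    cudaMalloc((void **)&dCount, sizeof(*dCount));
    cudaMemset(dCount, 0, sizeof(*dCount));          // counter starts at 0

    countPassing<<<1, block>>>(dCount);

    // This copy also waits for the kernel to finish.
    cudaMemcpy(&hCount, dCount, sizeof(hCount), cudaMemcpyDeviceToHost);
    cudaFree(dCount);
    return hCount;
}
```

With a block that is 64 wide and 1 high, `passingThreads(dim3(64, 1))` should return 60 (threads 4 to 63). With a block that is 1 wide and 64 high, `threadIdx.x` is 0 for every thread, so `passingThreads(dim3(1, 64))` should return 0 and nothing would ever be written.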

Can you post the 5-10 lines of code you use to allocate and copy your memory, call the kernel, then copy and print the results?

Does your device support atomics?

Is the value at *index reset to 0 at the beginning? :)

This is the host code where I allocate the memory:

unsigned int *index;
cudaMalloc((void**) &index, sizeof(*index));
cudaMemcpy(index, 0, sizeof(int), cudaMemcpyHostToDevice);

How can I check my device compute capability?

cudaMemcpy(index, 0, sizeof(int), cudaMemcpyHostToDevice);

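This copies an int from address 0 to index, not the value 0. The second argument is a pointer to the source data, so the 0 here is a null host pointer, and at address 0 you probably have some garbage.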

I’m surprised this didn’t raise a segmentation fault, but maybe cudaMemcpy operates at some higher privilege level that bypasses the normal OS protections.
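Whatever happens internally, a mistake like this is easier to catch if the return code of each runtime call is checked. A minimal sketch, assuming the hypothetical helpers `cudaOk` and `initIndex` (neither is from the original post):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: reports a failed runtime call and returns
// whether the call succeeded.
bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: copies one unsigned int from host memory to the
// device counter and reports whether the copy succeeded.
bool initIndex(unsigned int *dIndex, const unsigned int *hValue) {
    return cudaOk(cudaMemcpy(dIndex, hValue, sizeof(unsigned int),
                             cudaMemcpyHostToDevice),
                  "cudaMemcpy");
}
```

Called with the address of a real host variable, `initIndex` should return true. Called with a null host pointer, as in the snippet above, the copy would be expected to report an error instead of passing silently, although this thread does not confirm which error.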

Try this:

unsigned int cpuValue=0;

cudaMemcpy(index, &cpuValue, sizeof(int), cudaMemcpyHostToDevice);
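Putting the fix together with the kernel from the first post, the whole round trip might look like the sketch below. The helper `runFilter` and the names `dOut`, `dIndex` and `hIndex` are hypothetical, the launch uses a single block, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The kernel from the first post.
__global__ void k1(int *out, unsigned int *index) {
    if (threadIdx.x > 3) {
        // atomicAdd returns the old value, so each thread gets a unique slot.
        unsigned int resultIndex = atomicAdd(index, 1u);
        out[resultIndex] = 5;
    }
}

// Hypothetical helper: runs k1 with one block of `threads` threads and
// returns the entries that were written to the output buffer.
std::vector<int> runFilter(unsigned int threads) {
    int *dOut = nullptr;
    unsigned int *dIndex = nullptr;
    cudaMalloc((void **)&dOut, threads * sizeof(int));
    cudaMalloc((void **)&dIndex, sizeof(unsigned int));

    // Copy the value 0 from a real host variable, not from address 0.
    unsigned int hIndex = 0;
    cudaMemcpy(dIndex, &hIndex, sizeof(hIndex), cudaMemcpyHostToDevice);

    k1<<<1, threads>>>(dOut, dIndex);

    // Read back how many results were written, then the results themselves.
    cudaMemcpy(&hIndex, dIndex, sizeof(hIndex), cudaMemcpyDeviceToHost);
    std::vector<int> result(hIndex);
    if (hIndex > 0) {
        cudaMemcpy(result.data(), dOut, hIndex * sizeof(int),
                   cudaMemcpyDeviceToHost);
    }

    cudaFree(dOut);
    cudaFree(dIndex);
    return result;
}
```

With 64 threads, `runFilter(64)` should return 60 entries, all equal to 5. The order in which threads claim their slots is not defined, which does not matter here because every thread writes the same value.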

Regarding your Compute Capability - what graphics card do you have?

Use the deviceQuery program in the SDK. It prints out the compute capability of each device (among other useful information).
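The same information can also be read from your own program with cudaGetDeviceProperties. A small sketch, where the helper name `supportsGlobalAtomics` is hypothetical; it relies on 32-bit atomics on global memory needing compute capability 1.1 or higher:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: prints the compute capability of a device and
// returns true if it is at least 1.1, the minimum for 32-bit atomic
// operations on global memory.
bool supportsGlobalAtomics(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;  // no such device, or the runtime is unavailable
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return prop.major > 1 || (prop.major == 1 && prop.minor >= 1);
}
```

Calling `supportsGlobalAtomics(0)` checks the first device in the system.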

Thanks! This was the problem… I thought there was no need to allocate memory when copying a constant value…

Thanks all, problem solved!

This should also work:

cudaMemset(index, 0, sizeof(int));

yet I seldom use it. cudaMemset fills memory byte by byte, so it is good for clearing data but not for setting more complex initial values.
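The reason is that cudaMemset writes the same byte value into every byte of the range. A short sketch of what that means for a 4-byte counter; the helper `memsetWord` is a hypothetical name and error checking is omitted:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: fills one unsigned int on the device using
// cudaMemset with the given byte value and reads the result back.
unsigned int memsetWord(int byteValue) {
    unsigned int *dWord = nullptr;
    unsigned int hWord = 0xDEADBEEFu;  // overwritten by the copy below

    cudaMalloc((void **)&dWord, sizeof(*dWord));
    cudaMemset(dWord, byteValue, sizeof(*dWord));  // sets each of the 4 bytes
    cudaMemcpy(&hWord, dWord, sizeof(hWord), cudaMemcpyDeviceToHost);
    cudaFree(dWord);
    return hWord;
}
```

`memsetWord(0)` gives 0 as expected, but `memsetWord(1)` gives 0x01010101 (16843009) rather than 1, so for any initial value other than 0 the cudaMemcpy from a host variable is the safer choice.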