A weird behaviour of atomicCAS()

Hi everyone,

I am running the program below, which tests atomicCAS(). I expect only one thread to find the value at pointer equal to 0 and change it to 100, but for some reason it seems that all threads find the value at pointer to be 0. Does anybody know why, or what the mistake is?

Regards
Daniel


#include <cuda_runtime.h>
#include <iostream>
#include <cuComplex.h>
using namespace std;

__global__ void set(int *pointer)
{
*pointer=0;

*(pointer+threadIdx.x+1)=0;
}
__global__ void kernel(int * pointer)
{

*(pointer+threadIdx.x+1)=atomicCAS(pointer,0,100);
}

int main(int argc,char ** argv)
{
int numThreads=40;
cudaEvent_t ready;

dim3 threadsPerBlock;
 dim3 blocks;
int o[numThreads+1];

int * pointer;

cudaMalloc(&pointer,sizeof(int)*(numThreads+1));
cudaMemset(pointer,0,sizeof(int)*(numThreads+1));
threadsPerBlock.x=numThreads;
threadsPerBlock.y=1;
threadsPerBlock.z=1;
blocks.x=1;
blocks.y=1;
blocks.z=1;

set<<<threadsPerBlock,blocks>>> (pointer);
kernel <<<threadsPerBlock,blocks>>> (pointer);
cudaEventCreate(&ready);
cudaEventRecord(ready,0);
cudaEventSynchronize(ready);

cudaMemcpy(o,pointer,sizeof(int)*(numThreads+1),cudaMemcpyDeviceToHost);

for (int i=0;i<numThreads+1;i++)
cout << o[i] << " ";
cout << endl;

}


When I run this, I get

[root@…]# nvcc -arch=sm_11 atomiccas.cu
[root@…]# ./a.out
100 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

While I expect something like

100 0 100 100 100 100 100 100 …

Hi,
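You just mixed up the order of the “threadsPerBlock” and “blocks” parameters in your code. The launch syntax is <<<blocks, threadsPerBlock>>>, so your launches create 40 blocks of one thread each. threadIdx.x is then 0 in every thread, and they all write to the same slot, pointer+1. Once this is fixed, the result should be as expected.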
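
For reference, here is a minimal sketch of the test with the launch parameters in the right order. It is not the original program: the helper name runCasTest is made up for illustration, the set kernel is dropped on the assumption that cudaMemset already zeroes the buffer, and the error checks are additions. Which slot ends up holding 0 depends on which thread wins the compare-and-swap, so the sketch only counts the winners.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Each thread stores the value returned by atomicCAS() (the old contents of
// *pointer) in its own slot, pointer[threadIdx.x + 1].
__global__ void kernel(int *pointer)
{
    pointer[threadIdx.x + 1] = atomicCAS(pointer, 0, 100);
}

// Hypothetical helper (not in the original post): runs the test with ONE block
// of numThreads threads and returns how many threads saw the old value 0.
// Returns -1 if any CUDA call fails or pointer[0] is not 100 afterwards.
int runCasTest(int numThreads)
{
    std::vector<int> o(numThreads + 1);
    size_t bytes = sizeof(int) * (numThreads + 1);
    int *pointer = nullptr;

    if (cudaMalloc(&pointer, bytes) != cudaSuccess) return -1;
    cudaError_t err = cudaMemset(pointer, 0, bytes);

    dim3 blocks(1, 1, 1);                    // grid size comes first
    dim3 threadsPerBlock(numThreads, 1, 1);  // block size comes second
    if (err == cudaSuccess) {
        kernel<<<blocks, threadsPerBlock>>>(pointer);
        err = cudaGetLastError();            // catches launch failures
    }
    if (err == cudaSuccess) {
        // A blocking cudaMemcpy on the default stream waits for the kernel.
        err = cudaMemcpy(o.data(), pointer, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(pointer);
    if (err != cudaSuccess || o[0] != 100) return -1;

    int winners = 0;
    for (int i = 1; i <= numThreads; i++) {
        if (o[i] == 0) winners++;            // this thread performed the swap
    }
    return winners;
}

int main()
{
    int winners = runCasTest(40);
    printf("threads that saw 0: %d\n", winners);
    return winners == 1 ? 0 : 1;
}
```

With one block of 40 threads, every thread has a distinct threadIdx.x, so exactly one slot should hold 0 and all the others 100.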