atomicCAS() doesn't compile!

I am trying to use an atomic operation in my kernel, but when I try to compile it, Visual Studio says:

" error: identifier “atomicCAS” is undefined "

The code is simple:

[codebox]if (tid==0) {
	do {} while(atomicCAS(&lock,0,1)); // set lock
	square_norm += svhat[0];
	__threadfence(); // wait for write completion
	lock = 0; // free lock
}[/codebox]

Can someone help me to solve this problem?

Thanks

Most likely this is because you are not passing the -arch sm_11 option to nvcc. I’m not familiar with Visual Studio, so I don’t know how you add it. (By default, nvcc compiles code that runs on every CUDA device, and the first generation did not support atomic operations.)

Workaround: use the __CUDA_ARCH__ macro.

Look at section 3.1.4 of the programming guide:

x.cu can have an optimized code path that uses atomic operations, for example, which are only supported in devices of compute capability 1.1 and higher. The __CUDA_ARCH__ macro can be used to differentiate various code paths based on compute capability. It is only defined for device code. When compiling with “arch=compute_11” for example, __CUDA_ARCH__ is equal to 110.
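To make the quoted passage concrete, here is a minimal sketch of a kernel with two code paths selected by __CUDA_ARCH__. The names count_threads and launch_count are made up for illustration and are not from the programming guide. The fallback branch is only one possible way to cope without atomics.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: counts how many threads were launched.
__global__ void count_threads(int *counter)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 110
    // Compute capability 1.1 or higher: every thread increments atomically.
    atomicAdd(counter, 1);
#else
    // No global atomics available: a single thread writes the known total.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *counter = gridDim.x * blockDim.x;
#endif
}

// Hypothetical host helper: launches the kernel and returns the count.
int launch_count(int blocks, int threads)
{
    int *d_counter = nullptr;
    int h_counter = 0;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));
    count_threads<<<blocks, threads>>>(d_counter);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main()
{
    printf("threads counted: %d\n", launch_count(4, 64));
    return 0;
}
```

Both branches should give the same count, so the choice of -arch only changes how the result is computed.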

Thanks guys, now the code compiles… but atomicCAS() doesn't seem to work :( … this is the code:

[codebox]__device__ int lock=0;
__device__ float square_norm=0;

__global__ void mykernel(…){
	if(tid==0){
		d_sum+=temp[0];
		do{}while(atomicCAS(&lock,0,1)); // set lock
		square_norm += temp[0];
		__threadfence(); // wait for write completion
		lock=0; // free lock
	}
}
[/codebox]

It should sum the elements of the temp array after a reduction… Can you find any mistakes in the code?

Thanks

I’ve just learned from “CUDA by Example” that an atomic lock can be designed like this:

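[codebox]struct Lock {
	int *mutex;

	// Allocate the mutex in device memory and initialise it to 0 (unlocked)
	Lock(void) {
		int state=0;
		cudaMalloc((void **)&mutex,sizeof(int));
		cudaMemcpy(mutex, &state, sizeof(int), cudaMemcpyHostToDevice);
	}

	~Lock(void) {
		cudaFree(mutex);
	}

	// Spin until the mutex changes from 0 to 1 on behalf of this thread
	__device__ void lock(void)
	{
		while( atomicCAS(mutex, 0, 1) != 0 )
			;
	}

	// Release the mutex atomically
	__device__ void unlock(void)
	{
		atomicExch(mutex,0);
	}
};[/codebox]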

I’m going to try it in my project. I’m sharing it here in the hope that it is useful to you as well.
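For anyone who wants to see the lock/unlock pattern in a complete program, here is a minimal sketch, assuming that only thread 0 of each block takes the lock (as in the original question). It uses a raw int pointer instead of the Lock struct to keep it short. The names accumulate and locked_sum are made up for illustration, and the __threadfence() calls and the volatile qualifier are my own cautious additions, not something taken from the book.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Spin until this thread changes *mutex from 0 (free) to 1 (held).
__device__ void lock(int *mutex)
{
    while (atomicCAS(mutex, 0, 1) != 0)
        ;
    __threadfence();  // order the protected accesses after the acquire
}

// Release the lock with an atomic write so waiting blocks can see it.
__device__ void unlock(int *mutex)
{
    __threadfence();  // make the protected write visible before releasing
    atomicExch(mutex, 0);
}

// Hypothetical kernel: each block adds one value to *total under the lock.
// Only thread 0 of a block takes the lock, so threads of one warp never
// compete with each other for it.
__global__ void accumulate(int *mutex, volatile float *total, float value)
{
    if (threadIdx.x == 0) {
        lock(mutex);
        *total += value;  // protected read-modify-write
        unlock(mutex);
    }
}

// Hypothetical host helper: runs the kernel and returns the accumulated sum.
float locked_sum(int blocks, float value)
{
    int *d_mutex = nullptr;
    float *d_total = nullptr;
    float h_total = 0.0f;
    cudaMalloc(&d_mutex, sizeof(int));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemset(d_mutex, 0, sizeof(int));    // 0 means unlocked
    cudaMemset(d_total, 0, sizeof(float));  // all-zero bytes are 0.0f
    accumulate<<<blocks, 32>>>(d_mutex, d_total, value);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_mutex);
    cudaFree(d_total);
    return h_total;
}

int main()
{
    printf("sum = %f\n", locked_sum(64, 1.0f));
    return 0;
}
```

Releasing with atomicExch rather than a plain store is the main difference from the code in the question.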

Hello, I now have the same compile error :) How did you fix it? I am using Visual Studio and couldn't understand the solutions that helped you, so can you explain what you did to fix it in Visual Studio?

Open project properties (Alt-F7). Go to CUDA C/C++, Device and change Code Generation to “compute_11,sm_11”.