atomicAdd() fails: nvcc compiles, but the module fails to load

This is my kernel:

__device__ void AddOne(int *acc)
{
	atomicAdd(acc, 1);
}

extern "C"
__global__ void test(int *var)
{
	__shared__ int acc;

	if (threadIdx.x == 0)
		acc = 0;
	__syncthreads();

	AddOne(&acc);
	__syncthreads();

	if (threadIdx.x == 0)
		var[0] = acc;
}

I compiled this with the following command, and nvcc reported no errors or warnings.

nvcc kernel.cu --ptx -arch sm_11

When I run the program, cuModuleLoad() fails to load the module. The error it returns isn't even one of those listed in the reference manual.

If I replace the line

atomicAdd(acc, 1);

with

acc[0] = 1;

the module gets loaded correctly, and I am able to retrieve the value 1 from the kernel.

Can someone help me with this? Thanks for reading.

I have a compute capability 1.1 device and I am running version 2.3 of the toolkit.

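Shared memory atomic operations are not supported on compute capability 1.1 hardware. They require compute capability 1.2 or higher; a 1.1 device only supports 32-bit atomics on global memory.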
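This restriction can be checked at run time before loading a module that relies on shared memory atomics. The following is a minimal sketch that uses the runtime API rather than the driver API from the original post. The helper name `supportsSharedAtomics` is hypothetical, and the program assumes device 0 is the device of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: 32-bit atomics on shared memory need
// compute capability 1.2 or higher.
static bool supportsSharedAtomics(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 2);
}

int main()
{
    cudaDeviceProp prop;
    // Query device 0; assumes at least one CUDA device is present.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
        return 1;
    }
    std::printf("Compute capability %d.%d, shared memory atomics: %s\n",
                prop.major, prop.minor,
                supportsSharedAtomics(prop.major, prop.minor) ? "yes" : "no");
    return 0;
}
```

If the check reports "no", the accumulator has to live in global memory instead of shared memory.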

Right. Just saw it in the appendix. Is the atomic operation expensive to use, assuming I am using it on shared data?

I am running this code instead.

__device__ int acc;

__device__ void AddOne()
{
	atomicAdd(&acc, 1);
}

extern "C"
__global__ void test(int *var)
{
	if (threadIdx.x == 0)
		acc = 0;
	__syncthreads();

	AddOne();
	__syncthreads();

	if (threadIdx.x == 0)
		var[0] = acc;
}

I am invoking the kernel with a single block containing 31 threads.

The first time I run ./a.out I get the right value, 31. The next run gives 62, and every further run adds another 31, as if the variable kept its value from the previous invocation. According to the guide, a __device__ variable should have the lifetime of the application, so I am not sure how it persists across invocations.

31 poona@poona_desktop:~/development/cuda# ./a.out

62 poona@poona_desktop:~/development/cuda# ./a.out

93 poona@poona_desktop:~/development/cuda# ./a.out

124 poona@poona_desktop:~/development/cuda# ./a.out

155 poona@poona_desktop:~/development/cuda# ./a.out

186 poona@poona_desktop:~/development/cuda# ./a.out

217 poona@poona_desktop:~/development/cuda# ./a.out

and so on.

Try declaring acc explicitly as global.

Can a variable be declared as global? I tried it with

__global__ int acc;

and I got these errors:

kernel_atomic.cu(1): warning: invalid attribute for variable "acc"

kernel_atomic.cu(6): error: identifier "acc" is undefined

kernel_atomic.cu(14): error: identifier "acc" is undefined

kernel_atomic.cu(23): error: identifier "acc" is undefined
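These errors are consistent with `__global__` being a qualifier for kernel functions, not for variables; a file-scope variable declared with `__device__` already resides in global memory. As a sketch only, and using the runtime API instead of the `cuModuleLoad()` path from the original post, the accumulator can also be zeroed from the host before each launch so that no value carries over. The names `countThreads` and `runCount` are hypothetical; the single block of 31 threads mirrors the post. Global memory atomics need `-arch sm_11` or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// File-scope __device__ variable: it lives in global memory.
__device__ int acc;

// Hypothetical kernel: every thread adds 1 to the global accumulator.
__global__ void countThreads()
{
    atomicAdd(&acc, 1);
}

// Hypothetical helper: zero acc from the host, launch one block of
// `threads` threads, and return the final count (-1 on any CUDA error).
int runCount(int threads)
{
    int zero = 0;
    int result = -1;
    if (cudaMemcpyToSymbol(acc, &zero, sizeof(int)) != cudaSuccess)
        return -1;
    countThreads<<<1, threads>>>();
    // This copy runs after the kernel and blocks until it completes.
    if (cudaMemcpyFromSymbol(&result, acc, sizeof(int)) != cudaSuccess)
        return -1;
    return result;
}

int main()
{
    std::printf("%d\n", runCount(31));
    return 0;
}
```

Resetting the variable from the host keeps the initialization out of the kernel, which makes it easier to tell whether a stale value comes from the device variable or from the host-side code that reads the result.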