atomicAdd occasionally fails on doubles

Hello! I’ve found some strange behavior with an atomic operation. The device is a Tesla C2070 (compute capability 2.0), the OS is 64-bit Ubuntu 11.04, and the device driver is 270.41.19.
Threads write concurrently like this:


double k=…;
double tmpStack[6416];
int i=…, width=…, N=…;
for (int r=0; r<N; r++) {
    for (int c=0; c<N; c++) {
        atomicAdd((double*)&result[r*width+c], tmpStack[i*N*N+r*N+c]*k);
    }
}

The block dimensions are, for example, dim(32,1). The problem is that the kernel sometimes fails with “Unspecified error” exactly at the atomicAdd call. The function is taken from the CUDA documentation:

__device__ double atomicAdd(double* address, double val)
{
    double old = *address, assumed;
    do {
        assumed = old;
        old = __longlong_as_double(atomicCAS((unsigned long long int*)address,
                                             __double_as_longlong(assumed),
                                             __double_as_longlong(val + assumed)));
    } while (assumed != old);
    return old;
}
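
For reference, here is a minimal, self-contained sketch of how the compare-and-swap loop above might be exercised. It is an illustration under stated assumptions, not the poster's actual kernel: the names atomicAddDouble, addOnes and sumOnes are hypothetical. The function is renamed because devices of compute capability 6.0 and later provide a native atomicAdd for doubles, which a user-defined function of the same signature would clash with. The loop also compares the raw 64-bit patterns instead of the double values, since comparing doubles would never terminate if the stored value were NaN.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical name, chosen to avoid clashing with the native
// atomicAdd(double*, double) on compute capability 6.0 and later.
__device__ double atomicAddDouble(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Try to swap in (assumed + val); atomicCAS returns the value it found.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Compare raw bits, not doubles: NaN != NaN would loop forever.
    } while (assumed != old);
    return __longlong_as_double(old);
}

// Every thread adds 1.0 to the same accumulator.
__global__ void addOnes(double* sum)
{
    atomicAddDouble(sum, 1.0);
}

// Hypothetical host helper: launches blocks * threads adders and
// returns the accumulated total. Error checking is omitted for brevity.
double sumOnes(int blocks, int threads)
{
    double* dSum = nullptr;
    double hSum = 0.0;
    cudaMalloc(&dSum, sizeof(double));
    cudaMemcpy(dSum, &hSum, sizeof(double), cudaMemcpyHostToDevice);
    addOnes<<<blocks, threads>>>(dSum);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&hSum, dSum, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dSum);
    return hSum;
}

int main()
{
    printf("sum = %.1f\n", sumOnes(4, 32));
    return 0;
}
```

Each of the 4 x 32 threads contributes 1.0, so on a working device the printed total should equal the number of threads launched.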

My question: are there any undocumented special requirements for using atomic operations? Or is there a limit on the number of threads writing concurrently?
Thanks, any help is appreciated.

I believe double-precision atomicAdd is not supported yet.

It’s supported if you add the described function.
Never mind, the failure was caused by a bug in my own code that occurred much earlier, in another part of memory, but somehow only surfaced during the atomic operation.
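
In case it helps anyone who hits the same symptom: kernel launches are asynchronous, so an error may be reported by a later CUDA call rather than at the point where it was caused. Below is a minimal sketch, assuming a hypothetical CHECK macro, a placeholder kernel named fill and a hypothetical helper runChecked, of checking for errors right after each launch so that a failure shows up closer to its origin. It is not the code from this thread. Running the program under cuda-memcheck (compute-sanitizer in newer toolkits) may also help locate out-of-bounds accesses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: reports a failing CUDA runtime call with its
// source location and makes the enclosing int-returning function return 1.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err = (call);                                   \
        if (err != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err));                       \
            return 1;                                               \
        }                                                           \
    } while (0)

// Placeholder kernel: the bounds check guards against the kind of stray
// write that can corrupt memory well before a failure is reported.
__global__ void fill(double* data, int n, double value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        data[idx] = value;
    }
}

// Hypothetical helper: returns 0 if every CUDA call succeeded, 1 otherwise.
int runChecked(int n)
{
    double* dData = nullptr;
    CHECK(cudaMalloc(&dData, n * sizeof(double)));

    fill<<<(n + 31) / 32, 32>>>(dData, n, 1.0);
    CHECK(cudaGetLastError());       // errors in the launch itself
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK(cudaFree(dData));
    return 0;
}

int main()
{
    if (runChecked(1000) == 0) {
        printf("no errors reported\n");
    }
    return 0;
}
```

Synchronizing after every launch slows the program down, so this kind of check is probably best kept to debug builds.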

Thanks for the response.

Sorry, I missed that…