Hello! I’ve found some strange behavior with an atomic operation. The device is a Tesla C2070 (compute capability 2.0), the OS is 64-bit Ubuntu 11.04, and the device driver is 270.41.19.
Threads write concurrently as follows:
…
double k=…;
double tmpStack[6416];
int i=…, width=…, N=…;
for (int r=0; r<N; r++) {
    for (int c=0; c<N; c++) {
        atomicAdd((double*)&result[r*width+c], tmpStack[i*N*N+r*N+c]*k);
        …
    }
}
…
A block is dim(32,1), for example. The problem is that the kernel sometimes fails with “Unspecified error” exactly at the atomicAdd call. The function is taken from the CUDA documentation:
__device__ double atomicAdd(double* address, double val)
{
    double old = *address, assumed;
    do {
        assumed = old;
        old = __longlong_as_double(atomicCAS((unsigned long long int*)address, __double_as_longlong(assumed), __double_as_longlong(val + assumed)));
    } while (assumed != old);
    return old;
}
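For comparison, here is a sketch along the lines of the version in the CUDA C Programming Guide, which keeps `old` and `assumed` as 64-bit integers and compares bit patterns rather than doubles. The practical difference is that a double comparison never terminates if the stored value is NaN, because NaN != NaN. The names `atomicAddDouble`, `addOnes`, and `runSumDemo` are made up for this illustration; the rename also avoids a clash with the built-in double-precision atomicAdd on devices of compute capability 6.0 and later.

```cuda
#include <cuda_runtime.h>

// Hypothetical name: software double-precision atomic add built on atomicCAS.
__device__ double atomicAddDouble(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // atomicCAS returns the value that was in memory before the attempt.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Integer comparison: a double comparison would loop forever on NaN.
    } while (assumed != old);
    return __longlong_as_double(old);
}

// Illustrative kernel: every thread adds 1.0 to the same location.
__global__ void addOnes(double* result)
{
    atomicAddDouble(result, 1.0);
}

// Illustrative host helper: launches one block of n threads
// (n must not exceed the device's block size limit) and returns the sum.
double runSumDemo(int n)
{
    double* d_result = 0;
    double h_result = 0.0;
    cudaMalloc((void**)&d_result, sizeof(double));
    cudaMemcpy(d_result, &h_result, sizeof(double), cudaMemcpyHostToDevice);
    addOnes<<<1, n>>>(d_result);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_result, d_result, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return h_result;
}
```

With, say, `runSumDemo(256)`, every thread contends for the same address, so this is close to the worst case for concurrent writers. The retry loop only gets slower under contention; it should not fail.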
My question: are there any undocumented special requirements for using atomic operations? Or is there a limit on the number of threads writing concurrently?
Thanks, any help is appreciated.
Double-precision atomicAdd is supported if you add the function described above.
Never mind, the failure was caused by a bug in my code that occurred much earlier, in another part of memory, but somehow only surfaced during the atomic operation.
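Because kernel launches are asynchronous, an error from an earlier kernel can be reported at a later, unrelated call. One way to narrow down where a failure really originates is to check for errors right after each launch and again after synchronizing. The following is a minimal sketch, not code from the original program: `checkCuda`, `fill`, and `runChecked` are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the error and returns false on failure.
static bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Illustrative kernel with a bounds check on the write.
__global__ void fill(double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0;
}

// Illustrative host code: checks both the launch and the execution.
bool runChecked(int n)
{
    double* d_out = 0;
    if (!checkCuda(cudaMalloc((void**)&d_out, n * sizeof(double)), "cudaMalloc"))
        return false;
    fill<<<(n + 31) / 32, 32>>>(d_out, n);
    // Launch errors (bad configuration and similar) show up here.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel runs show up after synchronizing.
    ok = checkCuda(cudaDeviceSynchronize(), "kernel execution") && ok;
    cudaFree(d_out);
    return ok;
}
```

Running the program under a memory checker such as cuda-memcheck may also help catch out-of-bounds accesses inside a kernel.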