atomicAdd occasionally fails on doubles

L_F · October 11, 2011, 9:52pm

Hello! I've run into strange behavior with an atomic operation. The device is a Tesla C2070 (compute capability 2.0), the OS is 64-bit Ubuntu 11.04, and the device driver is 270.41.19.
Threads write concurrently like this:

…
double k=…;
double tmpStack[64*16];
int i=…, width=…, N=…;
for (int r=0; r<N; r++) {
    for (int c=0; c<N; c++) {
        atomicAdd((double*)&result[r*width+c], tmpStack[i*N*N+r*N+c]*k);
        …
    }
}
…

A block is dim(32,1), for example. The problem is that the kernel sometimes fails with “Unspecified error” exactly at the atomicAdd call. The function is taken from the CUDA documentation:

__device__ double atomicAdd(double* address, double val)
{
    double old = *address, assumed;
    do {
        assumed = old;
        old = __longlong_as_double(atomicCAS((unsigned long long int*)address,
                                             __double_as_longlong(assumed),
                                             __double_as_longlong(val + assumed)));
    } while (assumed != old);
    return old;
}

My question: are there any undocumented special requirements for using atomic operations? Or a limit on the number of threads writing concurrently?
Thanks, any help is appreciated.
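
Editor's sketch: the compare-and-swap loop in the function quoted above can also be written so that the loop condition compares the raw 64-bit patterns instead of the double values. Later versions of the CUDA documentation use this form, because comparing doubles can spin forever if the stored value is NaN (NaN != NaN). The names atomicAddDouble, accumulate and sumWithAtomicAdd below are hypothetical, chosen for illustration; the different function name also avoids a clash with the built-in atomicAdd(double*, double) available on devices of compute capability 6.0 and higher. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical name: a software double-precision atomic add built on atomicCAS.
__device__ double atomicAddDouble(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Store (assumed + val) only if the word still holds `assumed`;
        // atomicCAS returns the value that was actually in memory.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Compare bit patterns, not doubles, so a NaN cannot cause an endless loop.
    } while (assumed != old);
    return __longlong_as_double(old);
}

// Every thread adds k to the same double.
__global__ void accumulate(double* result, double k)
{
    atomicAddDouble(result, k);
}

// Host helper (hypothetical): launches blocks*threads threads and returns the sum.
double sumWithAtomicAdd(int blocks, int threads, double k)
{
    double* d_result = nullptr;
    double h_result = 0.0;
    cudaMalloc(&d_result, sizeof(double));
    cudaMemcpy(d_result, &h_result, sizeof(double), cudaMemcpyHostToDevice);
    accumulate<<<blocks, threads>>>(d_result, k);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_result, d_result, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return h_result;
}
```

Calling sumWithAtomicAdd(4, 32, 0.5) has 128 threads each add 0.5 to one location, so the returned value should be 64.0 if no update is lost.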

benetion · October 12, 2011, 6:09pm

I believe double-precision atomicAdd is not supported yet.

L_F · October 12, 2011, 8:21pm

It’s supported if you add the function described in my first post.
Never mind, though: the failure was caused by a bug in my code that occurred much earlier, in another part of memory, but somehow only showed up during the atomic operation.

Thanks for the response.
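
Editor's sketch: a failure that surfaces far from its cause is easier to localize when every kernel launch is checked right away, since kernel launches are asynchronous and an execution error is otherwise reported by some later API call. The helper below is a minimal illustration, not the poster's code; checkKernel, fill and runCheckedKernels are hypothetical names. Running the program under a memory checker such as cuda-memcheck may also help pinpoint out-of-bounds accesses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports errors from the most recent kernel launch.
inline bool checkKernel(const char* label)
{
    cudaError_t err = cudaGetLastError();   // errors detected at launch time
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel ran
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Simple kernel with a bounds check so no thread writes past the array.
__global__ void fill(double* data, int n, double value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] = value;
}

// Runs two kernels and checks each one before moving on.
bool runCheckedKernels(int n)
{
    double* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(double)) != cudaSuccess) return false;

    int threads = 32;
    int blocks = (n + threads - 1) / threads;

    fill<<<blocks, threads>>>(d_data, n, 1.0);
    bool ok = checkKernel("first fill");

    if (ok) {
        fill<<<blocks, threads>>>(d_data, n, 2.0);
        ok = checkKernel("second fill");
    }

    cudaFree(d_data);
    return ok;
}
```

With checks like these after each launch, an error message names the first kernel that failed instead of whichever call happens to synchronize next. The synchronization slows execution, so this pattern is usually kept to debug builds.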

benetion · October 12, 2011, 9:10pm

Sorry, I missed that…
