Hi!
I know that atomic operations are slow and that a different implementation would surely be more elegant. Nevertheless, I wanted to try an implementation based on atomic add, since it is the easiest, and to test how slow it would be.
I implemented atomicAdd for double values as described in this forum:
__device__ inline void myAtomicAdd(double *address, double value)
{
    unsigned long long oldval, newval, readback;
    // Read the current bit pattern and compute the desired sum.
    oldval = __double_as_longlong(*address);
    newval = __double_as_longlong(__longlong_as_double(oldval) + value);
    // Retry until no other thread has changed *address between read and swap.
    while ((readback = atomicCAS((unsigned long long *)address, oldval, newval)) != oldval)
    {
        oldval = readback;
        newval = __double_as_longlong(__longlong_as_double(oldval) + value);
    }
}
I want to implement gridding, where different threads have to accumulate weighted values at the same address.
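To make the access pattern concrete, here is a minimal sketch of such a gridding step. It is not the actual code from my program: the kernel name gridSamples, the arrays cell, weight and value, the helper casAtomicAdd (a do/while variant of the CAS loop above) and the host function runGriddingDemo are all hypothetical names chosen for illustration. The bounds check on the cell index is an assumption about what a safe version would need, since an out-of-range index would write outside the grid.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// CAS-based double-precision add (hypothetical name, same idea as myAtomicAdd).
__device__ inline void casAtomicAdd(double *address, double value)
{
    unsigned long long *addr = (unsigned long long *)address;
    unsigned long long oldval = *addr, assumed;
    do {
        assumed = oldval;
        oldval = atomicCAS(addr, assumed,
                           __double_as_longlong(__longlong_as_double(assumed) + value));
    } while (assumed != oldval);  // retry if another thread won the race
}

// Hypothetical gridding kernel: sample i adds weight[i] * value[i] to grid[cell[i]].
__global__ void gridSamples(const int *cell, const double *weight, const double *value,
                            double *grid, int numSamples, int numCells)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numSamples) return;          // guard the last, partially filled block
    int c = cell[i];
    if (c < 0 || c >= numCells) return;   // assumption: skip out-of-range cells
    casAtomicAdd(&grid[c], weight[i] * value[i]);
}

// Host driver: every sample lands in cell 0 with weight 0.5 and value 2.0.
double runGriddingDemo()
{
    const int numSamples = 1024, numCells = 4;
    int hCell[numSamples];
    double hWeight[numSamples], hValue[numSamples], hGrid[numCells];
    for (int i = 0; i < numSamples; ++i) { hCell[i] = 0; hWeight[i] = 0.5; hValue[i] = 2.0; }

    int *dCell; double *dWeight, *dValue, *dGrid;
    cudaMalloc(&dCell, numSamples * sizeof(int));
    cudaMalloc(&dWeight, numSamples * sizeof(double));
    cudaMalloc(&dValue, numSamples * sizeof(double));
    cudaMalloc(&dGrid, numCells * sizeof(double));
    cudaMemcpy(dCell, hCell, numSamples * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dWeight, hWeight, numSamples * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dValue, hValue, numSamples * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemset(dGrid, 0, numCells * sizeof(double));  // all-zero bytes are 0.0

    gridSamples<<<(numSamples + 255) / 256, 256>>>(dCell, dWeight, dValue, dGrid,
                                                   numSamples, numCells);
    cudaMemcpy(hGrid, dGrid, numCells * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(dCell); cudaFree(dWeight); cudaFree(dValue); cudaFree(dGrid);
    return hGrid[0];
}
```

Calling runGriddingDemo() from main should return the sum accumulated in cell 0. All threads contend for the same address here, which is the worst case for the CAS loop.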
When it works, the atomic operation behaves correctly and I get reproducible results.
But sometimes, not in every run of the program, the GPU gives me an “unknown error” when I try to copy the result back to the CPU. Why is that? Any ideas?
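Kernel launches are asynchronous, so an error raised while the kernel runs may only be reported by the next synchronizing call, which here is the copy back. The following sketch shows one way to narrow down where the error really comes from by checking after the launch and after an explicit synchronization. The names checkCuda, dummyKernel and runCheckedDemo are hypothetical, and dummyKernel is only a stand-in for the real gridding kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a labelled message on failure, return true on success.
static bool checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Stand-in for the real kernel; it only writes one value per thread.
__global__ void dummyKernel(double *out)
{
    out[threadIdx.x] = 1.0;
}

bool runCheckedDemo()
{
    const int n = 32;
    double hOut[n];
    double *dOut = NULL;
    if (!checkCuda(cudaMalloc(&dOut, n * sizeof(double)), "cudaMalloc")) return false;

    dummyKernel<<<1, n>>>(dOut);
    // Reports launch problems such as an invalid configuration.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");
    // Waits for the kernel, so errors during execution show up here
    // instead of at the later cudaMemcpy.
    ok = ok && checkCuda(cudaDeviceSynchronize(), "kernel execution");
    ok = ok && checkCuda(cudaMemcpy(hOut, dOut, n * sizeof(double),
                                    cudaMemcpyDeviceToHost), "copy back");
    cudaFree(dOut);
    return ok;
}
```

If the failure moves from the copy to the "kernel execution" check, the problem is inside the kernel (for example an out-of-bounds address) rather than in the copy itself.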