Problems after atomicAdd

Hi!

I know that atomic operations are slow and that a different implementation would surely be more elegant. Nevertheless, I wanted to try an implementation based on atomic add, since it is the easiest, and to test how slow it would be.

I implemented atomicAdd for double values as described in this forum:

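__device__ inline void myAtomicAdd(double *address, double value)
{
    unsigned long long oldval, newval, readback;

    // Read the current value and compute the sum we would like to store.
    oldval = __double_as_longlong(*address);
    newval = __double_as_longlong(__longlong_as_double(oldval) + value);

    // Store newval only if *address still holds oldval; otherwise retry
    // with the value that another thread wrote in the meantime.
    while ((readback = atomicCAS((unsigned long long *)address, oldval, newval)) != oldval)
    {
        oldval = readback;
        newval = __double_as_longlong(__longlong_as_double(oldval) + value);
    }
}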

I want to implement gridding, where different threads have to add weighted values to the same address.
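
To make the setup concrete, here is a minimal sketch of such a gridding kernel. It is not my actual code: the names gridSamples and gridDemo, the array layout (one target cell, one weight and one value per sample) and the range check on the cell index are assumptions made for illustration. The atomic add is the same CAS loop as above.

```cuda
#include <vector>
#include <cuda_runtime.h>

// CAS-based atomic add for doubles, same scheme as in the post.
__device__ inline void myAtomicAdd(double *address, double value)
{
    unsigned long long oldval, newval, readback;
    oldval = __double_as_longlong(*address);
    newval = __double_as_longlong(__longlong_as_double(oldval) + value);
    while ((readback = atomicCAS((unsigned long long *)address, oldval, newval)) != oldval)
    {
        oldval = readback;
        newval = __double_as_longlong(__longlong_as_double(oldval) + value);
    }
}

// Hypothetical gridding kernel: sample i adds weight[i] * value[i] to grid[cell[i]].
__global__ void gridSamples(const int *cell, const double *weight, const double *value,
                            int numSamples, double *grid, int numCells)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numSamples) return;          // surplus threads in the last block
    int c = cell[i];
    if (c < 0 || c >= numCells) return;   // never touch memory outside the grid
    myAtomicAdd(&grid[c], weight[i] * value[i]);
}

// Hypothetical host helper: every sample hits cell 0 and contributes 0.5 * 2.0.
double gridDemo(int numSamples)
{
    std::vector<int> hCell(numSamples, 0);
    std::vector<double> hWeight(numSamples, 0.5), hValue(numSamples, 2.0);

    int *dCell = nullptr;
    double *dWeight = nullptr, *dValue = nullptr, *dGrid = nullptr;
    cudaMalloc(&dCell, numSamples * sizeof(int));
    cudaMalloc(&dWeight, numSamples * sizeof(double));
    cudaMalloc(&dValue, numSamples * sizeof(double));
    cudaMalloc(&dGrid, sizeof(double));

    cudaMemcpy(dCell, hCell.data(), numSamples * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dWeight, hWeight.data(), numSamples * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dValue, hValue.data(), numSamples * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemset(dGrid, 0, sizeof(double)); // all-zero bytes represent 0.0

    int threads = 256;
    int blocks = (numSamples + threads - 1) / threads;
    gridSamples<<<blocks, threads>>>(dCell, dWeight, dValue, numSamples, dGrid, 1);

    double result = 0.0;
    cudaMemcpy(&result, dGrid, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(dCell); cudaFree(dWeight); cudaFree(dValue); cudaFree(dGrid);
    return result;
}
```

Calling gridDemo(1000) from main makes 1000 threads contend for the same address, which is the worst case for the CAS loop. The bounds checks in the kernel are there because the cell indices come from data and a single bad index would make the kernel write outside the grid.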

When the program runs through, the atomic operation works fine and I get reproducible results.

BUT sometimes (not in every run of the program) the GPU gives me an “unknown error” when I try to copy the result back to the CPU. Why is that? Any ideas?
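
Since kernel launches are asynchronous, an error raised while the kernel runs is typically only reported by a later call, so the copy may just be where the failure surfaces. One way to narrow this down might be to check for errors right after the launch and again after a synchronization, as in the sketch below. The helper checkCuda, the placeholder fillKernel and the function launchAndCheck are hypothetical names, and the kernel only stands in for the real gridding kernel.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print a labelled message if a CUDA call failed.
static bool checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder kernel standing in for the real gridding kernel.
__global__ void fillKernel(double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0;
}

// Launch the kernel and check each stage separately, so that a failure
// is attributed to the launch, the execution, or the copy back.
bool launchAndCheck(int n)
{
    double *dOut = nullptr;
    if (!checkCuda(cudaMalloc(&dOut, n * sizeof(double)), "cudaMalloc"))
        return false;

    fillKernel<<<(n + 255) / 256, 256>>>(dOut, n);

    // Errors in the launch configuration show up here.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel runs show up after synchronizing.
    ok = ok && checkCuda(cudaDeviceSynchronize(), "kernel execution");

    std::vector<double> hOut(n);
    ok = ok && checkCuda(cudaMemcpy(hOut.data(), dOut, n * sizeof(double),
                                    cudaMemcpyDeviceToHost), "cudaMemcpy");
    cudaFree(dOut);
    return ok;
}
```

If the "kernel execution" check already fails in the bad runs, the problem would be inside the kernel (for example an address outside the allocated grid) and not in the copy itself.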