Why the 64-bit atomicAdd misalignment error?

From the snippet below, can anyone please tell me why the second atomicAdd raises a misaligned-address error? I'm on a compute capability 6.1 GPU, so the 64-bit (double) atomicAdd should be supported. (VS2015 SP3, Windows 7)

__global__ void why(double *dd, float *ff)
{
    float thingFloat = 3.0f;
    atomicAdd(ff, thingFloat);    // works fine

    double thingDouble = 3.0;
    atomicAdd(dd, thingDouble);   // raises misaligned atomic error
}


double *dd;
cudaMalloc((void **)&dd, 100 * sizeof(double));
float *ff;
cudaMalloc((void **)&ff, 100 * sizeof(float));

why<<<1, 1>>>(dd, ff);


Does it also raise this error if you run the compiled executable outside of Visual Studio, i.e. at the Windows command prompt, under cuda-memcheck?

Add printing of the pointer addresses, in both the kernel and the host code, to check the alignment yourself.
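For example, a diagnostic kernel along these lines (a sketch, untested on your setup; `checkAlign` is a hypothetical helper name) would show whether the pointers meet the alignment the atomics require:

```cuda
#include <cstdio>
#include <cstdint>

__global__ void checkAlign(double *dd, float *ff)
{
    // atomicAdd on a double requires an 8-byte-aligned address,
    // atomicAdd on a float requires a 4-byte-aligned address
    printf("dd = %p (8-byte aligned: %d)\n", dd, (int)(((uintptr_t)dd % 8) == 0));
    printf("ff = %p (4-byte aligned: %d)\n", ff, (int)(((uintptr_t)ff % 4) == 0));
}
```

Pointers returned directly by cudaMalloc are guaranteed to be suitably aligned, so if the check fails here, the pointer was likely offset somewhere between allocation and use.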

Also, I would always recommend adding a cudaDeviceSynchronize() after the kernel call in test code like this.

As written, your code will allow application tear-down to begin while the kernel is still executing. This can lead to unpredictable results.
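Something along these lines in the host code (a sketch; error handling kept minimal) both forces the host to wait and surfaces any kernel-side error:

```cuda
#include <cstdio>

why<<<1, 1>>>(dd, ff);

cudaError_t err = cudaGetLastError();           // catches launch-configuration errors
if (err != cudaSuccess)
    printf("launch error: %s\n", cudaGetErrorString(err));

err = cudaDeviceSynchronize();                  // waits for the kernel; catches runtime errors
if (err != cudaSuccess)
    printf("kernel error: %s\n", cudaGetErrorString(err));
```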

I ran your code on Linux with CUDA 8.0.61 and can't reproduce any issues. Other than the comment about cudaDeviceSynchronize(), I don't believe there should be any issues with your code.

What flags do you use for compiling?