Why does atomicAdd not work with doubles as input?

I’m trying to sum a vector down to one element using multiple threads which results in a race condition. To avoid it I’m trying to use atomicAdd during the accumulation. However, I can’t get it to work using double precision numbers.

When I compile my code (from Matlab) I get an error:

No instance of overloaded function "atomicAdd" matches the argument list argument types are: (double *, double)

I’m using CUDA 8 with a GTX 1080 Ti, so as I understand it, atomicAdd should be supported with double precision.
This is how I’m calling it in my kernel:

__global__ void Accumulate(double* deviceSSE, int m) {

	int index = threadIdx.x;
	int stride = blockDim.x;

	for (int rows = index; rows < m; rows += stride) {

		atomicAdd(deviceSSE, 1.25);
	}
}

Could someone please explain what I need to do to get it to work? According to the documentation, it looks like I’m giving it the correct input: one double pointer and one double value.

double atomicAdd(double* address, double val);

“As of CUDA 8, double-precision atomicAdd() is implemented in CUDA with hardware support in SM_6X (Pascal) GPUs.”

So make sure you compile the code for an sm_60 (Tesla P100), sm_61 (consumer Pascal) or sm_70 (Tesla V100, Titan V) target architecture.
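To rule out the Matlab build setup, a minimal standalone sketch like the one below could be compiled directly with nvcc. The file name, the build command in the first comment and the helper runAccumulate are my own assumptions for illustration, not something from the original code. If this builds with -arch=sm_61 but the mexcuda build still fails, the architecture flag is probably not reaching nvcc.

```cuda
// Assumed build command: nvcc -arch=sm_61 atomic_add_check.cu -o atomic_add_check
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Same accumulation pattern as the kernel in the question: each thread
// adds 1.25 once for every row it visits.
__global__ void Accumulate(double* deviceSSE, int m) {
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int rows = index; rows < m; rows += stride) {
        // The built-in double overload requires a target of sm_60 or newer.
        atomicAdd(deviceSSE, 1.25);
    }
}

// Hypothetical helper: launches one block and returns the accumulated sum,
// or NaN if any CUDA call reports an error.
double runAccumulate(int m, int threads) {
    double* deviceSSE = nullptr;
    double hostSSE = 0.0;
    if (cudaMalloc(&deviceSSE, sizeof(double)) != cudaSuccess) return NAN;
    cudaError_t err = cudaMemcpy(deviceSSE, &hostSSE, sizeof(double),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        Accumulate<<<1, threads>>>(deviceSSE, m);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&hostSSE, deviceSSE, sizeof(double),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(deviceSSE);
    return err == cudaSuccess ? hostSSE : NAN;
}

int main() {
    const int m = 1000;
    double sum = runAccumulate(m, 256);
    printf("sum = %f (expected %f)\n", sum, 1.25 * m);
    // Exit code 0 only if the sum matches the expected value.
    return std::fabs(sum - 1.25 * m) < 1e-9 ? 0 : 1;
}
```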

Christian

Is it enough to enter -arch=sm_61 when compiling to make that happen, or am I missing some flag?

This is how I’m compiling it from Matlab at the moment:

mexcuda AtomicAddTest.cu NVCCFLAGS=-arch=sm_61

And it’s still producing the same error.

Some wild guesswork follows:

Maybe try passing the following instead of NVCCFLAGS, possibly placed in front of the .cu file names to compile:

--gpu-architecture=compute_61 --gpu-code=sm_61

Alternatively, if NVCCFLAGS is supposed to be given as an environment variable, then try defining its contents to be -gencode=arch=compute_61,code=sm_61

e.g. with this command from within Matlab (see https://de.mathworks.com/help/matlab/ref/setenv.html):

setenv('NVCCFLAGS', '-gencode=arch=compute_61,code=sm_61');

No, sadly that just gave me an:

Unknown MEX argument '--gpu-architecture=compute_61'.

error. I suspect the flags need to come after NVCCFLAGS so that Matlab’s mexcuda does not reject them.

It does try to compile if I use the command you mentioned:
setenv('NVCCFLAGS', '-gencode=arch=compute_61,code=sm_61');
But I still get the same atomicAdd error as before.

A similar problem (related to Matlab/mex) was reported here:

https://stackoverflow.com/questions/47772319/getting-started-with-int8-arithmetic-using-dp4a

If you study the above, you’ll note that your Matlab may have some “default” arch settings. If any of these defaults targets an architecture that does not support double atomicAdd, you will get the error even though you have added the sm_61 arch switch. To get this to work as you have written it, you would therefore need to remove those conflicting settings (which I don’t think can be done purely with the NVCCFLAGS environment variable).

Another possible method to work around this is to do what is described in the programming guide:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions

i.e. provide a double atomicAdd implementation that will only be selected for arch settings that don’t provide a built-in one. Just copy-paste the device function listed in the programming guide (along with the #if compiler directives surrounding it) into the code you are compiling with mexcuda, before any functions (__global__ or __device__) that use double atomicAdd.

I believe this method combined with setting NVCCFLAGS should give you what you want.
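For reference, here is a sketch of that workaround. It uses the compare-and-swap loop from the programming guide, but wrapped in a differently named function so that it cannot collide with a built-in atomicAdd overload. The names atomicAddDouble, AccumulateFallback and runAccumulateFallback are hypothetical, and I have not tried this under mexcuda.

```cuda
// Assumed to build for pre-Pascal targets as well as for sm_60 and newer.
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper: uses the hardware double atomicAdd where the target
// provides one, otherwise the atomicCAS loop from the programming guide.
__device__ double atomicAddDouble(double* address, double val) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
    return atomicAdd(address, val);
#else
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Try to swap in (assumed + val); retry if another thread got in first.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
#endif
}

// Same loop as the kernel in the question, calling the wrapper instead.
__global__ void AccumulateFallback(double* deviceSSE, int m) {
    for (int rows = threadIdx.x; rows < m; rows += blockDim.x) {
        atomicAddDouble(deviceSSE, 1.25);
    }
}

// Hypothetical helper: returns the accumulated sum, or NaN on a CUDA error.
double runAccumulateFallback(int m, int threads) {
    double* deviceSSE = nullptr;
    double hostSSE = 0.0;
    if (cudaMalloc(&deviceSSE, sizeof(double)) != cudaSuccess) return NAN;
    cudaError_t err = cudaMemcpy(deviceSSE, &hostSSE, sizeof(double),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        AccumulateFallback<<<1, threads>>>(deviceSSE, m);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&hostSSE, deviceSSE, sizeof(double),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(deviceSSE);
    return err == cudaSuccess ? hostSSE : NAN;
}

int main() {
    double sum = runAccumulateFallback(1000, 256);
    printf("sum = %f\n", sum);
    return std::fabs(sum - 1250.0) < 1e-9 ? 0 : 1;
}
```

The separate name is a deliberate choice: defining a function that is itself called atomicAdd(double*, double) can clash with the toolkit's own declaration, depending on how the preprocessor guard around it is written.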

Mexcuda would not compile the code after I copy-pasted the atomicAdd example from the documentation. It complained about atomicAdd already being defined.

But modifying the XML files as suggested in the Stack Overflow thread actually worked! Finally! Thank you for the link.