I’m trying to sum a vector down to one element using multiple threads, which results in a race condition. To avoid it, I’m trying to use atomicAdd during the accumulation. However, I can’t get it to work with double-precision numbers.
When I compile my code (from Matlab) I get an error:
No instance of overloaded function "atomicAdd" matches the argument list
argument types are: (double *, double)
I’m using CUDA 8 with a GTX 1080 Ti, so as I understand it, atomicAdd should be supported with double precision.
This is how I’m trying to run it in my kernel:
__global__ void Accumulate(double* deviceSSE, int m) {
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int rows = index; rows < m; rows += stride) {
        atomicAdd(deviceSSE, 1.25);
    }
}
Could someone please explain what I need to do to get it to work? According to the documentation, it looks to me like I’m giving it the correct input: one double pointer and one double value.
Maybe try this instead of NVCCFLAGS, possibly placing it in front of the .cu file names to compile:
--gpu-architecture=compute_61 --gpu-code=sm_61
Alternatively, if NVCCFLAGS is supposed to be given as an environment variable, then try defining its contents to be -gencode=arch=compute_61,code=sm_61
error. I suspect the flags need to come after NVCCFLAGS in order for MATLAB’s mexcuda not to reject them.
It tries to compile if I use the command you mentioned:
setenv('NVCCFLAGS', '-gencode=arch=compute_61,code=sm_61');
But again I get the same error about atomicAdd as before.
If you study the above, you’ll note that your MATLAB may have some “default” settings for arch. If any of these “default” arch settings are inconsistent with your usage of double atomicAdd (that is, if any of them targets a compute capability below 6.0), then you will get the error, even though you have added the sm_61 arch switch. Therefore, to get this to work as you have written it, you would need to remove any conflicting arch settings (which I don’t think can be done purely with the NVCCFLAGS environment variable).
Another possible method to work around this is to do what is described in the programming guide:
That is, provide a double atomicAdd implementation that will only be selected for arch settings that don’t provide a built-in one. Just copy and paste the device function listed in the programming guide (along with the #if compiler directives surrounding it) into the code you are compiling with mexcuda, before any functions (__global__ or __device__) that use double atomicAdd.
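For reference, that workaround can be sketched roughly as follows. The fallback follows the atomicCAS-based pattern given in the CUDA programming guide. The main function, the launch configuration (one block of 256 threads), and m = 1000 are assumptions added only to make the sketch self-contained; a mexcuda build would use a mexFunction entry point instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compute capability 6.0 and above has a built-in atomicAdd(double*, double).
// The fallback below is compiled only for device passes that target an older
// architecture, so it does not clash with the built-in one.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
// Built-in double atomicAdd is available; nothing to define.
#else
__device__ double atomicAdd(double* address, double val) {
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Swap in (assumed + val) only if the stored value is still 'assumed'.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // another thread changed the value; retry
    return __longlong_as_double(old);
}
#endif

// Kernel from the question, unchanged apart from indentation.
__global__ void Accumulate(double* deviceSSE, int m) {
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int rows = index; rows < m; rows += stride) {
        atomicAdd(deviceSSE, 1.25);
    }
}

int main() {
    const int m = 1000;  // assumed size, for illustration only
    double* deviceSSE = nullptr;
    cudaMalloc(&deviceSSE, sizeof(double));
    cudaMemset(deviceSSE, 0, sizeof(double));  // all-zero bytes represent 0.0

    Accumulate<<<1, 256>>>(deviceSSE, m);  // assumed launch configuration
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    double hostSSE = 0.0;
    cudaMemcpy(&hostSSE, deviceSSE, sizeof(double), cudaMemcpyDeviceToHost);
    printf("sum = %f (m * 1.25 = %f)\n", hostSSE, m * 1.25);
    cudaFree(deviceSSE);
    return 0;
}
```

With this in place, the code should compile whichever arch settings end up on the nvcc command line: passes for compute capability 6.0 or higher use the hardware instruction, and older targets use the compare-and-swap loop.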
I believe this method combined with setting NVCCFLAGS should give you what you want.