Access to CUDA library functions inside specialized instantiations of __device__ function templates

I have the following template device function:

template<typename T>
__device__ void MyatomicAdd(T *address, T val){
    atomicAdd(address, val);
}

that compiles just fine, but obviously there is no atomicAdd() for doubles, so I want to specialize MyatomicAdd() to allow a hand-coded implementation in double precision. Ignoring the double-precision specialization for now, the template and the single-precision specialization look like this:

template<typename T>
__device__ void MyatomicAdd(T *address, T val){
}

template<>
__device__ void MyatomicAdd<float>(float *address, float val){
    atomicAdd(address, val);
}

Now the compiler complains that atomicAdd() is undefined in my specialization. Any help? Thanks.
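
For reference, the hand-coded double-precision specialization I'm aiming for would be roughly the CAS-loop approach from the CUDA C Programming Guide:

template<>
__device__ void MyatomicAdd<double>(double *address, double val){
    // Reinterpret the double as a 64-bit integer and retry with atomicCAS
    // until no other thread has changed the value in between.
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
}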

You are probably compiling for compute capability 1.x, where atomicAdd() cannot operate on float variables.

This is likely, as compute capability 1.0 is the default. You also see the error if you compile for multiple compute capabilities and one of them is 1.x.
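
For example, building explicitly for a 2.x (or newer) target keeps the 1.x code path out of the build entirely (file names here are just placeholders):

nvcc -arch=sm_20 -o myprog myprog.cu

If you build a fat binary with several -gencode options, make sure none of them names a compute_1x architecture.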

Sadly not; I’m compiling for compute capability 3.0. The first template compiles and runs just fine when it is instantiated with T as a float. It’s only when I specialize it (as in the second example) that the compiler complains that atomicAdd() is undefined.

I’m still not having any luck with this. I put the above code into a small test program and it does compile and run, so the issue must be somewhere else in my program. In case anyone has dealt with something similar before: my .cu file contains a template kernel and a couple of template device functions. These are then #include-d in a header file containing the definitions of several C++ template functions (one of which calls the kernel in the .cu file) belonging to a template class. Without any specialization of the templates in the .cu file, the code compiles and runs correctly.
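
To make the layout concrete, it is roughly like this (names invented, just to show the structure, with the float specialization in place):

// kernels.cu -- template kernel plus the MyatomicAdd template and its float specialization
template<typename T>
__device__ void MyatomicAdd(T *address, T val){
}

template<>
__device__ void MyatomicAdd<float>(float *address, float val){
    atomicAdd(address, val);
}

template<typename T>
__global__ void accumulateKernel(T *sum, const T *data, int n){
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) MyatomicAdd(sum, data[i]);
}

// myclass.h -- template class whose member functions launch the kernel
#include "kernels.cu"

template<typename T>
class MyClass {
public:
    void accumulate(T *d_sum, const T *d_data, int n){
        accumulateKernel<<<(n + 255) / 256, 256>>>(d_sum, d_data, n);
    }
};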