Function template specialization in device code

Hi all,

Usually, when I write C++ template functions on the host side, I can specialize them without any hitches.

On the device side, however, I seem to be running into some issues:

template<class T>
__device__ T warp_reduce_registers(T myVal)
{
// does something neat for most datatypes using the __shfl instruction
}

But for doubles I want a specialization:

template<>
__device__ double warp_reduce_registers(double myVal)
{
// Do something neat for doubles specifically, without __shfl
}

template<class T>
__device__ T warp_reduce_registers(T myVal)
{
// does something neat for most datatypes using __shfl() instruction
}

Unfortunately, the compiler still picks the non-specialized implementation and complains about my use of __shfl(…) on double datatypes.

error : more than one instance of overloaded function "__shfl" matches the argument list:
1>              function "__shfl(int, int, int)"
1>              function "__shfl(float, int, int)"
1>              argument types are: (double, int)
1>            detected during:
1>              instantiation of "T warp_reduce_registers(T) [with T=double]" 
1>  (185): here

Maybe I’m making some obvious mistake and you can give me some hints? I would very much appreciate it! :-)

Thanks,
Jim

Ok, fixed it myself.

The definitions need to be reordered in the file, so that the generic template comes first and the specialization for double follows it:

template<class T>
__device__ T warp_reduce_registers(T myVal)
{
// handle most types
}

template<>
__device__ double warp_reduce_registers<double>(double myVal)
{
// specialization for double
}
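
To illustrate the ordering, here is a minimal sketch that should build both with nvcc and with a plain host C++ compiler. The names Path and warp_reduce_path are hypothetical and only report which definition was selected; the real bodies (the __shfl-based one and the one for double) are left out. The macro guard at the top is my assumption, so that the CUDA qualifiers disappear when nvcc is not used.

```cpp
// Assumption: when not compiling with nvcc, define the CUDA qualifiers
// away so the same file builds as ordinary host C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical tag type, used only to show which definition was picked.
enum class Path { Generic, DoubleSpecialization };

// 1. Primary template: declared before any specialization of it.
template <class T>
__host__ __device__ Path warp_reduce_path(T /*myVal*/)
{
    // The __shfl-based code for most types would go here.
    return Path::Generic;
}

// 2. Explicit specialization for double, placed after the primary template.
template <>
__host__ __device__ Path warp_reduce_path<double>(double /*myVal*/)
{
    // The __shfl-free code for double would go here.
    return Path::DoubleSpecialization;
}
```

With this order, a call with an int or float argument resolves to the generic version, and a call with a double argument resolves to the specialization. A plain non-template overload taking double might be another option, but I have not tried that in this code.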