Hi all,
Usually when I write C++ template functions on the host side, I can specialize certain functions without any hitches.
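To illustrate what I mean, here is a minimal host-side sketch of the pattern that works for me. The names `scale_value` and `check_dispatch` and the function bodies are hypothetical placeholders for illustration, not my real code:

```cpp
#include <cassert>

// Primary template: generic path, declared before any specialization.
// (Hypothetical placeholder body.)
template <class T>
T scale_value(T v)
{
    return v * T(2);
}

// Explicit specialization for double. It comes after the primary template
// and before the first call that would instantiate scale_value<double>.
// (Hypothetical placeholder body.)
template <>
double scale_value<double>(double v)
{
    return v * 3.0;
}

// Small self-check that each call resolves to the intended version.
inline void check_dispatch()
{
    assert(scale_value(2) == 4);        // int    -> primary template
    assert(scale_value(2.0f) == 4.0f);  // float  -> primary template
    assert(scale_value(2.0) == 6.0);    // double -> specialization
}
```

Here `scale_value(2.0)` resolves to the `double` specialization, while the `int` and `float` calls use the primary template.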
On the device side, however, I seem to be running into some issues:
template<class T>
__device__ T warp_reduce_registers(T myVal)
{
// does something neat for most datatypes using the __shfl() instruction
}
But for doubles I want to do some specializations:
template<>
__device__ double warp_reduce_registers(double myVal)
{
// Do something neat for doubles specifically, without __shfl
}
template<class T>
__device__ T warp_reduce_registers(T myVal)
{
// does something neat for most datatypes using __shfl() instruction
}
Unfortunately, the compiler still picks the non-specialized implementation and complains about my use of the __shfl(…) instruction on double arguments:
error : more than one instance of overloaded function "__shfl" matches the argument list:
1> function "__shfl(int, int, int)"
1> function "__shfl(float, int, int)"
1> argument types are: (double, int)
1> detected during:
1> instantiation of "T warp_reduce_registers(T) [with T=double]"
1> (185): here
Maybe I’m making some obvious mistake and you can give me some hints? I would very much appreciate it! :-)
Thanks,
Jim