I'm working on a project where we want GPU and CPU implementations side by side: only float on the GPU, but both float and double on the CPU. So I wanted to assert that no double instantiation of a template ends up in device code:
#include <type_traits>

template<typename T>
__host__ __device__ __forceinline__ T mul_(T a, T b) {
#ifdef __CUDA_ARCH__
    // expected to trigger only when compiling device code
    static_assert(std::is_same<T, float>::value, "float only");
#endif
    return a * b;
}
__host__ __device__ float mul(float a, float b) {
    return mul_(a, b);
}
__host__ double mul(double a, double b) {
    return mul_(a, b);
}
template<typename T>
__global__ void mulGpu(T* a, T* b) {
    *b = mul(*a, *b);
}
template<typename T>
void mulCpu(T* a, T* b) {
    *b = mul(*a, *b);
}
template __global__ void mulGpu<float>(float*, float*);
template void mulCpu<float>(float*, float*);
template void mulCpu<double>(double*, double*);
Even though I never instantiate mulGpu<double>, the static_assert fails:
error: static assertion failed with "float only"
detected during instantiation of "T mul_(T, T) [with T=double]"
(15): here
1 error detected in the compilation of "<source>".
Compiler returned: 1
How come? Why didn't the #ifdef __CUDA_ARCH__ guard work?
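A minimal plain C++ sketch of the mechanism, assuming nvcc's usual behavior of compiling each .cu file in a host pass and one or more device passes, with __CUDA_ARCH__ defined only in the device passes. Because #ifdef is resolved by the preprocessor for the whole translation unit, every instantiation of mul_ made during a device pass sees the static_assert, including one triggered from the __host__-only double overload (the "(15): here" location in the error points at that overload). The macro SIMULATED_DEVICE_PASS is hypothetical and only stands in for __CUDA_ARCH__; hiding the host-only overload from the device pass is one possible workaround, not a confirmed fix for every nvcc version.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for __CUDA_ARCH__. Uncomment to mimic the device pass;
// like __CUDA_ARCH__, the macro applies to the whole translation unit.
// #define SIMULATED_DEVICE_PASS

template <typename T>
T mul_(T a, T b) {
#ifdef SIMULATED_DEVICE_PASS
    // Checked for every instantiation made while the macro is defined,
    // regardless of which function caused the instantiation.
    static_assert(std::is_same<T, float>::value, "float only");
#endif
    return a * b;
}

// Visible in both passes (plays the role of the __host__ __device__ overload).
float mul(float a, float b) { return mul_(a, b); }

#ifndef SIMULATED_DEVICE_PASS
// Host-only overload, defined only when the macro is absent, so the
// "device pass" never instantiates mul_<double>.
double mul(double a, double b) { return mul_(a, b); }
#endif

// Usage (host pass): assert(mul(2.0f, 3.0f) == 6.0f); assert(mul(2.0, 3.0) == 6.0);
```

The sketch compiles with or without the macro defined. In the real CUDA file, guarding the __host__ double mul with #ifndef __CUDA_ARCH__ would mean that, during the device pass only, mulCpu<double> resolves to the float overload; that should be harmless while mulCpu stays host-only, but it is worth verifying with your toolchain.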