Using __CUDA_ARCH__ for a __device__-only assertion

I'm working on a project where we want the same implementation to run on both GPU and CPU: float only on the GPU, float and double on the CPU. So I wanted to assert that the double instantiation of a template never ends up in device code:

template<typename T>
__host__ __device__ __forceinline__ T mul_(T a, T b) {
#ifdef __CUDA_ARCH__
    // expecting this to only trigger for __device__
    static_assert(std::is_same<T, float>::value, "float only");
#endif
    return a * b;
}

__host__ __device__ float mul(float a, float b) {
    return mul_(a,b);
}

__host__ double mul(double a, double b) {
    return mul_(a,b);
}

template<typename T>
__global__ void mulGpu(T*a, T*b) {
    *b = mul(*a, *b);
}

template<typename T>
void mulCpu(T*a, T*b) {
    *b = mul(*a, *b);
}

template __global__ void mulGpu<float>(float*, float*);
template void mulCpu<float>(float*, float*);
template void mulCpu<double>(double*, double*);

Even though I never instantiate mulGpu<double>, the static_assert fails:

error: static assertion failed with "float only"
          detected during instantiation of "T mul_(T, T) [with T=double]" 
(15): here

1 error detected in the compilation of "<source>".
Compiler returned: 1

How come? Why did the #ifdef __CUDA_ARCH__ guard not work?

The only way to instantiate mul_ is as a __host__ __device__ function. That is inherent in your definition. That means you are instructing the compiler to create a host code version of it, and a device code version of it, any time you instantiate it.

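__host__ __device__ does not tell the compiler "instantiate a host-callable version only if I call this from host code, and a device-callable version only if I call it from device code". nvcc compiles the translation unit separately for host and device, and __CUDA_ARCH__ is defined during the device compilation. The host function mul(double, double) is still parsed in that device compilation, which instantiates mul_<double> with the static_assert in place. So the assertion fires even though no device code ever calls mul_<double>.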
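One way to picture this is to write out by hand what mul_ roughly looks like after preprocessing in each compilation. This is only an illustrative sketch: the two function names are hypothetical, and the CUDA qualifiers are left out so that it builds with any C++ compiler.

```cpp
#include <type_traits>

// Roughly what mul_ looks like in the host compilation:
// __CUDA_ARCH__ is undefined, so the guarded assertion is gone.
template <typename T>
T mul_host_pass_view(T a, T b) {
    return a * b;
}

// Roughly what mul_ looks like in the device compilation:
// __CUDA_ARCH__ is defined, so the assertion is part of the body
// and is checked for every T that gets instantiated in that
// compilation, no matter which function triggered the instantiation.
template <typename T>
T mul_device_pass_view(T a, T b) {
    static_assert(std::is_same<T, float>::value, "float only");
    return a * b;
}

// Stands in for the __host__ function mul(double, double).
// Calling mul_device_pass_view here instead would reproduce
// the "float only" error at compile time.
double mul_double_host(double a, double b) {
    return mul_host_pass_view(a, b);
}
```

The point is that the static_assert is tied to instantiation, not to who can call the function at run time.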

FWIW you don’t need most of the code you have posted to demonstrate the error. The following is sufficient:

#include <type_traits>

template<typename T>
__host__ __device__ __forceinline__ T mul_(T a, T b) {
#ifdef __CUDA_ARCH__
    // expecting this to only trigger for __device__
    static_assert(std::is_same<T, float>::value, "float only");
#endif
    return a * b;
}
__host__ double mul(double a, double b) {
    return mul_(a,b);
}
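If the goal is just to keep double out of device code, one possible approach (a sketch, not the only option) is to move the assertion into a separate __device__-only wrapper, so it is only instantiated when device code actually calls it. The name mul_device_ is hypothetical, and the macro fallback at the top is an assumption added only so the sketch also builds with a plain C++ compiler.

```cpp
#include <type_traits>

// Assumption: when not compiling with nvcc, define the CUDA
// qualifiers away so this sketch builds as ordinary C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Shared implementation with no type restriction.
template <typename T>
__host__ __device__ T mul_(T a, T b) {
    return a * b;
}

// Hypothetical device-only entry point. Because it is only
// instantiated for the types device code calls it with, the
// assertion does not fire for host-only uses of double.
template <typename T>
__device__ T mul_device_(T a, T b) {
    static_assert(std::is_same<T, float>::value, "float only");
    return mul_(a, b);
}

// Host code keeps calling the unrestricted implementation.
__host__ double mul(double a, double b) {
    return mul_(a, b);
}
```

Kernels would then call mul_device_ instead of mul_, while anything marked __host__ __device__ keeps calling mul_ directly, since it cannot call a __device__-only function.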