Hello,
I am writing a CUDA program with mixed precision, and several kernels use some basic constants such as __floats2half2_rn(0.0, 0.0) or __floats2half2_rn(PI, PI). What is the most efficient way to create them?
I tried this:
__device__ __constant__ half2 h2zeros = __floats2half2_rn(0.0, 0.0);
__device__ __constant__ half2 h2pis = __floats2half2_rn(PI, PI);
but it does not compile (error: dynamic initialization is not supported for a constant variable). Declaring them as static const inside each kernel does not compile either.
With #define it compiles:
#define h2zeros __floats2half2_rn(0.0,0.0)
#define h2pis __floats2half2_rn(PI,PI)
but I am worried about its efficiency. Is there a more efficient way to do it?
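For comparison, here is a minimal sketch of a third option: declare the __constant__ variables without an initializer (which avoids the dynamic-initialization error) and fill them once from the host with cudaMemcpyToSymbol before launching any kernel. It assumes a CUDA toolkit in which __floats2half2_rn can be called from host code. The value of PI and the names initHalf2Constants and readPis are placeholders for illustration, not part of the original program. Whether this is actually faster than the #define version would have to be measured.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Assumed definition; the post does not show how PI is defined.
#define PI 3.14159265f

// Constant-memory variables declared WITHOUT an initializer,
// so no dynamic initialization is requested.
__constant__ half2 h2zeros;
__constant__ half2 h2pis;

// Hypothetical helper: build the values on the host and copy them
// into constant memory once, before any kernel that reads them.
cudaError_t initHalf2Constants()
{
    const half2 zeros = __floats2half2_rn(0.0f, 0.0f);
    const half2 pis   = __floats2half2_rn(PI, PI);

    cudaError_t err = cudaMemcpyToSymbol(h2zeros, &zeros, sizeof(half2));
    if (err != cudaSuccess) return err;
    return cudaMemcpyToSymbol(h2pis, &pis, sizeof(half2));
}

// Hypothetical kernel that only reads the constant back as floats,
// so the stored value can be checked on the host. A real kernel would
// use h2pis directly in half2 intrinsics such as __hadd2, which need
// compute capability 5.3 or newer.
__global__ void readPis(float2* out)
{
    *out = __half22float2(h2pis);
}
```

The host would call initHalf2Constants() once at startup and check the returned cudaError_t; after that, every kernel in the same translation unit can read h2zeros and h2pis directly.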
Thanks in advance,
David