One possible approach would be to initialize the constant in a separate host half/half2 variable, in host code, then use cudaMemcpyToSymbol to copy the initialized value to the __constant__ variable.
Depending on exact usage, #define may have better performance than usage of __constant__. (And there are possibly cases where __constant__ is better.) IMO #define communicates the largest amount of information to the compiler. That is generally a good thing.
#define and __constant__ are two totally different mechanisms:
#define is a textual substitution in the source code; the value becomes a literal, which is typically compiled into the SASS instructions as an immediate value.
__constant__ uses constant memory on the GPU. The instructions load the value from there. Constant memory has a small dedicated cache and a separate data path.
As Robert said, #define is normally better; sometimes you can also use constexpr (with a similar effect to #define).
This is a float constant. Assigning it to __half requires conversion, which is what I think is the source of the error message.
I am not sure where the C++ standard is at the moment with respect to half precision. The fact that we still use __half in CUDA (which is a C++ dialect) suggests that half precision is not supported by the C++ standard yet, since identifiers with two leading underscores are reserved for the implementation and typically denote implementation-specific features.
My vague recollection is that the discussion of the C++ folks with regard to half precision included a proposal to use the sf suffix (“short float”) for half-precision literal constants and standard math functions, e.g. 0.5sf, expsf().
You would want to make sure that any such #define evaluates to __half type (or only use such #defines to initialize a __half variable), otherwise you might inadvertently convert half-precision computation into single-precision computation, analogous to how use of a double floating-point literal can turn an intended float computation into double computation.
[Later:]
Here is the proposal made to the ISO C++ working group for half-precision support in 2018 (by NVIDIA representatives, as I notice belatedly :-)
I was unable to find any “trivial” way to create half-precision compile-time constants. Even the constants provided in the math API are not compile-time constants.
The look & feel of half precision support in CUDA is one of a temporary stopgap measure with rough edges all around, of which the inability to create half-precision literal constants at compile time is one aspect. Maybe NVIDIA thought they could get proper support into the C++ standard faster.
If I had to guess, once NVIDIA brought their original proposal to the working group, the gcc folks said “What about __float128?”, and the Google folks said “What about bfloat16?”, and a longer discussion regarding a general revamping of C++'s floating-point type system ensued.
Recent straw polls taken by the committee suggest that the new extended floating-point type system is gelling so that there is a decent chance it may be rolled out with the next revision of the standard, which is tentatively scheduled for 2026.
Interesting! I could have sworn the meeting notes with the straw polls on the extended floating-point type proposal were dated 2024.
Support seems to be spotty at this time. At Compiler Explorer, the following compiles fine with gcc 14.2 using -std=c++23 -O3 -Wall, but not with the latest clang (or the Intel compiler, for that matter), which reports: no type named 'float16_t' in namespace 'std'
#include <cstdio>
#include <stdfloat>
const volatile std::float16_t two = 2.0f16;
int main (void)
{
    std::float16_t four = two * two;
    printf ("four=%11.4e\n", (double) four);
    return 0;
}
The generated code looks as expected, as does the data deposited for two, and when I enable execution in Compiler Explorer it prints the expected result. With -march=graniterapids I even got native FP16 instructions:
.LC0:
        .string "four=%11.4e\n"
main:
        sub     rsp, 8
        vmovw   xmm0, WORD PTR two[rip]
        vmovw   xmm1, WORD PTR two[rip]
        mov     edi, OFFSET FLAT:.LC0
        vmulsh  xmm0, xmm0, xmm1
        mov     eax, 1
        vcvtsh2sd xmm0, xmm0, xmm0
        call    printf
        xor     eax, eax
        add     rsp, 8
        ret
two:
        .value  16384