half/half2 constants

Hey, quick question on half/half2 constants.

I was trying to make a constant out of a half2 variable and I seem to be doing something wrong.

__constant__ __half test(.5f);
__constant__ __half2 test2(__half(.2f), __half(.2f));
__constant__ __half2 test3 = __float2half2_rn(0.2f);

I’m getting an error like:

 error: dynamic initialization is not supported for a __constant__ variable
  __attribute__((constant)) __half test(.5f);

I couldn’t figure out a way around this, so I went back to using #defines.

I saw this post here, but it doesn’t look like it was answered.

Thanks again for the help!

One possible approach would be to initialize a separate half/half2 variable in host code, then use cudaMemcpyToSymbol to copy the initialized value to the __constant__ variable.
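A minimal sketch of that approach (decay2, h_decay2, and k are illustrative names; the __half2 constructor and __float2half are host-callable in cuda_fp16.h):

#include <cuda_fp16.h>
#include <cstdio>

__constant__ __half2 decay2;  // no initializer, so no dynamic initialization is required

__global__ void k() {
    printf("%f\n", __low2float(decay2));
}

int main() {
    // build the value on the host...
    __half2 h_decay2(__float2half(0.2f), __float2half(0.2f));
    // ...then copy it into the __constant__ symbol before any kernel uses it
    cudaMemcpyToSymbol(decay2, &h_decay2, sizeof(h_decay2));
    k<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}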

Depending on exact usage, #define may have better performance than __constant__ (and there are possibly cases where __constant__ is better). IMO, #define communicates the largest amount of information to the compiler, and that is generally a good thing.

#define and __constant__ are two totally different mechanisms:

#define is replaced at each point of use in the source code; the value becomes a literal or immediate value and is typically compiled directly into the SASS instructions.

__constant__ uses constant memory on the GPU, and the instructions load the value from there. Constant memory has a separate small cache and a separate data path.
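To make the contrast concrete, a small sketch (GAIN_DEF and GAIN_CONST are made-up names):

#define GAIN_DEF 2.0f            // substituted at each use; typically becomes an immediate in SASS

__constant__ float GAIN_CONST;   // lives in constant memory; SASS reads it via a c[bank][offset] operand

__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * GAIN_DEF * GAIN_CONST;
}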

As Robert said, #define is normally better; sometimes you can also use constexpr (with a similar effect to #define).
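For instance, this works for float today; the __half constructors are not constexpr, which is exactly why the same thing fails in the error above (decay_kernel is an illustrative name; __hmul needs sm_53 or newer):

#include <cuda_fp16.h>

constexpr float DECAY = 0.2f;  // folded at compile time, much like a #define

__global__ void decay_kernel(__half* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = __hmul(data[i], __float2half(DECAY));  // the compiler can typically fold the conversion
}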

The .5f is a float constant. Assigning it to __half requires a conversion (the __half constructor is not constexpr), which is what I think is the source of the error message.

I am not sure where the C++ standard is at the moment with respect to half precision. The fact that we still use __half in CUDA (which is a C++ dialect) is an indication that half precision is not supported by the C++ standard yet, as identifiers with two leading underscores are reserved for the implementation.

My vague recollection is that the discussion of the C++ folks with regard to half precision included a proposal to use the sf suffix (“short float”) for half-precision literal constants and standard math functions, e.g. 0.5sf, expsf().

You would want to make sure that any such #define evaluates to __half type (or only use such #defines to initialize __half variables); otherwise you might inadvertently turn half-precision computation into single-precision computation, analogous to how use of a double floating-point literal can turn an intended float computation into a double computation.
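For example (DECAY_F and DECAY_H are made-up names, h is a __half):

#define DECAY_F 0.2f                // float literal: h * DECAY_F promotes the math to float
#define DECAY_H __float2half(0.2f)  // evaluates to __half: h * DECAY_H stays in half precision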

[Later:]

Here is the proposal made to the ISO C++ working group for half-precision support in 2018 (by NVIDIA representatives, as I notice belatedly :-)

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0192r4.html

More recently, in 2022, this appears to have been extended into a generalized extended floating-point type system for C++:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html

From what I can see from the public meeting notes of the ISO C++ group, this topic is still under active discussion / clarification.

I was unable to locate how to create half compile-time constants in any “trivial” way. Even the constants provided in the math API are not compile-time constants.

FWIW, borrowing from here, this seems to work:

# cat t324.cu
#include <cuda_fp16.h>
#include <cstdio>
#include <bit>


static constexpr unsigned short my_internal_float2half(const float f, unsigned int& sign, unsigned int& remainder) {
    unsigned int x;
    unsigned int u;
    unsigned int result;
    x = std::bit_cast<unsigned int>(f); // C++20
    // (void)std::memcpy(&x, &f, sizeof(f)); // not constexpr
    u = (x & 0x7fffffffU);
    sign = ((x >> 16U) & 0x8000U);
    // NaN/+Inf/-Inf
    if (u >= 0x7f800000U) {
        remainder = 0U;
        result = ((u == 0x7f800000U) ? (sign | 0x7c00U) : 0x7fffU);
    } else if (u > 0x477fefffU) { // Overflows
        remainder = 0x80000000U;
        result = (sign | 0x7bffU);
    } else if (u >= 0x38800000U) { // Normal numbers
        remainder = u << 19U;
        u -= 0x38000000U;
        result = (sign | (u >> 13U));
    } else if (u < 0x33000001U) { // +0/-0
        remainder = u;
        result = sign;
    } else { // Denormal numbers
        const unsigned int exponent = u >> 23U;
        const unsigned int shift = 0x7eU - exponent;
        unsigned int mantissa = (u & 0x7fffffU);
        mantissa |= 0x800000U;
        remainder = mantissa << (32U - shift);
        result = (sign | (mantissa >> shift));
        result &= 0x0000FFFFU;
    }
    return static_cast<unsigned short>(result);
}

static constexpr __half my_float2half_rn(const float a) {
    __half val;
    __half_raw r;
    unsigned int sign = 0U;
    unsigned int remainder = 0U;
    r.x = my_internal_float2half(a, sign, remainder);
    if ((remainder > 0x80000000U) || ((remainder == 0x80000000U) && ((r.x & 0x1U) != 0U))) {
        r.x++;
    }
    val = std::bit_cast<__half>(r); // allowed, see operator= for __half_raw -> __half
    return val;
}

constexpr __half h1 = my_float2half_rn(1.0f);
__constant__ half x = h1;


__global__ void k(){
  printf("%f\n", __half2float(x));
}

int main(){

  k<<<1,1>>>();
  cudaDeviceSynchronize();
}
# nvcc -arch=sm_89 t324.cu -o t324 -std=c++20
# compute-sanitizer ./t324
========= COMPUTE-SANITIZER
1.000000
========= ERROR SUMMARY: 0 errors
#

CUDA 12.2

I have not tested it other than what you see above.

The look & feel of half-precision support in CUDA is that of a temporary stopgap measure with rough edges all around, of which the inability to create half-precision literal constants at compile time is one aspect. Maybe NVIDIA thought they could get proper support into the C++ standard faster.

If I had to guess, once NVIDIA brought their original proposal to the working group, the gcc folks said “What about __float128?”, and the Google folks said “What about bfloat16?”, and a longer discussion regarding a general revamping of C++'s floating-point type system ensued.

Recent straw polls taken by the committee suggest that the new extended floating-point type system is gelling so that there is a decent chance it may be rolled out with the next revision of the standard, which is tentatively scheduled for 2026.

There are
https://en.cppreference.com/w/cpp/header/stdfloat
https://en.cppreference.com/w/cpp/types/floating-point

which mention an f16 suffix for literals since C++23.
(The <stdfloat> header even includes a BF16 type, std::bfloat16_t.)

CUDA (nvcc) is at C++20, but this could work for pure host code.


Interesting! I could have sworn the meeting notes with the straw polls on the extended floating-point type proposal were dated 2024.

Support seems to be spotty at this time. At Compiler Explorer, the following compiles fine with gcc 14.2 using -std=c++23 -O3 -Wall, but not with the latest clang (or the Intel compiler, for that matter), which reports: error: no type named 'float16_t' in namespace 'std'

#include <cstdio>
#include <stdfloat>
const volatile std::float16_t two = 2.0f16;
int main (void)
{
    std::float16_t four = two * two;
    printf ("four=%11.4e\n", (double) four);
    return 0;
}

The generated code looks as expected, as does the data deposited for two, and when I enable execution in Compiler Explorer it prints the expected result. With -march=graniterapids I even got native FP16 instructions:

.LC0:
        .string "four=%11.4e\n"
main:
        sub     rsp, 8
        vmovw   xmm0, WORD PTR two[rip]
        vmovw   xmm1, WORD PTR two[rip]
        mov     edi, OFFSET FLAT:.LC0
        vmulsh  xmm0, xmm0, xmm1
        mov     eax, 1
        vcvtsh2sd       xmm0, xmm0, xmm0
        call    printf
        xor     eax, eax
        add     rsp, 8
        ret
two:
        .value  16384

Wow, thanks for all the answers.
I ended up going with a define like this:

#define DECAY __float2half2_rn(.2f)

Since, as mentioned above, it sounds like this gives the compiler more options to optimize.
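For reference, a sketch of how such a define might be used in a kernel (apply_decay is a made-up name; __hmul2 needs sm_53 or newer):

__global__ void apply_decay(__half2* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = __hmul2(data[i], DECAY);  // DECAY expands to __float2half2_rn(.2f) at each use
}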

Thanks again!