__half and standard operators + * / -

I can’t find information on a very basic question.
Assuming __half h1, h2, can I write h1 + h2, or should I use __hadd(h1, h2)?
(Same question for +, -, *, /.)

If the direct mapping operator+ -> __hadd() is not supported, what would be the reason?

The C++ language doesn’t have any formal definition of a 16-bit floating point type, AFAIK. To get usual access to this in CUDA C++ you must include cuda_fp16.h

Overloads are provided for this type of activity:

half a = 0;
half b = 1;
half c = a + b;

This works in newer versions of CUDA, if you target cc5.3 or higher, and for device code only.

(C++ doesn’t define those operations natively)
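For example, a minimal device-side sketch (the kernel name and the launch geometry are illustrative, not anything required by the header):

```cuda
#include <cuda_fp16.h>  // brings in __half and its operator overloads

// Illustrative kernel: element-wise addition of two __half arrays.
// The + overload is provided for device code when targeting cc5.3+.
__global__ void add_halves(const __half* x, const __half* y,
                           __half* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = x[i] + y[i];  // uses the operator+ overload from cuda_fp16.hpp
    }
}
```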


I did not want to be too verbose in the initial question, but this is precisely the point: it seems to work, but I can’t find documentation giving the limits:

  • Is it designed to be supported on host and/or device? (your answer: device only)

  • Is it compute-capability dependent? (your answer: cc 5.3)

  • Is it undefined behaviour that somehow works for now, or is it trustworthy according to some NVIDIA doc?

  • Is it really compiled to __hadd(), or is it inefficiently compiled as __float2half(__half2float(h1) + __half2float(h2))?

  • Is it doomed and will be deprecated, or does it have a bright future and wider support? (In the first case I should not rely on it, and should instead specialize all my template kernels for the __half case.)

(By “wider support” I mean equivalent +, -, *, / operator overloads for half2 and bfloat16.)

The overload is in cuda_fp16.hpp so you can see exactly what is done. That is a header file provided by CUDA, so I would consider it trustworthy to use. I can’t predict the future; things do get deprecated from time to time. YMMV.

There appear to be a set of half2 overloads in cuda_fp16.hpp. If I wanted to see what was possible with bf16 for example, I would look in the header file cuda_bf16.hpp.
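For instance, a sketch assuming the half2 overloads found in cuda_fp16.hpp (the kernel name is illustrative; half2 packs two fp16 values, so one operator application processes a pair of elements):

```cuda
#include <cuda_fp16.h>   // half2 and its operator overloads (via cuda_fp16.hpp)

// Illustrative kernel: each half2 holds two fp16 values, so a single +
// performs two fp16 additions; device code, cc5.3+ target assumed.
__global__ void add_half2(const half2* x, const half2* y,
                          half2* out, int n2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        out[i] = x[i] + y[i];  // two packed fp16 additions
    }
}
```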

Okay, I had not noticed that this C++ header existed, since I only included “cuda_fp16.h”. Actually, the #include "cuda_fp16.hpp" is done at the end of cuda_fp16.h, which is somewhat stealthy.

The C++ overloads of the cast and arithmetic operators now make perfect sense and explain why it works!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.