I can't find information on a very basic question. Assuming __half h1, h2, can I write h1 + h2, or should I use __hadd(h1, h2)? (Same question for +, -, *, /.)
If the direct mapping operator+ -> __hadd() is not supported, what would be the reason?
The C++ language doesn't have any formal definition of a 16-bit floating-point type, AFAIK. To get access to one in CUDA C++ you must include cuda_fp16.h. Overloads are provided for this type of activity:
half a = 0;
half b = 1;
half c = a + b;
This works in newer versions of CUDA, if you target cc5.3 or higher, and for device code only. (C++ doesn't define those operations natively.)
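
For context, this is what it looks like in a complete kernel (a minimal sketch; the kernel and variable names are my own, and it assumes compilation with -arch=sm_53 or higher):

#include <cuda_fp16.h>

// Element-wise half-precision add using the operator overload.
__global__ void half_add(const __half *a, const __half *b, __half *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];            // operator+ overload from cuda_fp16.h
        // out[i] = __hadd(a[i], b[i]);  // equivalent intrinsic form
    }
}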
I did not want to be too verbose in the initial question, but this is precisely the point: it seems to work, but I can't find documentation giving the limits.

Is it designed to be supported on host and/or device? (your answer: device only)
Is it compute capability dependent? (your answer: cc5.3 or higher)
Is it undefined behaviour that somehow works for now, or is it trustworthy according to some NVIDIA doc?
Is it really converted to __hadd(), or is it inefficiently compiled as __float2half(__half2float(h1) + __half2float(h2))? (one way to check this empirically is sketched after this list)
Is it doomed and will be deprecated, or does it have a bright future and wider support? (In the first case, I should not rely on it and should instead specialize all my template kernels for the __half case.) By "wider support" I mean equivalent operator +, -, *, / overloads for half2 and bfloat16.
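
One way I could check the conversion question empirically (my own idea, not from any NVIDIA doc) would be to compile a tiny kernel and look at the disassembly:

nvcc -arch=sm_70 -cubin -o half_add.cubin half_add.cu
cuobjdump -sass half_add.cubin

If the overload maps to the intrinsic, the addition should show up as a native half-precision instruction (e.g. HADD2) rather than a convert-add-convert sequence through float.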
The overload is in cuda_fp16.hpp, so you can see exactly what is done. That is a header file provided by CUDA, so I would consider it trustworthy to use. I can't predict the future; things do get deprecated from time to time. YMMV.

There appears to be a set of half2 overloads in cuda_fp16.hpp. If I wanted to see what was possible with bf16, for example, I would look in the header file cuda_bf16.hpp.
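
For instance, something along these lines should compile with those headers (a sketch only; the half2 overloads need cc5.3 or higher, and bf16 arithmetic needs cc8.0 or higher, AFAIK):

#include <cuda_fp16.h>
#include <cuda_bf16.h>

__global__ void vec_ops(__half2 *h, __nv_bfloat16 *b)
{
    __half2 two = __floats2half2_rn(2.0f, 2.0f);
    h[0] = h[0] + two;                     // element-wise half2 add (cc5.3+)
    b[0] = b[0] * __float2bfloat16(2.0f);  // bf16 multiply (cc8.0+)
}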
Okay, I did not pay attention that this C++ header existed, since I only included "cuda_fp16.h". Actually, the #include "cuda_fp16.hpp" is done at the end of cuda_fp16.h and was somewhat stealthy.

The C++ overloads of the cast and arithmetic operators now make perfect sense and explain why it works!
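
For anyone else landing here, the device-side overload in cuda_fp16.hpp is structured roughly like this (paraphrased, not the exact header text, which uses its own declaration macros and architecture guards):

// Simplified sketch of the overload; the real header guards this with
// __CUDA_ARCH__ >= 530 (or a host fallback in recent CUDA versions).
__device__ __half operator+(const __half &lh, const __half &rh)
{
    return __hadd(lh, rh);
}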