16-bit float operations

The only ‘support’ for the 16-bit float type that I am aware of in the standard CUDA SDK is the CUDA Math API’s ‘Type Casting Intrinsics’ section:

[url]http://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__CAST.html#group__CUDA__MATH__INTRINSIC__CAST[/url]

There are a few functions that convert unsigned short (FP16 storage) values to 32-bit float and back.

I would like to be able to perform 16-bit floating-point multiplication, addition, and subtraction (or an FMA, if possible), and I am not sure whether there is already some existing CUDA functionality for this.

I think texture objects have a built-in interpolation ability, but I have not found any examples. Can anyone point me to some examples or documentation on this topic?

I already searched, and this is the best thing I have found so far:

GPU Programming and Streaming Multiprocessors | 8.1. Memory | InformIT

but I wonder if anyone can point me to a code example of half-precision operations in CUDA.

Some kind of FP16 support in Pascal was hinted at by NVIDIA CEO Jen-Hsun Huang during the GTC 2015 keynote. At the moment I don’t think you’ll find much exposed in CUDA that reflects FP16 support. Presumably that will appear in CUDA in time for Pascal support.

I don’t think you’ll find hardware-level (i.e., SASS) support for any of the FP16 math operations you listed in any compute capability up to 5.2.

The Tegra X1 whitepaper contains considerable discussion of FP16 support, including FMA:

[url]http://international.download.nvidia.com/pdf/tegra/Tegra-X1-whitepaper-v1.0.pdf[/url]

AFAIK CUDA has not exposed significant support for this capability yet.

I think you found all the FP16 support there currently is. Just like on other platforms (notably ARM) half precision is currently available only as a storage format, but not as a computational format. So the advantage of half precision compared to single precision is in increased storage density and bandwidth reduction. The computation itself needs to happen with float operands. Reading FP16 data from textures automatically expands the data to FP32, making this path particularly efficient. For normal loads, CUDA provides intrinsics for conversion between FP16 and FP32, as you already noted.
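
To make that concrete, here is a minimal sketch of the storage-format pattern (the kernel and its names are just illustrative; it assumes the FP16 data is packed as unsigned short in device memory): load, expand with __half2float, do the math in float, and round back with __float2half_rn.

[code]
// Sketch: FP16 as a storage format only. All arithmetic happens in FP32.
// Assumes a, b, c hold IEEE-754 half values packed into unsigned short.
__global__ void fma_fp16_storage(const unsigned short *a,
                                 const unsigned short *b,
                                 const unsigned short *c,
                                 unsigned short *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float fa = __half2float(a[i]);   // expand FP16 -> FP32
        float fb = __half2float(b[i]);
        float fc = __half2float(c[i]);
        float r  = fmaf(fa, fb, fc);     // single-precision FMA
        out[i]   = __float2half_rn(r);   // round back to FP16 for storage
    }
}
[/code]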

A worked example using FP16 textures can be found at [url]https://devtalk.nvidia.com/default/topic/547080/-half-datatype-ieee-754-conformance[/url]. I think you should be able to extend that example code to include interpolation, but I have not tried that myself.
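
For the interpolation part, an untested sketch along these lines should work (the helper and kernel names are made up for illustration): create a CUDA array with a 16-bit float channel descriptor, create a texture object with cudaFilterModeLinear, and fetch with tex2D<float>(); the texture unit expands the FP16 texels to FP32 and does the filtering in hardware.

[code]
#include <cuda_runtime.h>

// Sketch: bilinear interpolation of FP16 data through a texture object.
__global__ void sample_kernel(cudaTextureObject_t tex, float *out,
                              int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Offsets of +0.5f hit texel centers; fractional coordinates interpolate.
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
    }
}

cudaTextureObject_t make_half_texture(const unsigned short *h_data,
                                      int width, int height, cudaArray_t *arr)
{
    // 16-bit float channel format (same as cudaCreateChannelDescHalf()).
    cudaChannelFormatDesc desc =
        cudaCreateChannelDesc(16, 0, 0, 0, cudaChannelFormatKindFloat);
    cudaMallocArray(arr, &desc, width, height);
    cudaMemcpy2DToArray(*arr, 0, 0, h_data,
                        width * sizeof(unsigned short),
                        width * sizeof(unsigned short), height,
                        cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = *arr;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0]   = cudaAddressModeClamp;
    texDesc.addressMode[1]   = cudaAddressModeClamp;
    texDesc.filterMode       = cudaFilterModeLinear;    // hardware interpolation
    texDesc.readMode         = cudaReadModeElementType; // FP16 -> FP32 on fetch
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
    return tex;
}
[/code]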

Please note that FP16 arithmetic, if and when it will be supported in CUDA (presumably in the Pascal time frame, as txbob notes), will be accurate to only 3 decimal digits. This means there is only minimal tolerance to accumulated round-off error, assuming that practical applications will likely need final results accurate to at least 8 bits.
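
A quick way to see what roughly three decimal digits means in practice (using the conversion intrinsics discussed above; the values are arbitrary examples):

[code]
// FP16 has an 11-bit significand, so round-off shows up quickly.
__global__ void fp16_roundoff_demo(float *out)
{
    out[0] = __half2float(__float2half_rn(2049.0f)); // stored as 2048.0
    out[1] = __half2float(__float2half_rn(1.001f));  // stored as ~1.0009766
}
[/code]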