Load and store half-floats from device memory: how to shift those bits correctly…

Hi all,
I’m running out of GPU RAM for my algorithms and was thinking that half-floats offer more than enough precision for everything I need.

I am using the driver API, so I know I can read half-float textures, but I need a way to read and write half-floats from/to device linear memory. I took a quick look at the PTX manual and saw lots of intrinsics for half-float conversions.

Why are there no CUDA functions for these?

I guess the conversion can also be done “manually” with some smart bit shifting and the like.
But I’m not sure how.

Can someone help?

Thanks
Mark

For manual conversions I would read the format description, e.g. over here: http://en.wikipedia.org/wiki/Half_precision

Since the half-float description above is short and the authors assumed you already know what floats look like, I would suggest checking out
http://en.wikipedia.org/wiki/Floating_point
http://babbage.cs.qc.cuny.edu/IEEE-754/References.xhtml
to understand how full floats are represented.

Since the device uses essentially the same float format as the host, I would first try to get the conversion working on the host before playing with CUDA.
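For what it’s worth, here is roughly what that host-side version could look like. This is only a sketch pieced together from the Wikipedia format description (the function names are mine, the mantissa is truncated instead of rounded, and denormals are flushed to zero like CUDA does for single precision), so double-check it before relying on it:

[code]
#include <stdint.h>
#include <string.h>

static uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);               /* reinterpret the float bits */

    uint32_t sign = (x >> 16) & 0x8000;     /* sign bit, moved to half position */
    uint32_t fexp = (x >> 23) & 0xFF;       /* biased float exponent */
    uint32_t mant = x & 0x007FFFFF;         /* 23-bit float mantissa */

    if (fexp == 255)                        /* Inf or NaN in the source */
        return (uint16_t)(sign | 0x7C00 | (mant ? 0x0200 : 0));

    int32_t exp = (int32_t)fexp - 127 + 15; /* rebias: float bias 127 -> half bias 15 */
    if (exp >= 31)                          /* too large for half: clamp to infinity */
        return (uint16_t)(sign | 0x7C00);
    if (exp <= 0)                           /* too small for a normal half: flush to zero */
        return (uint16_t)sign;

    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13)); /* truncate mantissa */
}

static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x03FF;
    uint32_t x;
    float    f;

    if (exp == 31)                          /* Inf / NaN */
        x = sign | 0x7F800000 | (mant << 13);
    else if (exp == 0)                      /* zero (half denormals flushed here, too) */
        x = sign;
    else                                    /* normal number: rebias and shift */
        x = sign | ((exp - 15 + 127) << 23) | (mant << 13);

    memcpy(&f, &x, sizeof f);
    return f;
}
[/code]

Once a round trip through these two functions does what you expect on the host, porting them to __device__ functions should be mostly mechanical.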

From what I know, CUDA differs in:

  • no support for denormalized values (very small values close to 0)
  • no or different support for invalid values (NaNs)
  • not sure how ±infinity is handled.

For fast float <-> half float conversions on the CPU, check out this article

http://www.fox-toolkit.org/ftp/fasthalffloatconversion.pdf
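In case that link ever goes stale, the core idea of the article is to precompute two small tables indexed by the 9 sign+exponent bits of the float, so the whole float-to-half conversion becomes one lookup, one shift and one add. A rough transcription of that direction (verify against the paper before use):

[code]
static unsigned short basetable[512];   /* base half pattern per sign+exponent */
static unsigned char  shifttable[512];  /* how far to shift the mantissa down  */

static void generate_tables(void)
{
    for (int i = 0; i < 256; ++i) {
        int e = i - 127;                  /* unbiased float exponent */
        if (e < -24) {                    /* too small: becomes signed zero */
            basetable[i] = 0x0000;  basetable[i | 0x100] = 0x8000;
            shifttable[i] = shifttable[i | 0x100] = 24;
        } else if (e < -14) {             /* maps to a half denormal */
            basetable[i] = (unsigned short)(0x0400 >> (-e - 14));
            basetable[i | 0x100] = (unsigned short)((0x0400 >> (-e - 14)) | 0x8000);
            shifttable[i] = shifttable[i | 0x100] = (unsigned char)(-e - 1);
        } else if (e <= 15) {             /* normal half: just rebias */
            basetable[i] = (unsigned short)((e + 15) << 10);
            basetable[i | 0x100] = (unsigned short)(((e + 15) << 10) | 0x8000);
            shifttable[i] = shifttable[i | 0x100] = 13;
        } else if (e < 128) {             /* too large: becomes infinity */
            basetable[i] = 0x7C00;  basetable[i | 0x100] = 0xFC00;
            shifttable[i] = shifttable[i | 0x100] = 24;
        } else {                          /* Inf and NaN stay Inf and NaN */
            basetable[i] = 0x7C00;  basetable[i | 0x100] = 0xFC00;
            shifttable[i] = shifttable[i | 0x100] = 13;
        }
    }
}

static unsigned short float_to_half_table(unsigned int f) /* f = raw float bits */
{
    unsigned int i = (f >> 23) & 0x1FF;   /* sign + exponent index */
    return (unsigned short)(basetable[i] + ((f & 0x007FFFFF) >> shifttable[i]));
}
[/code]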

Christian

No promises, but half-float to float conversion intrinsics are planned for CUDA 2.3

I just learned more than I ever wanted about floating-point representation…

Yes, looking at the complexity of handling the denormals etc. correctly, I realize it’s not just a matter of shifting bits.
Knowing it’s already all there in PTX doesn’t really motivate me to tackle this manually.

So I’ll cross my fingers this makes it into CUDA 2.3 - thanks for promising it, Simon :)

Thank you all for your kind help.
Mark

BTW: Just in case someone wondered why I’m running out of RAM… I’m working on a Mac with the GT8800 that only has 512MB. The Quadro with 1.5GB is way too expensive for me. If I were a Windows user I would certainly have solved that already…

CUDA does not support denormals anyway, and if you code your program carefully you can also avoid infinities and NaNs, leaving only the bit shifting for your code.
So if you don’t want to wait for CUDA 2.3, I would try that simplest shifting algorithm anyway and keep my fingers crossed :)
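To make that concrete: if your data stays in the normal half range, the conversion collapses to a rebias and a few shifts. A minimal, untested sketch of what those device functions could look like (the names are mine, the mantissa is truncated rather than rounded, and anything outside the normal finite range comes out as garbage):

[code]
// Only valid for finite values in the normal half range: no handling
// of denormals, NaN, infinity or overflow.
__device__ unsigned short float_to_half_simple(float f)
{
    unsigned int x = (unsigned int)__float_as_int(f);
    return (unsigned short)(((x >> 16) & 0x8000) |               // sign
                            ((((x >> 23) & 0xFF) - 112) << 10) | // rebias exponent (127 -> 15)
                            ((x >> 13) & 0x03FF));               // top 10 mantissa bits
}

__device__ float half_to_float_simple(unsigned short h)
{
    unsigned int x = (((unsigned int)h & 0x8000) << 16) |               // sign
                     ((((unsigned int)(h >> 10) & 0x1F) + 112) << 23) | // rebias exponent
                     (((unsigned int)h & 0x03FF) << 13);                // mantissa
    return __int_as_float((int)x);
}
[/code]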

Alternatively, maybe there are other ways of decreasing your memory requirements?

I was so lost in those floating-point docs that I overlooked that info in your first post.

Leaving only the bit shifts seems worth trying out.

Many thanks for pointing that out to me again!

(But I am still crossing fingers…)

Actually, the double-precision instructions do support subnormal inputs and results, while single-precision instructions flush subnormal inputs and results to zero.

(PTX ISA 1.4 manual p.55)

N.

Ah… I never worked with doubles, sorry.