Hi all,
I’m running out of GPU RAM for my algorithms, and I was thinking that half-floats offer more than enough precision for everything I need.
I am using the driver API, so I know I can read half-float textures, but I need a way to write and also read half-floats to/from device linear memory. I took a quick look at the PTX manual and saw lots of instructions for the half-float conversions.
Why are there no CUDA functions for these?
I guess the conversion could also be done “manually” with some clever bit shifting and the like.
But I’m not sure how.
I just learned more than I ever wanted of floating point representation…
Yes, looking at the complexity of handling the denormals etc. correctly, I realize it’s not just a matter of shifting bits.
Knowing it’s already all there in PTX doesn’t really motivate me to tackle this manually.
So I’ll cross my fingers this comes into CUDA 2.3 - thanks for promising it Simon :)
Thank you all for your kind help.
Mark
BTW: Just in case someone wondered why I’m running out of RAM… I’m working on a Mac with the 8800 GT, which only has 512 MB. The Quadro with 1.5 GB is way too expensive for me. If I were a Windows user, I would certainly have solved that already…
CUDA does not support denormals anyway, and if you code your program carefully you can also avoid infinities and NaNs, leaving only the bit shifting for your code.
So if you don’t want to wait for CUDA 2.3, I would try the simplest shifting algorithm anyway and keep my fingers crossed :)
Alternatively, maybe there are other ways of decreasing your memory requirements?
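For what it’s worth, here is a minimal sketch of what that “simplest shifting algorithm” could look like: re-bias the exponent (127 → 15), truncate the mantissa from 23 to 10 bits, and flush anything that underflows to signed zero. Function names and the saturate-to-infinity choice are my own assumptions, not an official CUDA routine; it ignores rounding, NaNs, and denormals entirely, as suggested above. The same bit manipulation works unchanged inside a kernel.

```c
#include <stdint.h>
#include <string.h>

/* Naive float -> half conversion: truncating, no NaN/denormal handling. */
uint16_t float_to_half(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);                 /* reinterpret bits safely */
    uint32_t sign = (bits >> 16) & 0x8000u;         /* sign moves to bit 15    */
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15; /* re-bias     */
    uint32_t mant = (bits >> 13) & 0x3FFu;          /* keep top 10 mantissa bits */
    if (exp <= 0)  return (uint16_t)sign;           /* underflow -> signed zero  */
    if (exp >= 31) return (uint16_t)(sign | 0x7C00u); /* overflow -> infinity    */
    return (uint16_t)(sign | ((uint32_t)exp << 10) | mant);
}

/* Inverse conversion; half denormals are flushed to zero. */
float half_to_float(uint16_t h) {
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits = (exp == 0)
        ? sign                                       /* zero / flushed denormal */
        : sign | ((exp - 15 + 127) << 23) | (mant << 13);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values whose significand fits in 10 bits (like 1.0f or 2.5f) round-trip exactly; everything else loses the truncated low mantissa bits.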
Actually, the double-precision instructions do support subnormal inputs and results, while single-precision instructions flush subnormal inputs and results to zero.