I’m running out of GPU RAM for my algorithms, and I think half-floats offer more than enough precision for everything I need.
I am using the driver API, so I know I can read half-float textures, but I also need a way to write half-floats to, and read them from, device linear memory. I took a quick look at the PTX manual and saw lots of intrinsics for half-float conversions.
Why are there no CUDA functions for these?
I guess that through some smart bit shifting and the like the conversion can also be done “manually”.
But I’m not sure how.
Can someone help?
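
For reference, here is a rough sketch of what I imagine the “manual” bit-shifting conversion would look like, assuming the usual IEEE 754 layouts (1 sign / 5 exponent / 10 mantissa bits for half, 1 / 8 / 23 for float). The names `float_to_half`, `half_to_float` and the `HALF_HD` macro are made up for illustration and are not CUDA API; the float-to-half direction simply truncates the extra mantissa bits instead of rounding to nearest, so treat it as a starting point rather than a finished implementation.

```cpp
#include <stdint.h>
#include <string.h>

// Hypothetical helper macro: lets the same functions build as plain C++
// or, under nvcc, as host/device functions.
#ifdef __CUDACC__
#define HALF_HD __host__ __device__
#else
#define HALF_HD
#endif

// Reinterpret the bits of a float as an integer and back.
HALF_HD inline uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
HALF_HD inline float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

// float -> half bit pattern, truncating the low mantissa bits (no rounding).
HALF_HD inline uint16_t float_to_half(float f) {
    uint32_t x    = float_bits(f);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu)                                  // Inf or NaN
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;               // rebias exponent
    if (e >= 31) return (uint16_t)(sign | 0x7C00u);    // too large: +/-Inf
    if (e <= 0) {
        if (e < -10) return (uint16_t)sign;            // too small: signed zero
        mant |= 0x800000u;                             // restore implicit leading 1
        return (uint16_t)(sign | (mant >> (14 - e)));  // half subnormal
    }
    return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}

// half bit pattern -> float (exact, every half is representable as a float).
HALF_HD inline float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    int32_t  e    = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FFu;

    if (e == 0x1F)                                     // Inf or NaN
        return bits_float(sign | 0x7F800000u | (mant << 13));
    if (e == 0) {
        if (mant == 0) return bits_float(sign);        // signed zero
        e = 1;
        while (!(mant & 0x400u)) { mant <<= 1; --e; }  // normalize subnormal
        mant &= 0x3FFu;                                // drop the now-explicit leading 1
    }
    return bits_float(sign | ((uint32_t)(e + 112) << 23) | (mant << 13));
}
```

With something like this, the data in linear memory would be stored as plain `unsigned short` (16-bit) values, and a kernel would call `half_to_float` after each load and `float_to_half` before each store.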