Tested the time to convert an original 1248x960x496 matrix of 32-bit floats (just under 2.4 GB) into a 16-bit float matrix of the same size, then converted back to measure the loss of accuracy.
Wrote my own naive kernels to perform the conversions in both directions, and timed them running from MATLAB via a mex file.
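The kernel source isn't included in the post, so as a sketch, here is a host-side C reference of the bit-level binary16 conversion such naive kernels perform per element. The function names `float_to_half` / `half_to_float` are my own; in the actual CUDA kernels you would most likely just call the `__float2half_rn` / `__half2float` intrinsics from `cuda_fp16.h` (the feature new in CUDA 7.5) rather than hand-rolling this:

```c
#include <stdint.h>
#include <string.h>

/* fp32 -> fp16 (IEEE binary16), round to nearest even */
uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                  /* reinterpret the bits */
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    int32_t  exp  = (int32_t)((x >> 23) & 0xFFu) - 127 + 15;
    uint32_t mant = x & 0x7FFFFFu;

    if (((x >> 23) & 0xFFu) == 0xFFu)          /* Inf / NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));
    if (exp >= 31)                             /* overflow -> Inf */
        return (uint16_t)(sign | 0x7C00u);
    if (exp <= 0) {                            /* half subnormal / zero */
        if (exp < -10) return sign;            /* too small: round to 0 */
        mant |= 0x800000u;                     /* restore implicit 1 */
        int shift = 14 - exp;                  /* 24-bit -> <=10-bit mant */
        uint32_t h   = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t mid = 1u << (shift - 1);
        if (rem > mid || (rem == mid && (h & 1u))) h++;   /* nearest-even */
        return (uint16_t)(sign | h);
    }
    /* normal: drop 13 mantissa bits with round-to-nearest-even;
       a rounding carry propagates correctly into the exponent */
    uint32_t h   = ((uint32_t)exp << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) h++;
    return (uint16_t)(sign | h);
}

/* fp16 -> fp32, always exact (fp32 can represent every fp16 value) */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t x;
    if (exp == 0x1Fu) {                        /* Inf / NaN */
        x = sign | 0x7F800000u | (mant << 13);
    } else if (exp == 0u) {
        if (mant == 0u) {
            x = sign;                          /* signed zero */
        } else {                               /* subnormal: normalize */
            uint32_t e = 127 - 15 + 1;
            while (!(mant & 0x400u)) { mant <<= 1; e--; }
            x = sign | (e << 23) | ((mant & 0x3FFu) << 13);
        }
    } else {                                   /* normal */
        x = sign | ((exp - 15 + 127) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}
```

The conversion back is exact, so all of the round-trip error comes from the 32->16 direction, where the mantissa drops from 23 stored bits to 10.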
Overall the times seem very good, though it takes slightly longer to convert from 16-bit float back to 32-bit float:
time to convert 1248x960x496 from 32 bit float to 16 bit float = 0.013000
time to convert back 1248x960x496 from 16 bit float to 32 bit float = 0.014000
mean absolute error = 0.0001647467
max absolute error = 0.015625
Times are in seconds; across runs the 32->16 conversion takes 10-13 ms and the 16->32 conversion 13-17 ms.
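To put those times in context, here is a rough effective-bandwidth estimate (my own arithmetic, assuming the reported times cover only the kernels): each conversion pass reads one buffer and writes the other, so it moves the fp32 bytes plus the fp16 bytes.

```c
/* Effective bandwidth of one conversion pass over the 1248x960x496
   volume: each pass reads one buffer and writes the other, so it
   moves the fp32 bytes plus the fp16 bytes regardless of direction. */
double conversion_bandwidth_gbs(double seconds)
{
    long long n = 1248LL * 960 * 496;        /* 594,247,680 elements */
    double bytes = n * (4.0 + 2.0);          /* fp32 buffer + fp16 buffer */
    return bytes / 1e9 / seconds;            /* GB moved per second */
}
```

That gives roughly 274 GB/s at 0.013 s and 255 GB/s at 0.014 s, against the Titan X's ~336 GB/s peak memory bandwidth, so the kernels look essentially memory-bound, which is about as good as a pure conversion kernel can do.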
Setup: CUDA 7.5 with the latest driver, Windows 7, Titan X in TCC mode.
So, pretty good results in both performance and accuracy after the two conversions.
If anyone is interested, here are the histograms of the values: the first is the histogram of the original buffer in 32-bit form, and the second is the result of the CUDA conversion from 32-bit to 16-bit and back to 32-bit.
Nice new feature of CUDA 7.5. Will put this to good use!