I am successfully using half-precision (16-bit) floating-point textures with a single channel.
With 2 or 4 channels, however, the code does not compile if I store the texture lookup in a float vector type, and if I store it in a ushort vector and convert, I get the wrong result.
```cuda
// works
float f = tex2D(tex_unsigned_short, 0, 0);
// also works
f = __half2float(tex2D(tex_unsigned_short, 0, 0));
// error: no suitable conversion function from "ushort1" to "float" exists
f = tex2D(tex_ushort1, 0, 0);
// error: no suitable user-defined conversion from "ushort2" to "float2" exists
float2 f2 = tex2D(tex_ushort2, 0, 0);
// error: no suitable user-defined conversion from "ushort4" to "float4" exists
float4 f4 = tex2D(tex_ushort4, 0, 0);
// compiles, but wrong answer
ushort4 us4 = tex2D(tex_ushort4, 0, 0);
f = __half2float(us4.x);
```
They do work, but float16 is a storage format, not a type you can compute with in a kernel, so your texture references should be declared as float2 or float4, not ushort2/ushort4. I use them all the time with the driver API. The texture unit converts fp16 to fp32 on every fetch at no extra cost.
For half data, create the CUDA array with format CU_AD_FORMAT_HALF and 2 or 4 channels; regular 32-bit floating-point data would use CU_AD_FORMAT_FLOAT.
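To make concrete what the texture unit does for you on each fetch, here is a minimal host-side C sketch that decodes an IEEE 754 binary16 bit pattern into a 32-bit float, covering zeros, subnormals, infinities and NaNs. The function name `half_bits_to_float` is hypothetical and not a CUDA API; it only mirrors the arithmetic of the conversion, not the hardware path.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode an IEEE 754 binary16 bit pattern to float. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent, bias 15 */
    uint32_t mant = h & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                           /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit (bit 10) appears,
               lowering the float exponent once per shift. */
            exp = 127 - 15 + 1;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                exp--;
            }
            mant &= 0x3FFu;                        /* drop the now-implicit bit */
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);  /* infinity or NaN */
    } else {
        /* Normal number: rebias exponent from 15 to 127, widen mantissa. */
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);                   /* reinterpret bits safely */
    return f;
}
```

For example, `half_bits_to_float(0x3C00)` gives 1.0f and `half_bits_to_float(0x0001)` gives 2^-24, the smallest half subnormal. Every half value is exactly representable as a float, so this direction never needs rounding.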
The SDK includes sample code that shows how to bind textures with the driver API.
The remaining piece is converting 32-bit floats to 16-bit floats. On the device there are __float2half_rn() and __half2float(). On the host no conversion functions are provided, but device_functions.h contains definitions of these device functions that you can easily adapt for host use.
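If you would rather not adapt device_functions.h, the following is a self-contained host-side C sketch of float-to-half conversion with round-to-nearest-even, the rounding mode the `_rn` suffix refers to. It is written for illustration, not copied from device_functions.h, and the name `float_to_half_bits_rn` is hypothetical. It returns the raw 16-bit pattern, which is what you would upload into a CU_AD_FORMAT_HALF array.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert float to binary16 bits, round to nearest even. */
uint16_t float_to_half_bits_rn(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                      /* reinterpret float bits */
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu)                              /* infinity, or quiet NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int e = (int)exp - 127 + 15;                   /* rebias exponent to 15 */
    if (e >= 0x1F)                                 /* too large: infinity */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                                  /* half subnormal or zero */
        if (e < -10)                               /* below half of 2^-24 */
            return sign;
        mant |= 0x800000u;                         /* restore implicit 1 */
        int shift = 14 - e;                        /* in [14, 24] */
        uint32_t half_mant = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (half_mant & 1u)))
            half_mant++;                           /* may carry to 2^-14 */
        return (uint16_t)(sign | half_mant);
    }

    /* Normal case: keep the top 10 mantissa bits, round on the other 13. */
    uint32_t bits = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (bits & 1u)))
        bits++;                                    /* carry can reach inf */
    return (uint16_t)(sign | bits);
}
```

Rounding on the combined exponent and mantissa bits lets a mantissa carry bump the exponent naturally; for instance 65520.0f, exactly halfway between 65504 and 2^16, rounds to infinity (0x7C00) under ties-to-even.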