Are half-precision fp vector textures broken?

I am successfully using half-precision (16-bit) floating point textures with only one channel.

With 2 or 4 channels, the code does not compile if I assign the texture lookup to a float vector. If I assign it to a ushort vector and convert, I get the wrong result.


texture<unsigned short, 2, cudaReadModeElementType> tex_unsigned_short;
texture<ushort1, 2, cudaReadModeElementType> tex_ushort1;
texture<ushort2, 2, cudaReadModeElementType> tex_ushort2;
texture<ushort4, 2, cudaReadModeElementType> tex_ushort4;

// works
float  f  = tex2D(tex_unsigned_short, 0, 0);

// also works
f  = __half2float(tex2D(tex_unsigned_short, 0, 0));

// error: no suitable conversion function from "ushort1" to "float" exists
f  = tex2D(tex_ushort1, 0, 0);

// error: no suitable user-defined conversion from "ushort2" to "float2" exists
float2 f2 = tex2D(tex_ushort2, 0, 0);

// error: no suitable user-defined conversion from "ushort4" to "float4" exists
float4 f4 = tex2D(tex_ushort4, 0, 0);

// compiles, but wrong answer
ushort4 us4 = tex2D(tex_ushort4, 0, 0);
f = __half2float(us4.x);


I think half floats are only supported with the driver API. Is that what you’re using?

Yes, I am using the driver API. Apologies for not mentioning that.

Also, this is the CUDA 3.0 64-bit beta on 64-bit Vista.

They work. But since float16 is a storage format, not a format you can compute with in a kernel, your textures should be declared as float2 or float4, not ushort2/ushort4. I’m using them all the time with the driver API. The upconversion from fp16 to fp32 performed by the texture access is free.

Thanks ghotep. That makes perfect sense. I don’t know how that escaped me. I will try it Monday.

Works now.

Could you post some sample code showing how to use it? Thanks.

Well, you write the device code exactly as you would for a full-precision texture (texture<float, …> or texture<float4, …>).

In the host code,

cuTexRefSetFormat(texture_reference, CU_AD_FORMAT_HALF, 1) [for float return type]

cuTexRefSetFormat(texture_reference, CU_AD_FORMAT_HALF, 4) [for float4 return type]

Regular floating point code would use CU_AD_FORMAT_FLOAT.

There is some SDK sample code that shows how to bind textures with the driver API.

The other thing is converting full-precision floats to 16-bit floats. On the device, there are __float2half_rn() and __half2float(). On the host, no conversion functions are provided, but device_functions.h contains definitions of the device functions that you can quite easily adapt for host use.
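
For anyone who wants a starting point for the host side, here is a rough sketch of such a conversion in plain C++. It is written from the IEEE 754 half-precision layout, not copied from device_functions.h, and the names float_to_half_bits and half_bits_to_float are made up for this post (they are not CUDA functions). It is meant to round to nearest even, like __float2half_rn(), but I have not checked it bit for bit against the device.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical host helper (not part of CUDA): convert a 32-bit float to
// the bit pattern of a 16-bit half, rounding to nearest, ties to even.
uint16_t float_to_half_bits(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);                  // reinterpret the float bits
    const uint32_t sign = (x >> 16) & 0x8000u;
    const uint32_t mag  = x & 0x7FFFFFFFu;

    if (mag > 0x7F800000u)                          // NaN: keep it a (quiet) NaN
        return static_cast<uint16_t>(sign | 0x7E00u);

    const int32_t e = static_cast<int32_t>(mag >> 23) - 127 + 15;  // rebias exponent
    uint32_t m = mag & 0x007FFFFFu;

    if (e >= 31)                                    // infinity or too large for half
        return static_cast<uint16_t>(sign | 0x7C00u);
    if (e < -10)                                    // below half the smallest subnormal
        return static_cast<uint16_t>(sign);

    uint32_t half, rem, halfway;
    if (e <= 0) {                                   // result is a half subnormal
        m |= 0x00800000u;                           // restore the implicit leading 1
        const uint32_t shift = static_cast<uint32_t>(14 - e);
        half    = m >> shift;
        rem     = m & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    } else {                                        // result is a normal half
        half    = (static_cast<uint32_t>(e) << 10) | (m >> 13);
        rem     = m & 0x1FFFu;
        halfway = 0x1000u;
    }
    // Round to nearest even; a carry out of the mantissa correctly bumps
    // the exponent (and can produce infinity at the top of the range).
    if (rem > halfway || (rem == halfway && (half & 1u)))
        ++half;
    return static_cast<uint16_t>(sign | half);
}

// Hypothetical host helper (not part of CUDA): expand half bits to a float.
float half_bits_to_float(uint16_t h) {
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    const uint32_t exp  = (h >> 10) & 0x1Fu;
    const uint32_t man  = h & 0x3FFu;

    if (exp == 0) {                                 // zero or subnormal: man * 2^-24
        const float v = std::ldexp(static_cast<float>(man), -24);
        return sign ? -v : v;
    }
    const uint32_t bits = (exp == 31)
        ? (sign | 0x7F800000u | (man << 13))        // infinity or NaN
        : (sign | ((exp + 112u) << 23) | (man << 13));
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

You would run float_to_half_bits over your host buffer, store the results as unsigned short, and upload that to the array you bind with CU_AD_FORMAT_HALF. half_bits_to_float is only needed if you want to check the data on the host.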