Are half-precision fp vector textures broken?

I am successfully using half-precision (16-bit) floating point textures with only one channel.

With 2 or 4 channels, it will not compile if I try to save the texture lookup as a vector float. If I save as a vector short and convert, I get the wrong result.

[codebox]
texture<unsigned short, 2, cudaReadModeElementType> tex_unsigned_short;
texture<ushort1, 2, cudaReadModeElementType> tex_ushort1;
texture<ushort2, 2, cudaReadModeElementType> tex_ushort2;
texture<ushort4, 2, cudaReadModeElementType> tex_ushort4;

__global__ void test()
{
    // works
    float f = tex2D(tex_unsigned_short, 0, 0);

    // also works
    f = __half2float(tex2D(tex_unsigned_short, 0, 0));

    // error: no suitable conversion function from "ushort1" to "float" exists
    f = tex2D(tex_ushort1, 0, 0);

    // error: no suitable user-defined conversion from "ushort2" to "float2" exists
    float2 f2 = tex2D(tex_ushort2, 0, 0);

    // error: no suitable user-defined conversion from "ushort4" to "float4" exists
    float4 f4 = tex2D(tex_ushort4, 0, 0);

    // compiles, but wrong answer
    ushort4 us4 = tex2D(tex_ushort4, 0, 0);
    f = __half2float(us4.x);
}
[/codebox]

I think half floats are only supported with the driver API. Is that what you’re using?

Yes, I am using the driver API. Apologies for not mentioning that.

Also, this is the CUDA 3.0 64-bit beta on 64-bit Vista.

They work, but since float16 is a storage format, not a format you can compute with in a kernel, your textures should be declared as float2 or float4, not ushort2/4. I’m using them all the time with the driver API. The upconversion from fp16 to fp32 performed by the texture access is free.

Thanks ghotep. That makes perfect sense. I don’t know how that escaped me. I will try it Monday.

Works now.

Could you post some sample code showing how to use it? Thanks.

Well, you write the device code exactly as you would for a full-precision texture (texture<float, …> or texture<float4, …>).

In the host code,

cuTexRefSetFormat(texture_reference, CU_AD_FORMAT_HALF, 1) [for float return type]

cuTexRefSetFormat(texture_reference, CU_AD_FORMAT_HALF, 4) [for float4 return type]

Regular floating point code would use CU_AD_FORMAT_FLOAT.

There is some SDK sample code that shows how to bind textures with the driver API.

The other thing is converting full-precision floats to 16-bit floats. On the device, there are __float2half_rn() and __half2float(). On the host, no conversion functions are provided, but device_functions.h contains definitions of the device functions that you can quite easily modify for use on the host.
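For anyone who wants a starting point for the host-side conversion, here is a minimal sketch in plain C++. It is not the code from device_functions.h; it is an independent implementation of the IEEE 754 binary16 encoding with round-to-nearest-even, which should correspond to what __float2half_rn() does. The names float_to_half_bits and half_bits_to_float are hypothetical, and the sketch assumes float is a 32-bit IEEE 754 type on the host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: encode a 32-bit float as IEEE 754 binary16 bits,
// rounding to nearest, ties to even.
std::uint16_t float_to_half_bits(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);               // reinterpret the float's bits
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t mag = x & 0x7FFFFFFFu;   // magnitude without the sign bit

    if (mag > 0x7F800000u)                       // NaN -> quiet NaN
        return static_cast<std::uint16_t>(sign | 0x7E00u);
    if (mag >= 0x477FF000u)                      // >= 65520 (or infinity) -> infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    if (mag < 0x33000000u)                       // below 2^-25 -> signed zero
        return static_cast<std::uint16_t>(sign);

    std::uint32_t half, rem, halfway;
    if (mag < 0x38800000u) {                     // below 2^-14: subnormal half
        const std::uint32_t e = mag >> 23;       // biased float exponent, 102..112
        const std::uint32_t sig = (mag & 0x007FFFFFu) | 0x00800000u;
        const std::uint32_t shift = 126u - e;    // 14..24
        half = sig >> shift;
        rem = sig & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    } else {                                     // normal half
        half = (mag - 0x38000000u) >> 13;        // rebias exponent from 127 to 15
        rem = mag & 0x1FFFu;
        halfway = 0x1000u;
    }
    if (rem > halfway || (rem == halfway && (half & 1u)))
        ++half;                                  // a carry into the exponent is intended
    return static_cast<std::uint16_t>(sign | half);
}

// Hypothetical helper: decode IEEE 754 binary16 bits into a 32-bit float.
float half_bits_to_float(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t man = h & 0x3FFu;
    std::uint32_t bits;

    if (exp == 0u) {
        if (man == 0u) {
            bits = sign;                         // signed zero
        } else {
            std::uint32_t shift = 0u;            // normalize a subnormal half
            while ((man & 0x400u) == 0u) { man <<= 1; ++shift; }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 31u) {
        bits = sign | 0x7F800000u | (man << 13); // infinity or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

The idea would be to run float_to_half_bits over the host data, store the results in a buffer of 16-bit values, and upload that buffer to the array bound to a texture reference whose format is set to CU_AD_FORMAT_HALF.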