Are half-precision fp vector textures broken?

the_ps · January 15, 2010, 9:39pm

I am successfully using half-precision (16-bit) floating point textures with only one channel.

With 2 or 4 channels, the code will not compile if I try to store the texture lookup in a float vector. If I store it in a ushort vector and convert afterwards, I get the wrong result.

[codebox]
texture<unsigned short, 2, cudaReadModeElementType> tex_unsigned_short;
texture<ushort1, 2, cudaReadModeElementType> tex_ushort1;
texture<ushort2, 2, cudaReadModeElementType> tex_ushort2;
texture<ushort4, 2, cudaReadModeElementType> tex_ushort4;

__global__ void test()
{
    // works
    float f = tex2D(tex_unsigned_short, 0, 0);

    // also works
    f = __half2float(tex2D(tex_unsigned_short, 0, 0));

    // error: no suitable conversion function from "ushort1" to "float" exists
    f = tex2D(tex_ushort1, 0, 0);

    // error: no suitable user-defined conversion from "ushort2" to "float2" exists
    float2 f2 = tex2D(tex_ushort2, 0, 0);

    // error: no suitable user-defined conversion from "ushort4" to "float4" exists
    float4 f4 = tex2D(tex_ushort4, 0, 0);

    // compiles, but wrong answer
    ushort4 us4 = tex2D(tex_ushort4, 0, 0);
    f = __half2float(us4.x);
}
[/codebox]

eelsen · January 15, 2010, 10:25pm

I think half floats are only supported with the driver API. Is that what you’re using?

the_ps · January 15, 2010, 10:52pm

Yes, I am using the driver API. Apologies for not mentioning that.

Also, this is the 64-bit CUDA 3.0 beta on 64-bit Vista.

ghotep · January 17, 2010, 1:40am

They work, but since float16 is a storage format, not a type you can compute with in a kernel, your textures should be declared as float2 or float4, not ushort2/ushort4. I’m using them all the time with the driver API. The upconversion from fp16 to fp32 done by the texture access is free.

the_ps · January 17, 2010, 5:41am

Thanks ghotep. That makes perfect sense. I don’t know how that escaped me. I will try it Monday.

the_ps · January 18, 2010, 5:53pm

Works now.

gshi · January 19, 2010, 4:53pm

Could you post some sample code showing how to use it? Thanks.

the_ps · January 19, 2010, 6:50pm

Well, you write the device code exactly as you would for a full-precision texture (texture<float, …> or texture<float4, …>).

In the host code,

cuTexRefSetFormat(texture_reference, CU_AD_FORMAT_HALF, 1) [for float return type]

…

cuTexRefSetFormat(texture_reference, CU_AD_FORMAT_HALF, 4) [for float4 return type]

Regular full-precision floating-point code would use CU_AD_FORMAT_FLOAT instead.

There is some SDK sample code that shows how to bind textures with the driver API.

The other thing you need is a way to convert full-precision floats to 16-bit floats when preparing the texture data. On the device, there are __float2half_rn() and __half2float(). On the host, no conversion functions are provided, but device_functions.h contains definitions of the device functions that you can quite easily adapt for use on the host.
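
For anyone who would rather not dig through device_functions.h, here is a rough host-side sketch of the same idea. It is not the code from that header: it is a from-scratch conversion based on the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 mantissa bits), and the names float_to_half_bits and half_bits_to_float are made up for illustration. The float-to-half direction rounds to nearest even, which is my assumption of what you want in order to match __float2half_rn().

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a CUDA API): float -> 16-bit half bit pattern,
// rounding to nearest, ties to even.
static uint16_t float_to_half_bits(float value)
{
    uint32_t f;
    std::memcpy(&f, &value, sizeof f);            // reinterpret the float's bits

    const uint16_t sign  = static_cast<uint16_t>((f >> 16) & 0x8000u);
    const uint32_t abs_f = f & 0x7FFFFFFFu;

    if (abs_f > 0x7F800000u)                      // NaN -> quiet NaN
        return static_cast<uint16_t>(sign | 0x7E00u);
    if (abs_f >= 0x47800000u)                     // >= 65536 or infinity -> infinity
        return static_cast<uint16_t>(sign | 0x7C00u);
    if (abs_f < 0x33000000u)                      // < 2^-25 rounds to signed zero
        return sign;

    uint32_t half, rem, halfway;
    if (abs_f >= 0x38800000u) {                   // result is a normal half
        const uint32_t rebased = abs_f - 0x38000000u;   // exponent bias 127 -> 15
        half    = rebased >> 13;                  // keep the top 10 mantissa bits
        rem     = rebased & 0x1FFFu;              // the 13 bits being dropped
        halfway = 0x1000u;
    } else {                                      // result is a subnormal half
        const uint32_t exp   = abs_f >> 23;       // 102..112 in this branch
        const uint32_t mant  = (abs_f & 0x007FFFFFu) | 0x00800000u;
        const uint32_t shift = 126u - exp;        // 14..24
        half    = mant >> shift;
        rem     = mant & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    }

    // Round to nearest even. A carry out of the mantissa correctly bumps the
    // exponent (and turns 65520 and above into infinity).
    if (rem > halfway || (rem == halfway && (half & 1u)))
        ++half;

    return static_cast<uint16_t>(sign | half);
}

// Hypothetical helper (not a CUDA API): 16-bit half bit pattern -> float.
// This direction is always exact.
static float half_bits_to_float(uint16_t h)
{
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t f;

    if (exp == 0x1Fu) {                           // infinity or NaN
        f = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0u) {                       // normal number
        f = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0u) {                      // signed zero
        f = sign;
    } else {                                      // subnormal: renormalize
        exp = 113u;
        while ((mant & 0x400u) == 0u) {
            mant <<= 1;
            --exp;
        }
        f = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
    }

    float out;
    std::memcpy(&out, &f, sizeof out);
    return out;
}
```

For example, float_to_half_bits(1.0f) gives 0x3C00 and half_bits_to_float(0x3C00) gives 1.0f. You would run your host data through something like float_to_half_bits() to fill an unsigned short buffer before uploading it for a CU_AD_FORMAT_HALF texture.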