floating-point texture fetching

Has anyone tried to use the texture hardware for doing floating-point computations? I did linear interpolation with it, but got some weird results. My code looks like this:

__global__ void
transformKernel(float* g_odata, int width, int height)
{
    // calculate normalized texture coordinates
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

    // sample 0.2 texels past each texel center (the texture is 4x4)
    float u = (x + 0.2f + 0.5f) / 4.0f;
    float v = (y + 0.5f) / 4.0f;

    // transform coordinates
    float tu = u;
    float tv = v;

    // read from texture and write to global memory
    g_odata[y*width + x] = tex2D(tex, tu, tv);
}
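
For completeness, the texture is bound in the usual way. The host-side setup looks roughly like this (a sketch in the style of the SDK simpleTexture sample; setupTexture is just an illustrative name): linear filtering, normalized coordinates, and clamp addressing, which the 2.0 at the right edge relies on.

texture<float, 2, cudaReadModeElementType> tex;

// illustrative helper, sketching the assumed setup
void setupTexture(const float* h_data, int width, int height)
{
    // filtered 2D fetches require the data to live in a cudaArray
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray* cuArray;
    cudaMallocArray(&cuArray, &desc, width, height);
    cudaMemcpyToArray(cuArray, 0, 0, h_data,
                      width * height * sizeof(float),
                      cudaMemcpyHostToDevice);

    // hardware bilinear interpolation, coordinates normalized to [0,1]
    tex.addressMode[0] = cudaAddressModeClamp;
    tex.addressMode[1] = cudaAddressModeClamp;
    tex.filterMode     = cudaFilterModeLinear;
    tex.normalized     = true;

    cudaBindTextureToArray(tex, cuArray, desc);
}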

The original data are a 4x4 texture with every row the same:

1.000000 2.000000 1.000000 2.000000
1.000000 2.000000 1.000000 2.000000
1.000000 2.000000 1.000000 2.000000
1.000000 2.000000 1.000000 2.000000

so each row of g_odata should be: 1.2 1.8 1.2 2.0

Instead, each row generated by the GPU is: 1.199219 1.800781 1.199219 2.000000

Has anyone seen anything similar? Also, I did not get the speedup from hardware interpolation that I was expecting…

Keep in mind that texture interpolation weights are not 32-bit floats; the programming guide specifies how many bits are used (the fractional weight is stored in 9-bit fixed point, with 8 bits of fraction). That probably explains your results, since you're using weights that are multiples of 0.1, which is not exactly representable in binary floating point to begin with.
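
To make that concrete: with 8 fractional bits, the weight 0.2 becomes 51/256 = 0.19921875 (both truncation and round-to-nearest of 0.2 * 256 = 51.2 give 51), and that reproduces your output exactly. A quick host-side check:

#include <stdio.h>

int main(void)
{
    // interpolation weight quantized to 8 fractional bits: 51/256
    float w = 51.0f / 256.0f;                      // 0.19921875

    // lerp between the neighbouring texels 1.0 and 2.0
    printf("%f %f\n", 1.0f + w * (2.0f - 1.0f),    // prints 1.199219
                      2.0f + w * (1.0f - 2.0f));   // prints 1.800781
    return 0;
}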

What speedup did you expect, and what did you get? I would expect texture filtering to be faster than doing the interpolation inside the kernel.
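
For reference, by interpolation inside the kernel I mean something along these lines (just a sketch, with a hypothetical lerpKernel; the fetches land on texel centers, so they return exact texel values under either filter mode, and the blend uses a full 32-bit weight):

__global__ void
lerpKernel(float* g_odata, int width, int height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

    // sample position in texel space, same 0.2 offset as above
    float px   = x + 0.2f;
    float frac = px - floorf(px);                  // full-precision weight

    // fetch the two neighbouring texels at their centers
    float v = (y + 0.5f) / (float)height;
    float a = tex2D(tex, (floorf(px) + 0.5f) / (float)width, v);
    float b = tex2D(tex, (floorf(px) + 1.5f) / (float)width, v);

    // blend in the kernel instead of in the filtering hardware
    g_odata[y*width + x] = a + frac * (b - a);
}

This should give the exact 1.2 / 1.8 values, at the cost of two fetches plus the arithmetic.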

Paulius