I have the following kernel:

```
__global__ void function(float* result, size_t width, size_t height)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    const float factor = 0.1f;

    if (i < width && j < height)
    {
        result[j * width + i]  = factor * (tex2D(Tex1_1, i, j).x - tex2D(Tex1_1, i - 1, j).x);
        result[j * width + i] += factor * (tex2D(Tex1_2, i, j).x - tex2D(Tex1_2, i, j - 1).x);
        result[j * width + i] += factor * (tex2D(Tex1_3, i / 2.f, j / 2.f).x);
    }
}
```

Here `result` is a buffer in linear memory, `Tex1_1` and `Tex1_2` are texture references bound to CUDA arrays of the same dimensions as `result`, and `Tex1_3` is bound to an array with both dimensions halved (a scaled-down version). I observe the following:
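The texture declarations and binding code are not shown above; roughly, my setup looks like the sketch below. The `float1` element type is inferred from the `.x` accesses in the kernel, and the filter modes shown are assumptions (the actual modes I use may differ):

```
// Legacy texture reference API (CUDA 2.x era).
// float1 elements, so tex2D(...).x compiles as in the kernel above.
texture<float1, 2, cudaReadModeElementType> Tex1_1;
texture<float1, 2, cudaReadModeElementType> Tex1_2;
texture<float1, 2, cudaReadModeElementType> Tex1_3;

void bindTextures(cudaArray* a1, cudaArray* a2, cudaArray* a3)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float1>();

    // Assumed settings: point sampling for the full-size textures,
    // linear interpolation for the half-size one. Filter mode and
    // addressing mode both affect what tex2D returns near the borders
    // (e.g. at i - 1 == -1 in the kernel).
    Tex1_1.filterMode = cudaFilterModePoint;
    Tex1_2.filterMode = cudaFilterModePoint;
    Tex1_3.filterMode = cudaFilterModeLinear;

    cudaBindTextureToArray(Tex1_1, a1, desc);
    cudaBindTextureToArray(Tex1_2, a2, desc);
    cudaBindTextureToArray(Tex1_3, a3, desc);
}
```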

- the results differ slightly between calls, by up to a few percent; I see this in all of my kernels that read from float textures
- the constant `factor` (see above) must lie roughly in [0.03, 0.3] to keep the output within its valid range, whose limits are close to, but not exactly, [-1, 1]. When the output exceeds that range, it ends up containing NaN values.

My GPU is a GeForce 9800 GT, and the driver version is 185.18.14.

Can anybody help me figure out what is going on here?