I have the following kernel:
__global__ void function(float* result, size_t width, size_t height)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    const float factor = 0.1f;
    if (i < width && j < height)
    {
        // Horizontal difference from Tex1_1, vertical difference from Tex1_2,
        // plus a contribution sampled from the half-resolution texture Tex1_3.
        result[j * width + i]  = factor * (tex2D(Tex1_1, i, j).x - tex2D(Tex1_1, i - 1, j).x);
        result[j * width + i] += factor * (tex2D(Tex1_2, i, j).x - tex2D(Tex1_2, i, j - 1).x);
        result[j * width + i] += factor * tex2D(Tex1_3, i / 2.0f, j / 2.0f).x;
    }
}
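For reference, the host side binds the textures roughly like this. This is a simplified sketch rather than my exact code: the float2 channel format is only implied by the .x reads in the kernel, and the addressing/filter modes and the helper name bindTex1_1 are illustrative.

texture<float2, 2, cudaReadModeElementType> Tex1_1;
texture<float2, 2, cudaReadModeElementType> Tex1_2;
texture<float2, 2, cudaReadModeElementType> Tex1_3;   // bound to the half-size array

void bindTex1_1(const float2* h_src, size_t width, size_t height)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();

    cudaArray* arr = 0;
    cudaMallocArray(&arr, &desc, width, height);
    cudaMemcpyToArray(arr, 0, 0, h_src,
                      width * height * sizeof(float2),
                      cudaMemcpyHostToDevice);

    // Unnormalized coordinates, point sampling, clamp addressing
    // (so reads like (i - 1) at the left edge return the border texel).
    Tex1_1.normalized     = false;
    Tex1_1.filterMode     = cudaFilterModePoint;
    Tex1_1.addressMode[0] = cudaAddressModeClamp;
    Tex1_1.addressMode[1] = cudaAddressModeClamp;

    cudaBindTextureToArray(Tex1_1, arr, desc);
    // Tex1_2 is bound the same way; Tex1_3 to a (width/2) x (height/2) array.
}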
In this kernel, result is a buffer in linear device memory, Tex1_1 and Tex1_2 are texture references bound to CUDA arrays with the same dimensions as result, and Tex1_3 is a texture reference bound to an array with both dimensions halved (a scaled-down version). It seems that:
- the results differ slightly between calls, by up to a few percent; I see this in all of my kernels that use float textures
- the const float factor (see above) has to lie in roughly [0.03, 0.3] to keep the resulting data from being overdriven, i.e. from leaving a range that is close to, but not exactly, [-1, 1]. Overdriven results contain NaN values.
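To make "overdriven" concrete, a host-side check could look like the sketch below; checkResult is a placeholder name and the [-1, 1] threshold corresponds to the limits mentioned above, so this is illustrative rather than my actual test harness.

#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Count NaNs and values outside [-1, 1] in the result buffer.
void checkResult(const float* d_result, size_t width, size_t height)
{
    const size_t n = width * height;
    float* h = new float[n];
    cudaMemcpy(h, d_result, n * sizeof(float), cudaMemcpyDeviceToHost);

    size_t nans = 0, outOfRange = 0;
    for (size_t k = 0; k < n; ++k) {
        if (h[k] != h[k])                  ++nans;        // NaN compares unequal to itself
        else if (std::fabs(h[k]) > 1.0f)   ++outOfRange;
    }
    printf("NaN: %lu, |x| > 1: %lu (of %lu)\n",
           (unsigned long)nans, (unsigned long)outOfRange, (unsigned long)n);
    delete[] h;
}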
My GPU is a GeForce 9800 GT and the driver version is 185.18.14.
Can anybody help me figure out what is going on here?