This is my scaleDown kernel. It is supposed to reduce an image to width/2 and height/2 using linear interpolation. The image data is stored in the x, y, z components of a float4 CUDA array.
// Texture reference the kernel reads from; must be bound to the input image on the host.
texture<float4, 2, cudaReadModeElementType> tex_image;

/*!
 * Reduces an a x b image to a/2 x b/2 using the hardware's 9-bit fixed-point
 * linear interpolation.
 * width  - width of the resulting image
 * height - height of the resulting image
 * pitch  - pitch in bytes of the 2D global memory array from cudaMallocPitch
 */
__global__ void ScaleDownKernel(unsigned int width, unsigned int height, float4* data, unsigned int pitch) {
    int px = threadIdx.x + blockDim.x * blockIdx.x;
    int py = threadIdx.y + blockDim.y * blockIdx.y;
    if (px < width && py < height) { // check that (px, py) lies within the output image
        // Step to the start of row py in the pitched output array.
        float4* row = (float4*)((char*)data + py * pitch);
        row[px] = tex2D(tex_image, 2.0f * (float)px + 1.0f, 2.0f * (float)py + 1.0f);
        //row[px] = tex2D(tex_image, 2.0f * (float)px + 0.5f, 2.0f * (float)py + 0.5f);
    }
}
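For reference, the host side looks roughly like this (a sketch; the clamp address mode, block size, and helper name scaleDown are illustrative, only tex_image and ScaleDownKernel match the code above):

// Hypothetical host-side sketch: bind tex_image to a float4 CUDA array and
// launch the kernel on a half-sized pitched output buffer.
void scaleDown(const float4* h_src, unsigned int srcW, unsigned int srcH) {
    // Copy the source image into a CUDA array and bind it to the texture.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>();
    cudaArray* srcArray;
    cudaMallocArray(&srcArray, &desc, srcW, srcH);
    cudaMemcpyToArray(srcArray, 0, 0, h_src, srcW * srcH * sizeof(float4), cudaMemcpyHostToDevice);

    tex_image.filterMode = cudaFilterModeLinear;      // enable bilinear filtering
    tex_image.addressMode[0] = cudaAddressModeClamp;  // illustrative choice
    tex_image.addressMode[1] = cudaAddressModeClamp;
    tex_image.normalized = false;                     // use unnormalized texel coordinates
    cudaBindTextureToArray(tex_image, srcArray);

    // Allocate the half-sized pitched output buffer.
    unsigned int dstW = srcW / 2, dstH = srcH / 2;
    float4* d_dst;
    size_t pitch;
    cudaMallocPitch((void**)&d_dst, &pitch, dstW * sizeof(float4), dstH);

    dim3 block(16, 16); // illustrative block size
    dim3 grid((dstW + block.x - 1) / block.x, (dstH + block.y - 1) / block.y);
    ScaleDownKernel<<<grid, block>>>(dstW, dstH, d_dst, (unsigned int)pitch);

    cudaUnbindTexture(tex_image);
    cudaFreeArray(srcArray);
    // ... copy d_dst back and cudaFree(d_dst) as needed ...
}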
The problem is that only with +1.0f do I get the correct interpolation. I would have expected this to happen with 0.5f added to the coordinates; +1.0f should not land in between pixels. It seems as if the coordinates are shifted by 0.5 in cudaFilterModeLinear.
I have tried this with cudaFilterModePoint and was able to access the correct coordinates.
We have some code that uses interpolation in both CUDA and OpenGL, and to get the same results from both we have to bias the OpenGL coordinates by (0.5, 0.5).
It might be that CUDA texture coordinates purposefully point to the middle of the pixel instead of the corner. You say you get correct interpolation if you add 1; what happens if you add nothing?
We’ll see about clearing this up in the documentation. What follows is a description of the CUDA texture behavior.
The “top left” corner of the texture has coordinates (0, 0). Each texel’s corners therefore have integer coordinates, whereas the middle of a texel sits at the top-left corner coordinate plus 0.5 in each direction. So by providing, say, (0.5f, 0.5f) as coordinates, one samples dead in the middle of a texel and the returned value has no contributions from surrounding texels. By providing (1.0f, 1.0f), one samples an “intersection” between 4 texels and gets a value that is the mean of those texels.
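To make the arithmetic concrete: for linear filtering, the programming guide’s texture-fetching appendix gives, in 1D (2D applies the same rule per axis),

tex(x) = (1 - a) * T[i] + a * T[i+1], where xB = x - 0.5, i = floor(xB), a = frac(xB)

and a is stored in 9-bit fixed point with 8 bits of fractional value. So at x = 0.5: xB = 0.0, i = 0, a = 0, and tex(0.5) = T[0] with no neighbor contribution. At x = 1.0: xB = 0.5, i = 0, a = 0.5, and tex(1.0) = (T[0] + T[1]) / 2. In 2D, (1.0f, 1.0f) accordingly returns the mean of the four surrounding texels, which is exactly the 2x2 box average the +1.0f offset in ScaleDownKernel exploits when halving the image.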
Yes, this would be nice, especially for “general-purpose” programmers like me who do not know much about graphics programming. And general-purpose computation is what CUDA is all about, right? :-)