Hi,
I wrote a kernel for a basic edge detector. It looks like this:
void __global__ kernelEdgeb3(uchar4* out, int pitch)
{
    // Pixel coordinates handled by this thread
    int y = blockDim.y * blockIdx.y + threadIdx.y;
    int x = blockDim.x * blockIdx.x + threadIdx.x;
    int sumx, sumy;
    uchar4 result;

    // 3x3 neighbourhood around (x, y); the centre pixel is not needed
    uchar4 rgb1 = texfetch(texb4_a, x-1, y-1);
    uchar4 rgb2 = texfetch(texb4_a, x,   y-1);
    uchar4 rgb3 = texfetch(texb4_a, x+1, y-1);
    uchar4 rgb4 = texfetch(texb4_a, x-1, y);
    uchar4 rgb6 = texfetch(texb4_a, x+1, y);
    uchar4 rgb7 = texfetch(texb4_a, x-1, y+1);
    uchar4 rgb8 = texfetch(texb4_a, x,   y+1);
    uchar4 rgb9 = texfetch(texb4_a, x+1, y+1);

    int3 sumX;
    int3 sumY;

    // Horizontal Sobel gradient, one value per color channel
    sumX.x = (int)rgb1.x - rgb3.x + ((int)rgb4.x - rgb6.x)*2 + rgb7.x - rgb9.x;
    sumX.y = (int)rgb1.y - rgb3.y + ((int)rgb4.y - rgb6.y)*2 + rgb7.y - rgb9.y;
    sumX.z = (int)rgb1.z - rgb3.z + ((int)rgb4.z - rgb6.z)*2 + rgb7.z - rgb9.z;

    // Vertical Sobel gradient, one value per color channel
    sumY.x = (int)rgb1.x - rgb7.x + ((int)rgb2.x - rgb8.x)*2 + rgb3.x - rgb9.x;
    sumY.y = (int)rgb1.y - rgb7.y + ((int)rgb2.y - rgb8.y)*2 + rgb3.y - rgb9.y;
    sumY.z = (int)rgb1.z - rgb7.z + ((int)rgb2.z - rgb8.z)*2 + rgb3.z - rgb9.z;

    // Average the absolute gradients over the channels and clamp to 255
    sumx = min((abs(sumX.x) + abs(sumX.y) + abs(sumX.z)) / 3, 255);
    sumy = abs(sumY.x) + abs(sumY.y);
    /*---->*/ //sumy += abs(sumY.z);
    sumy = min(sumy / 3, 255);

    // Green is always 0, alpha is always 255
    result = make_uchar4(sumx, 0, sumy, 255);

    // Step to row y (pitch is in bytes, hence the uchar1* cast) and store the pixel
    out = (uchar4*)( ((uchar1*)out) + pitch*y );
    out[x] = result;
}
The input texture is of type texture<uchar4, 2, cudaReadModeElementType> and holds RGBA information. I’m applying the Sobel filter to each of the color channels and combining the results afterwards.
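For cross-checking the GPU output, a CPU version of the same per-pixel arithmetic might look like the sketch below. It is only a sketch under assumptions: Pixel, fetchClamped, grad and sobelReference are names made up for illustration (Pixel stands in for uchar4), and clamping coordinates at the image border is an assumption, since the texture's addressing mode is not stated here. Unlike the kernel as posted, it includes the sumY.z term, so it shows what the result should be once the marked line is enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical host-side stand-in for uchar4.
struct Pixel { unsigned char x, y, z, w; };

// Read a pixel with coordinates clamped to the image (assumed border behavior).
static Pixel fetchClamped(const std::vector<Pixel>& img, int w, int h, int x, int y)
{
    x = std::min(std::max(x, 0), w - 1);
    y = std::min(std::max(y, 0), h - 1);
    return img[static_cast<std::size_t>(y) * w + x];
}

// One Sobel gradient for one channel: a - b + 2*(c - d) + e - f, computed in int.
static int grad(int a, int b, int c, int d, int e, int f)
{
    return a - b + (c - d) * 2 + e - f;
}

// CPU reference for one output pixel, mirroring the kernel's arithmetic
// but with the z channel included in the vertical sum.
Pixel sobelReference(const std::vector<Pixel>& img, int w, int h, int x, int y)
{
    assert(w > 0 && h > 0 && img.size() == static_cast<std::size_t>(w) * h);

    const Pixel p1 = fetchClamped(img, w, h, x - 1, y - 1);
    const Pixel p2 = fetchClamped(img, w, h, x,     y - 1);
    const Pixel p3 = fetchClamped(img, w, h, x + 1, y - 1);
    const Pixel p4 = fetchClamped(img, w, h, x - 1, y);
    const Pixel p6 = fetchClamped(img, w, h, x + 1, y);
    const Pixel p7 = fetchClamped(img, w, h, x - 1, y + 1);
    const Pixel p8 = fetchClamped(img, w, h, x,     y + 1);
    const Pixel p9 = fetchClamped(img, w, h, x + 1, y + 1);

    // Horizontal gradient per channel
    const int gxX = grad(p1.x, p3.x, p4.x, p6.x, p7.x, p9.x);
    const int gxY = grad(p1.y, p3.y, p4.y, p6.y, p7.y, p9.y);
    const int gxZ = grad(p1.z, p3.z, p4.z, p6.z, p7.z, p9.z);

    // Vertical gradient per channel
    const int gyX = grad(p1.x, p7.x, p2.x, p8.x, p3.x, p9.x);
    const int gyY = grad(p1.y, p7.y, p2.y, p8.y, p3.y, p9.y);
    const int gyZ = grad(p1.z, p7.z, p2.z, p8.z, p3.z, p9.z);

    // Average the absolute gradients over three channels and clamp to 255
    const int sumx = std::min((std::abs(gxX) + std::abs(gxY) + std::abs(gxZ)) / 3, 255);
    const int sumy = std::min((std::abs(gyX) + std::abs(gyY) + std::abs(gyZ)) / 3, 255);

    Pixel result;
    result.x = static_cast<unsigned char>(sumx);
    result.y = 0;    // green is always zero, as in the kernel
    result.z = static_cast<unsigned char>(sumy);
    result.w = 255;
    return result;
}
```

Running this over a small test image and comparing it pixel by pixel with the data copied back from the device would show whether the grey output comes from the arithmetic itself or from something else, such as the compiler or the memory writes.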
You can skip most of the kernel and go straight to the line I marked with an arrow. Uncommenting that line makes my resulting image grey. That should be impossible, since make_uchar4(…) always sets the green channel to zero.
It seems that the problem is accessing sumY.z! Any other variable or constant works.
Further tests showed that rgb2.z - rgb8.z causes the failure (grey result image). Outputting rgb2.z or rgb8.z alone works.
Also, the whole calculation works if I comment out the calculation of any one of the components of sumX. So might this be some kind of stack overflow?
What might cause this problem?
I’m working with MSVS 2005 on Windows XP with an 8800 GTX.
Thanks in advance!