Hello everybody, I have to develop a program which processes a greyscale image, and I am far from being an expert in CUDA.

I’ve got a problem, so let me explain:

For example, an image with a resolution of 10x10 is just a one-dimensional array like this:

In my program, I have to process the green cells (the interior pixels), and I don’t touch the red cells (the one-pixel border).
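To make the layout concrete, here is a minimal sketch of how a (row, column) position maps to the one-dimensional array and of which cells count as "green". It assumes row-major storage with cells numbered from 0, and that the green cells are exactly the pixels whose eight neighbours are all inside the image. The helper names `pixelIndex` and `isInterior` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: position of pixel (row i, column j) in the 1-D array,
// assuming row-major storage, i.e. index = i * width + j.
inline std::size_t pixelIndex(int i, int j, int width)
{
    assert(i >= 0 && j >= 0 && j < width);
    return static_cast<std::size_t>(i) * width + j;
}

// Hypothetical helper: true for a "green" cell, i.e. a pixel whose eight
// neighbours all lie inside the image (assumed meaning of green).
inline bool isInterior(int i, int j, int width, int height)
{
    return i >= 1 && i < height - 1 && j >= 1 && j < width - 1;
}
```

With a 10x10 image, `pixelIndex(5, 5, 10)` is 55, and `isInterior` is false for every pixel in the first or last row or column.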

This is how I do it in C++:

```
for (int i = 1; i < (height - 1); i++)
{
    for (int j = 1; j < (width - 1); j++)
    {
        // In this part I use the OpenCV library:
        // I get the values of the pixels located around the pixel that I process,
        // and then I perform the process.
        Gx1 = Gy1 = datanvg[((i-1)*width)+(j-1)];
        Gx2 = Gy2 = datanvg[((i-1)*width)+j];
        Gx3 = Gy3 = datanvg[((i-1)*width)+(j+1)];
        Gx4 = Gy4 = datanvg[(i*width)+(j-1)];
        Gx5 = Gy5 = datanvg[(i*width)+j];
        Gx6 = Gy6 = datanvg[(i*width)+(j+1)];
        Gx7 = Gy7 = datanvg[((i+1)*width)+(j-1)];
        Gx8 = Gy8 = datanvg[((i+1)*width)+j];
        Gx9 = Gy9 = datanvg[((i+1)*width)+(j+1)];

        // This is the process: weight each neighbour by its coefficient and sum.
        Gx = (Gx1*matGx1)+(Gx2*matGx2)+(Gx3*matGx3)+(Gx4*matGx4)+(Gx5*matGx5)+(Gx6*matGx6)+(Gx7*matGx7)+(Gx8*matGx8)+(Gx9*matGx9);
        Gy = (Gy1*matGy1)+(Gy2*matGy2)+(Gy3*matGy3)+(Gy4*matGy4)+(Gy5*matGy5)+(Gy6*matGy6)+(Gy7*matGy7)+(Gy8*matGy8)+(Gy9*matGy9);

        // I put the new value into a new image.
        datasob[(i*width)+j] = sqrt(powf(Gx,2)+powf(Gy,2));
    }
}
```
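In case a compilable version helps, here is a compact sketch of the same loop written as a standalone function. Several things in it are assumptions made only for this example: the function name `filterInterior`, 8-bit input pixels, a `float` output, border pixels left at 0, and the nine coefficients passed as flat arrays in the same order as matGx1 to matGx9 and matGy1 to matGy9.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch only: same arithmetic as the double loop, with the nine reads and
// the nine multiplications folded into two small inner loops.
std::vector<float> filterInterior(const std::vector<unsigned char>& src,
                                  int width, int height,
                                  const int (&kx)[9], const int (&ky)[9])
{
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> dst(src.size(), 0.0f); // assumption: border stays at 0

    for (int i = 1; i < height - 1; ++i) {
        for (int j = 1; j < width - 1; ++j) {
            int gx = 0;
            int gy = 0;
            // Walk the 3x3 neighbourhood in the same order as Gx1..Gx9.
            for (int di = -1; di <= 1; ++di) {
                for (int dj = -1; dj <= 1; ++dj) {
                    const int p = src[(i + di) * width + (j + dj)];
                    const int k = (di + 1) * 3 + (dj + 1);
                    gx += p * kx[k];
                    gy += p * ky[k];
                }
            }
            // Magnitude of the two responses, as in sqrt(Gx^2 + Gy^2).
            dst[i * width + j] =
                std::sqrt(static_cast<float>(gx * gx + gy * gy));
        }
    }
    return dst;
}
```

The coefficients are left as parameters because their values are not shown in this post. If they are the usual Sobel masks (an assumption), `kx` would be `{-1, 0, 1, -2, 0, 2, -1, 0, 1}` and `ky` would be `{-1, -2, -1, 0, 0, 0, 1, 2, 1}`. The result of each pixel depends only on the input image, never on other output pixels.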

To make this clearer, here is what I do for the pixel located in cell 55:

(matGx1 to matGx9 and matGy1 to matGy9 are ints that I declared earlier in the program.)

Gx = ((value of cell 44) * matGx1) + ((value of cell 45) * matGx2) + ((value of cell 46) * matGx3) + ((value of cell 54) * matGx4) + ((value of cell 55) * matGx5) + … + ((value of cell 66) * matGx9);

Gy = ((value of cell 44) * matGy1) + ((value of cell 45) * matGy2) + ((value of cell 46) * matGy3) + ((value of cell 54) * matGy4) + ((value of cell 55) * matGy5) + … + ((value of cell 66) * matGy9);

Final value of the pixel 55 = sqrt(powf(Gx,2)+powf(Gy,2));
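As a quick check of the cell numbers used in this example, here is a small sketch that lists the nine array positions read for a given cell. It assumes row-major storage with cells numbered from 0 and a cell that is not on the border. The helper name `neighbourIndices` is made up for this illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: the nine array indices read for the pixel stored at
// position `cell`, in the order Gx1..Gx9 (top-left to bottom-right).
std::array<int, 9> neighbourIndices(int cell, int width)
{
    const int i = cell / width; // row of the cell
    const int j = cell % width; // column of the cell
    assert(i >= 1 && j >= 1 && j < width - 1); // must not be a border cell

    std::array<int, 9> idx{};
    int n = 0;
    for (int di = -1; di <= 1; ++di)
        for (int dj = -1; dj <= 1; ++dj)
            idx[n++] = (i + di) * width + (j + dj);
    return idx;
}
```

For cell 55 in an image of width 10, this gives 44, 45, 46, 54, 55, 56, 64, 65, 66, which matches the cells listed above.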

All of this is C++, and I would like to do it in a CUDA kernel (send the greyscale array and receive the processed array).

In fact, I already tried to put the double for-loop in a CUDA kernel … I don’t know how, but I just broke my 8800 GTX, so I’m a little bit scared of CUDA.

Can you give me some ideas for writing this kernel?

Thank you, and I hope I have made myself clear. If you want any clarification, just ask me!