Hi,

Could anybody explain how to optimize the following CUDA code:

```
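// Returns nothing (it writes into result), so the return type is void;
// "return;" inside a function declared to return float does not compile.
// M and y are not used by this function.
__device__ void applyKxTranspose(const float *zeta, float fy, int M, int N,
                                 float *result, int x, int y, int index)
{
    float zeta_index = zeta[index] / fy;
    if (x == 0)
    {
        // Left boundary: only the current element contributes.
        result[index] = zeta_index;
        return;
    }
    float sum = zeta_index;
    if (x == N - 1)
        sum += -zeta_index;          // right boundary: cancels the term above
    sum += -zeta[index - 1] / fy;    // left neighbour
    if (x == N - 2)
        sum += zeta[index + 1] / fy; // right neighbour, only next to the boundary
    result[index] = sum;
}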
```

where zeta and result are 640x480 arrays, M = 640, N = 480, and x and y are the current thread's global indices (computed from the block index and the thread index).
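Before optimizing, it can help to have a plain CPU loop that computes the same thing, so any faster kernel can be checked against it. Because the function reads zeta[index - 1] as the x-neighbour, x appears to run along the contiguous dimension; this minimal sketch therefore assumes row-major storage with index = y * N + x, x in [0, N), y in [0, M). The name `kxTransposeReference` is hypothetical.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for applyKxTranspose over the whole image.
// Assumption (not stated in the question): index = y * N + x,
// with x in [0, N) and y in [0, M).
std::vector<float> kxTransposeReference(const std::vector<float>& zeta,
                                        float fy, int M, int N)
{
    assert(zeta.size() == static_cast<std::size_t>(M) * N);
    std::vector<float> result(zeta.size());
    for (int y = 0; y < M; ++y) {
        for (int x = 0; x < N; ++x) {
            const int index = y * N + x;
            const float zeta_index = zeta[index] / fy;
            if (x == 0) {                      // left boundary
                result[index] = zeta_index;
                continue;
            }
            float sum = zeta_index;
            if (x == N - 1)
                sum += -zeta_index;            // right boundary
            sum += -zeta[index - 1] / fy;      // left neighbour
            if (x == N - 2)
                sum += zeta[index + 1] / fy;   // right neighbour
            result[index] = sum;
        }
    }
    return result;
}
```

Since it only uses the host, this runs without a GPU and makes it easy to compare outputs (with a small tolerance) after each optimization.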

My guess is that shared memory could help, but I don't have any experience with it.
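One way shared memory could be used here: each block stages a tile of zeta (already multiplied by 1/fy) plus one halo column on each side, then reads the x-1 and x+1 neighbours from the tile instead of global memory. This is a sketch under stated assumptions: row-major storage with index = y * N + x (x in [0, N), y in [0, M)), 32x8 thread blocks, and hypothetical names `kxTransposeShared`, `runKxTransposeShared`, `TILE_X` and `TILE_Y`. Because each value is read by at most three neighbouring threads, and those reads are already coalesced and often served from cache on many GPUs, the gain over the plain version may be small, so profiling both is worthwhile.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE_X = 32;  // threads per block along x (assumed)
constexpr int TILE_Y = 8;   // threads per block along y (assumed)

// Hypothetical kernel: assumes index = y * N + x. inv_fy is 1.0f / fy,
// computed once on the host.
__global__ void kxTransposeShared(const float* __restrict__ zeta, float inv_fy,
                                  int M, int N, float* __restrict__ result)
{
    __shared__ float tile[TILE_Y][TILE_X + 2];  // +2 for left/right halo

    const int x = blockIdx.x * TILE_X + threadIdx.x;
    const int y = blockIdx.y * TILE_Y + threadIdx.y;
    const bool inside = (x < N) && (y < M);
    const int index = y * N + x;
    const int tx = threadIdx.x + 1;  // shifted by one for the left halo
    const int ty = threadIdx.y;

    tile[ty][tx] = inside ? zeta[index] * inv_fy : 0.0f;
    if (threadIdx.x == 0)            // left halo: element x - 1
        tile[ty][0] = (inside && x > 0) ? zeta[index - 1] * inv_fy : 0.0f;
    if (threadIdx.x == TILE_X - 1)   // right halo: element x + 1
        tile[ty][TILE_X + 1] = (inside && x + 1 < N) ? zeta[index + 1] * inv_fy : 0.0f;

    __syncthreads();  // reached by all threads, including out-of-range ones
    if (!inside)
        return;

    const float c = tile[ty][tx];
    float sum = c;
    if (x > 0) {
        if (x == N - 1)
            sum -= c;                 // same cancellation as the original
        sum -= tile[ty][tx - 1];      // left neighbour
        if (x == N - 2)
            sum += tile[ty][tx + 1];  // right neighbour
    }
    result[index] = sum;
}

// Hypothetical host helper: copy in, launch, copy out.
// Error checking of the CUDA calls is omitted for brevity.
std::vector<float> runKxTransposeShared(const std::vector<float>& zeta,
                                        float fy, int M, int N)
{
    assert(zeta.size() == static_cast<std::size_t>(M) * N);
    const std::size_t bytes = zeta.size() * sizeof(float);
    float* d_zeta = nullptr;
    float* d_result = nullptr;
    cudaMalloc(&d_zeta, bytes);
    cudaMalloc(&d_result, bytes);
    cudaMemcpy(d_zeta, zeta.data(), bytes, cudaMemcpyHostToDevice);

    const dim3 block(TILE_X, TILE_Y);
    const dim3 grid((N + TILE_X - 1) / TILE_X, (M + TILE_Y - 1) / TILE_Y);
    kxTransposeShared<<<grid, block>>>(d_zeta, 1.0f / fy, M, N, d_result);

    std::vector<float> result(zeta.size());
    cudaMemcpy(result.data(), d_result, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_zeta);
    cudaFree(d_result);
    return result;
}
```

Multiplying by 1/fy can differ from dividing by fy in the last bit, so compare against the original with a small tolerance rather than exact equality.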

Are there other things that could be improved?
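A smaller, independent change: the function performs up to three divisions by the same fy. A minimal sketch, assuming the caller passes inv_fy = 1.0f / fy computed once on the host, multiplies instead, drops the unused M and y parameters, and stores result from one place. `applyKxTransposeFast` is a hypothetical name; it is marked `__host__ __device__` so it can also be checked on the CPU.

```cuda
#include <cassert>

// Hypothetical variant of applyKxTranspose.
//  * inv_fy = 1.0f / fy is computed once (e.g. on the host) and passed in,
//    so each thread multiplies instead of dividing up to three times;
//  * the unused M and y parameters are dropped;
//  * result is written from a single place.
// a * (1/b) may differ from a / b in the last bit.
__host__ __device__ inline void applyKxTransposeFast(const float* zeta, float inv_fy,
                                                     int N, float* result,
                                                     int x, int index)
{
    assert(x >= 0 && x < N);                  // compiled out with -DNDEBUG
    const float c = zeta[index] * inv_fy;
    float sum = c;                            // x == 0: only the current element
    if (x > 0) {
        if (x == N - 1)
            sum -= c;                         // same cancellation as the original
        sum -= zeta[index - 1] * inv_fy;      // left neighbour
        if (x == N - 2)
            sum += zeta[index + 1] * inv_fy;  // right neighbour
    }
    result[index] = sum;
}
```

Inside a kernel it would be called as `applyKxTransposeFast(zeta, inv_fy, N, result, x, index);`, with inv_fy passed as a kernel argument.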

Could you give me any tips?