I need to avoid bank conflicts when copying a window from a matrix in global memory to a vector in shared memory, something like this:

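```
// needed to zero all the elements of the vector
*(subvector + thidx) = 0;
// make sure the zeroing has finished before any element is overwritten
__syncthreads();

if (thidx < colwind)
{
    for (n = 0; n < rowwind; n++)
    {
        *(subvector + (n * colmatrix) + thidx) = *(in + ((colstep * blockCol) + thidx) * rowimg + ((rowstep * blockRow) + n));
    }
}
// the barrier sits outside the branch so that every thread in the block reaches it
__syncthreads();
```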

This code manages a moving window over a matrix.

colstep and rowstep represent how far the window has to move in the x and y directions.
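
To make the indexing concrete, here is a minimal host-side sketch of the two index expressions from the kernel, plus a check of which shared-memory bank each thread's store lands in. The helper names (`window_src_index`, `window_dst_index`, `max_bank_conflict_degree`) are hypothetical and not part of my kernel, and the bank model (4-byte elements, bank = word index modulo the number of banks) is an assumption about the hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear index into the image `in`, copied from the kernel expression.
// The image is read as if stored with `rowimg` elements per column.
std::size_t window_src_index(std::size_t blockRow, std::size_t blockCol,
                             std::size_t rowstep, std::size_t colstep,
                             std::size_t rowimg, std::size_t n,
                             std::size_t thidx) {
    return (colstep * blockCol + thidx) * rowimg + (rowstep * blockRow + n);
}

// Linear index into `subvector`, also copied from the kernel expression.
std::size_t window_dst_index(std::size_t colmatrix, std::size_t n,
                             std::size_t thidx) {
    return n * colmatrix + thidx;
}

// Largest number of threads in one group (warp or half-warp) whose store
// for loop iteration `n` falls into the same shared-memory bank.
// ASSUMPTION: 4-byte elements and bank = word index % num_banks.
// A result of 1 means the store is conflict-free under this model.
std::size_t max_bank_conflict_degree(std::size_t colmatrix, std::size_t n,
                                     std::size_t group_size,
                                     std::size_t num_banks) {
    assert(num_banks > 0);
    std::vector<std::size_t> hits(num_banks, 0);
    for (std::size_t t = 0; t < group_size; ++t) {
        ++hits[window_dst_index(colmatrix, n, t) % num_banks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

For example, `max_bank_conflict_degree(colmatrix, n, 16, 16)` models a half-warp on 16-bank hardware, and `max_bank_conflict_degree(colmatrix, n, 32, 32)` a full warp on 32-bank hardware. If this model is right, the result is 1 for any `colmatrix` and `n`, because consecutive `thidx` values write consecutive words. On the read side, `window_src_index` moves by `rowimg` elements between neighbouring threads, so the global-memory reads are strided rather than contiguous.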

Thanks