Hi,

Imagine I launch 1024 blocks on the GPU, each block has 1024 threads, and my GPU has 256 cores. Given my kernel code:

```
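// Global 2D coordinates of this thread
int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;
float sum = 0.0f;
// Accumulate the 21 x 21 neighborhood around (row, col)
for (int i = -10; i <= 10; i++)
    for (int j = -10; j <= 10; j++)
    {
        // Index as posted; a row-major layout would normally scale the row term by the row width
        int idx = (row + j) + (col + i);
        sum += inout[idx];
    }
int index = row + col;
output[index] = sum / 441.0f; // 441 = 21 * 21 samples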
```
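
To make the numbers concrete, here is a minimal sketch of how such a kernel might be launched with 1024 blocks of 1024 threads each. The kernel name `boxAverage`, the `width` and `height` parameters, the row-major indexing, and the border clamping are assumptions for illustration only; they are not part of the snippet above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: averages the 21 x 21 neighborhood of each element.
// `width`, `height`, row-major indexing and border clamping are assumptions.
__global__ void boxAverage(const float* inout, float* output, int width, int height)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= height || col >= width) return; // guard against out-of-range threads

    float sum = 0.0f; // per-thread accumulator
    for (int i = -10; i <= 10; i++)
        for (int j = -10; j <= 10; j++)
        {
            int r = min(max(row + j, 0), height - 1); // clamp to the border (assumed)
            int c = min(max(col + i, 0), width - 1);
            sum += inout[r * width + c];
        }
    output[row * width + col] = sum / 441.0f; // 441 = 21 * 21 samples
}

int main()
{
    const int width = 1024, height = 1024;
    const size_t count = size_t(width) * height;
    const size_t bytes = count * sizeof(float);
    std::vector<float> host(count, 1.0f); // dummy input, all ones

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    // Assumes the device supports 1024 threads per block.
    dim3 block(32, 32);                           // 32 * 32 = 1024 threads per block
    dim3 grid(width / block.x, height / block.y); // 32 * 32 = 1024 blocks
    boxAverage<<<grid, block>>>(dIn, dOut, width, height);

    cudaError_t err = cudaGetLastError();            // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // execution errors
    if (err != cudaSuccess)
    {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(host.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    std::printf("output[0] = %f\n", host[0]);

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

With this configuration the launch creates 1024 × 1024 = 1,048,576 threads in total, and each thread writes one element of `output`.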

Does each core calculate one “sum” (the variable *sum* inside the loops)? In other words, does my GPU calculate 256 “sum” values at the same time?