Hello, I'm a newbie in CUDA. My matrix-vector multiplication gives incorrect results, and I can't find the cause (I've spent several hours on it).

```
__global__ void multMatrixByVector(float *Matrix, float *Vector, float *resultedVector, int n)
{
    // One thread per row: x is this thread's global row index
    int x = threadIdx.x + blockDim.x * blockIdx.x;
    float sum = 0.0f;
    // Dot product of row x (row-major, n columns) with Vector
    for (int i = 0; i < n; i++)
    {
        sum += Matrix[x * n + i] * Vector[i];
    }
    resultedVector[x] = sum;
}
```
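One way to tell whether the kernel or the surrounding host code is at fault is to compute the same product on the CPU and compare the two results. This is a minimal sketch, assuming the matrix is a flat array of n*n floats in row-major order (the layout the kernel's `Matrix[x * n + i]` indexing expects); the function name `multMatrixByVectorCPU` is hypothetical:

```cuda
#include <cstddef>

// Hypothetical host-side reference: result = matrix * vector for a
// row-major n x n matrix. It uses the same indexing as the kernel
// (row `row`, column `col` -> matrix[row * n + col]), so a mismatch with
// the GPU output points at the data setup or the launch, not the math.
void multMatrixByVectorCPU(const float *matrix, const float *vector,
                           float *result, int n)
{
    for (int row = 0; row < n; ++row) {
        float sum = 0.0f;
        for (int col = 0; col < n; ++col) {
            sum += matrix[(size_t)row * n + col] * vector[col];
        }
        result[row] = sum;
    }
}
```

If the CPU result is right but the GPU result is not, the kernel body is probably fine and the problem is in what reaches the device or how the kernel is launched.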

and the kernel launch:

```
dim3 grid(n/BLOCK_SIZE,1,1);    // number of blocks (integer division)
dim3 blocks(BLOCK_SIZE,1,1);    // threads per block
multMatrixByVector<<<grid, blocks>>>(dA, dB, dY, n);
```
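For comparison, here is a minimal end-to-end sketch of the host side. It assumes a square n × n matrix stored row-major as one flat `float` array and `BLOCK_SIZE` = 256. Both are assumptions, since the post shows neither the allocation code nor the `BLOCK_SIZE` value. The wrapper `matVecOnGPU` is a hypothetical name. The sketch sizes every copy in bytes, rounds the grid up instead of down, and checks the return value of each CUDA call:

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed value; the post does not show the real one

// Same kernel as in the question, plus a bounds check so a rounded-up grid is safe.
__global__ void multMatrixByVector(const float *Matrix, const float *Vector,
                                   float *resultedVector, int n)
{
    int x = threadIdx.x + blockDim.x * blockIdx.x;
    if (x >= n) return;  // surplus threads in the last block do nothing
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += Matrix[x * n + i] * Vector[i];  // row x, column i (row-major)
    resultedVector[x] = sum;
}

// Hypothetical helper: hY = hA * hB computed on the GPU.
// hA holds n*n floats in row-major order; hB and hY hold n floats each.
// Returns the first CUDA error encountered (cudaSuccess if none).
cudaError_t matVecOnGPU(const float *hA, const float *hB, float *hY, int n)
{
    float *dA = nullptr, *dB = nullptr, *dY = nullptr;
    const size_t matBytes = (size_t)n * n * sizeof(float);  // bytes, not elements
    const size_t vecBytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc(&dA, matBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, vecBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dY, vecBytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, matBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, vecBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 blocks(BLOCK_SIZE, 1, 1);
        // Round up: plain n / BLOCK_SIZE is 0 whenever n < BLOCK_SIZE
        dim3 grid((n + BLOCK_SIZE - 1) / BLOCK_SIZE, 1, 1);
        multMatrixByVector<<<grid, blocks>>>(dA, dB, dY, n);
        err = cudaGetLastError();  // catches invalid launch configurations
    }
    // The device-to-host copy waits for the kernel, so errors raised
    // while the kernel ran can also be reported here
    if (err == cudaSuccess) err = cudaMemcpy(hY, dY, vecBytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);  // cudaFree on a null pointer is a no-op
    cudaFree(dB);
    cudaFree(dY);
    return err;
}
```

The bounds check and the rounded-up grid belong together. Once the grid is rounded up, the last block can contain threads whose `x` is past the end of the vector, and without the check those threads would read and write out of bounds.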

Example:

Matrix:

0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1

2 2 2 2 2 2 2 2

3 3 3 3 3 3 3 3

4 4 4 4 4 4 4 4

5 5 5 5 5 5 5 5

6 6 6 6 6 6 6 6

7 7 7 7 7 7 7 7

Vector:

3 3 3 3 3 3 3 3

Result:

12 60 108 156 84 0 0 0 <- wrong :( (expected: 0 24 48 72 96 120 144 168)
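Here is the expected product for this example, worked out by hand. Every entry of row $i$ is $i$ and every entry of the vector is 3, so with $n = 8$:

$$
y_i = \sum_{j=0}^{7} A_{ij}\, v_j = \sum_{j=0}^{7} 3i = 24i, \qquad i = 0, \dots, 7
$$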