Matrix by vector multiplication

Hello, I'm a newbie in CUDA. My multiplication gives incorrect results, and I can't find the cause (I've spent several hours on it).

__global__ void multMatrixByVector(float *Matrix, float *Vector, float *resultedVector, int n)
{
	int x = threadIdx.x + blockDim.x * blockIdx.x;

	float sum = 0.0f;
	
	for(int i = 0; i < n; i++)
	{
		sum += Matrix[x * n + i] * Vector[i];
	}

	resultedVector[x] = sum;
}

and the kernel launch:

dim3 grid(n/BLOCK_SIZE,1,1);
dim3 blocks(BLOCK_SIZE,1,1);
multMatrixByVector<<<grid, blocks>>>(dA, dB, dY, n);

Example:
Matrix:
0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7

Vector:
3 3 3 3 3 3 3 3

Result:
12 60 108 156 84 0 0 0 <- wrong :(

Always check returned error codes. Apparently you are launching the kernel with 0 blocks, so it does nothing.
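A zero-block launch can happen because n/BLOCK_SIZE is integer division: if n were smaller than BLOCK_SIZE, the grid size would truncate to 0. Below is a minimal sketch of one common way to avoid that, rounding the block count up and guarding the kernel against surplus threads. The names multMatrixByVectorGuarded, blocksFor and matVec are made up for this illustration and are not from the original post.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same computation as the kernel in the question, plus a bounds guard so that
// surplus threads in the last block do not read or write past row n - 1.
__global__ void multMatrixByVectorGuarded(const float *matrix, const float *vec,
                                          float *result, int n)
{
    int row = threadIdx.x + blockDim.x * blockIdx.x;
    if (row >= n) return;

    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += matrix[row * n + i] * vec[i];
    result[row] = sum;
}

// Ceiling division: the smallest block count that covers all n rows.
// Plain n / blockSize truncates, e.g. 8 / 16 == 0 blocks.
int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Hypothetical host wrapper (not from the thread). Return codes are ignored
// here only to keep the sketch short.
std::vector<float> matVec(const std::vector<float> &a, const std::vector<float> &x,
                          int n, int blockSize)
{
    float *dA = nullptr, *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dA, a.size() * sizeof(float));
    cudaMalloc(&dX, x.size() * sizeof(float));
    cudaMalloc(&dY, n * sizeof(float));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dX, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);

    multMatrixByVectorGuarded<<<blocksFor(n, blockSize), blockSize>>>(dA, dX, dY, n);

    std::vector<float> y(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(y.data(), dY, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dX);
    cudaFree(dY);
    return y;
}
```

With n = 8 and blockSize = 16 this launches one block of 16 threads, and threads 8 to 15 return at the guard without touching memory.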

Thanks, I’ll remember that. :) But a kernel launch does not return a cudaError_t.
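The <<<...>>> launch itself is a void expression, but the runtime still records errors that can be queried afterwards: cudaGetLastError() reports launch and configuration problems, and cudaDeviceSynchronize() returns errors raised while the kernel was running. A minimal sketch, where emptyKernel and launchAndCheck are placeholder names invented for this example:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel; only the launch configuration matters here.
__global__ void emptyKernel() {}

// Launches emptyKernel and returns the first error the runtime reports.
cudaError_t launchAndCheck(int numBlocks, int threadsPerBlock)
{
    emptyKernel<<<numBlocks, threadsPerBlock>>>();

    // Launch-time errors (for example an invalid grid size) show up here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;

    // Errors that occur during kernel execution are returned by the
    // next synchronizing call.
    return cudaDeviceSynchronize();
}
```

Calling launchAndCheck(0, 32) should return something other than cudaSuccess (as far as I know, cudaErrorInvalidConfiguration), and cudaGetErrorString(err) turns the code into a readable message.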

I checked. It’s ok :)

And the program works correctly if the number of blocks is 1, but with more blocks the result is rubbish.

my bad :(((

Filling the matrix:

for(int i = 0; i < n; i++)
{
	for(int j = 0; j < n; j++)
	{
		hA[i * BLOCK_SIZE + j] = i;
		cout << hA[i * BLOCK_SIZE + j] << " ";
	}
	cout << endl;
}

BLOCK_SIZE needs to be replaced by n.
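To see why the row stride matters, here is a small host-only sketch that fills the matrix with a configurable stride and multiplies it on the CPU. It assumes n = 8, BLOCK_SIZE = 4 and a zero-initialized buffer, which is a guess that happens to reproduce the wrong result posted above. fillMatrix and matVecHost are names made up for this illustration.

```cuda
#include <vector>

// Fills an n x n row-major buffer so that every element of row i equals i.
// `stride` is the distance between the starts of consecutive rows. For a
// densely packed matrix it must be n; a smaller value makes rows overlap.
// (A stride larger than n would write past the buffer, so do not pass one.)
std::vector<float> fillMatrix(int n, int stride)
{
    std::vector<float> m(n * n, 0.0f);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            m[i * stride + j] = static_cast<float>(i);
    return m;
}

// CPU reference for y = M * v, reading M with the correct stride n,
// exactly as the kernel does.
std::vector<float> matVecHost(const std::vector<float> &m,
                              const std::vector<float> &v, int n)
{
    std::vector<float> y(n, 0.0f);
    for (int row = 0; row < n; row++)
        for (int i = 0; i < n; i++)
            y[row] += m[row * n + i] * v[i];
    return y;
}
```

With a vector of eight 3s, matVecHost(fillMatrix(8, 4), v, 8) gives 12 60 108 156 84 0 0 0, while matVecHost(fillMatrix(8, 8), v, 8) gives 0 24 48 72 96 120 144 168. The printout in the fill loop looked right only because each element was read back immediately after being written, before later rows overwrote it.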

I didn't write it carefully ((