I have a problem with matrix-vector multiplication.

I store the m x n matrix in a 1-D array in row-major order, where m is the number of rows and n is the number of columns.
For instance, the 2 x 6 matrix
A =
[ 1 2 3 4 5 6]
[ 2 3 4 5 6 7]
is stored as A’ = [1 2 3 4 5 6 2 3 4 5 6 7].
I have another vector, v = [1 1 1 1 1 1].
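For reference, here is a minimal host-side sketch of the product this layout implies, which can be used to check the GPU result. It is plain C++ (it also compiles as host code in a .cu file); the function name `matVecRowMajor` is made up for illustration and is not part of the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference: y = A * v, with the m x n matrix A stored row-major
// in a flat array, so element (row, col) lives at index row * n + col.
// (matVecRowMajor is a hypothetical helper name, not from the original post.)
std::vector<float> matVecRowMajor(const std::vector<float>& A,
                                  const std::vector<float>& v,
                                  std::size_t m, std::size_t n) {
    assert(A.size() == m * n);  // flat array must hold m rows of n entries
    assert(v.size() == n);      // vector length must match the column count
    std::vector<float> y(m, 0.0f);
    for (std::size_t row = 0; row < m; ++row) {
        float dot = 0.0f;
        for (std::size_t col = 0; col < n; ++col) {
            dot += A[row * n + col] * v[col];
        }
        y[row] = dot;
    }
    return y;
}
```

For the 2 x 6 example above with v all ones, each output entry is just the sum of one row, so the expected result is [21 27].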

I’m trying to compute the matrix-vector product A * v, but I always get the wrong results.

Could anyone help me?

The code is listed below.

main.cu


dim3 threads(BLOCKSIZE, 1);
dim3 grid((int) ceil ((float)( n /BLOCKSIZE)), 1);

//kernel execution
matrixMul<<< grid, BLOCKSIZE >>>(d_eval, d_mat, d_vec, n, m);
// d_eval is the pointer to the storage of the final result, d_mat is the pointer to matrix, d_vec is the pointer to vector
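One thing to watch in the grid computation: in `ceil((float)(n / BLOCKSIZE))` the integer division `n / BLOCKSIZE` truncates before the cast, so `ceil` only ever sees an already rounded-down value. A common alternative is to round up with integer arithmetic. The sketch below assumes this; `blocksFor` is a hypothetical helper name, and the block size of 256 used in the note afterwards is only an example value, since the post does not say what BLOCKSIZE is.

```cpp
#include <cassert>

// Number of blocks needed to cover `count` work items when each block
// has `blockSize` threads. Adding blockSize - 1 before dividing rounds
// the quotient up using integer arithmetic only (no float cast, no ceil).
// (blocksFor is a hypothetical helper name, not from the original post.)
inline unsigned int blocksFor(unsigned int count, unsigned int blockSize) {
    assert(blockSize > 0);  // a zero block size would divide by zero
    return (count + blockSize - 1) / blockSize;
}
```

With 6 work items and an assumed block size of 256, `6 / 256` truncates to 0 blocks, whereas `blocksFor(6, 256)` gives 1. If each thread handles one row of the matrix, the count to pass would be the number of rows m rather than n.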

===================

kernel.cu

#pragma once
#define small_grid_thread_id(void) ((__umul24(blockDim.x, blockIdx.x) + threadIdx.x))
#define large_grid_thread_id(void) ((__umul24(blockDim.x, blockIdx.x + __umul24(blockIdx.y,gridDim.x)) + threadIdx.x))

__global__ void
matrixMul(float* d_eval, float* d_mat, float* d_vec, int n, int m){

unsigned int row = blockDim.x * blockIdx.x + threadIdx.x;


if(row >= glbelem) { return; }

for(unsigned int i = 0; i < m; m++){

	d_eval[n] += d_mat[ i * n + row] * d_vec[row];
	__syncthreads();
}
__syncthreads();

}

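// One thread per row: each thread computes the dot product of its row with d_vec.
__global__ void
matrixMul(float* d_eval, float* d_mat, float* d_vec, int n, int m){

unsigned int row = blockDim.x * blockIdx.x + threadIdx.x;
if(row >= m) { return; } // A has m rows, so extra threads do nothing

float dot = 0.0f;
for(unsigned int col = 0; col < n; col++){
	dot += d_mat[ col + row * n ] * d_vec[col]; // A is row-major
}

d_eval[row] = dot;
}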