I store an m x n matrix as a 1-D array in row-major order, where m is the number of rows and n is the number of columns.

For instance, consider the 2 x 6 matrix

A =

[ 1 2 3 4 5 6]

[ 2 3 4 5 6 7],

which is stored as A’ = [1 2 3 4 5 6 2 3 4 5 6 7].

I also have a vector v = [1 1 1 1 1 1].

I’m trying to compute the matrix-vector product A v on the GPU, but I always get the wrong results.
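To have something to compare the GPU output against, the product can first be computed on the host. The following is a minimal sketch in plain host-side C++ (it also compiles inside a .cu file), assuming the row-major layout described above; `matVecHost` is a hypothetical helper name, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference y = A * v for an m x n matrix stored row-major in a flat array.
// matVecHost is a hypothetical name used only for this sketch.
std::vector<float> matVecHost(const std::vector<float>& mat,
                              const std::vector<float>& vec,
                              std::size_t m, std::size_t n) {
    assert(mat.size() == m * n);  // flat storage must hold m rows of n entries
    assert(vec.size() == n);      // one vector entry per matrix column
    std::vector<float> out(m, 0.0f);
    for (std::size_t row = 0; row < m; ++row) {
        float sum = 0.0f;
        for (std::size_t col = 0; col < n; ++col) {
            // Element (row, col) of A lives at index row * n + col.
            sum += mat[row * n + col] * vec[col];
        }
        out[row] = sum;
    }
    return out;
}
```

For the 2 x 6 example with v = [1 1 1 1 1 1], each output entry is simply a row sum, so the expected result is [21 27].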

Could anyone help me?

The code is listed below.

## main.cu

…

…

dim3 threads(BLOCKSIZE, 1);

dim3 grid((int) ceil ((float)( n /BLOCKSIZE)), 1);

//kernel execution

matrixMul<<< grid, BLOCKSIZE >>>(d_eval, d_mat, d_vec, n, m);

// d_eval is the pointer to the storage of the final result, d_mat is the pointer to matrix, d_vec is the pointer to vector
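One thing that may be worth checking in the launch configuration: in `ceil((float)(n / BLOCKSIZE))` the division is done on integers before the cast, so the quotient is already truncated and `ceil` has nothing left to round up. The sketch below contrasts that with integer ceiling division. `blocksFor` and `blocksTruncated` are hypothetical helper names, and the block size of 16 in the usage note is an assumed value, since the post does not show `BLOCKSIZE`.

```cpp
#include <cassert>
#include <cmath>

// Number of blocks needed so that blocks * blockSize >= count.
// blocksFor is a hypothetical helper name used only for this sketch.
inline int blocksFor(int count, int blockSize) {
    assert(count >= 0 && blockSize > 0);
    return (count + blockSize - 1) / blockSize;  // integer ceiling division
}

// Mirrors the expression in the post: the integer division truncates
// before the cast to float, so ceil() receives a whole number.
inline int blocksTruncated(int count, int blockSize) {
    assert(count >= 0 && blockSize > 0);
    return static_cast<int>(std::ceil(static_cast<float>(count / blockSize)));
}
```

With 6 elements and an assumed block size of 16, `blocksTruncated(6, 16)` gives 0 blocks, while `blocksFor(6, 16)` gives 1. If each thread is meant to produce one output row, the count passed in would be the number of rows m rather than n.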

===================

## kernel.cu

#pragma once

#define small_grid_thread_id(void) ((__umul24(blockDim.x, blockIdx.x) + threadIdx.x))

#define large_grid_thread_id(void) ((__umul24(blockDim.x, blockIdx.x + __umul24(blockIdx.y,gridDim.x)) + threadIdx.x))

__global__ void

matrixMul(float* d_eval, float* d_mat, float* d_vec, int n, int m){

```
unsigned int row = blockDim.x * blockIdx.x + threadIdx.x;
if(row >= glbelem) { return; }
for(unsigned int i = 0; i < m; m++){
d_eval[n] += d_mat[ i * n + row] * d_vec[row];
__syncthreads();
}
__syncthreads();
}
```
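Several things in the kernel above look like possible causes of the wrong results: the loop increments `m` instead of `i`, every thread accumulates into the single location `d_eval[n]`, `d_vec` is indexed by the thread index rather than by the column, `glbelem` is not declared anywhere in the code shown, and the call site passes `n, m` in an order that is easy to mix up. The following is a minimal sketch of a one-thread-per-row variant, not necessarily the only or the fastest approach. The kernel name `matVecRowMajor`, the `(m, n)` parameter order, the `BLOCKSIZE` value of 128, and the `CHECK` macro are all assumptions made for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCKSIZE 128  // assumed value; the original post does not show it

// Hypothetical helper macro: abort with a message if a CUDA call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// One thread per matrix row: thread `row` computes
// d_eval[row] = sum over col of d_mat[row * n + col] * d_vec[col].
__global__ void matVecRowMajor(float* d_eval, const float* d_mat,
                               const float* d_vec, int m, int n) {
    int row = blockDim.x * blockIdx.x + threadIdx.x;
    if (row >= m) { return; }  // surplus threads in the last block do nothing
    float sum = 0.0f;          // accumulate in a register, not in global memory
    for (int col = 0; col < n; ++col) {
        sum += d_mat[row * n + col] * d_vec[col];
    }
    d_eval[row] = sum;  // each thread writes its own element, so no __syncthreads()
}

int main() {
    const int m = 2, n = 6;  // the 2 x 6 example from the post
    const float h_mat[m * n] = {1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7};
    const float h_vec[n] = {1, 1, 1, 1, 1, 1};
    float h_eval[m] = {0, 0};

    float *d_mat = nullptr, *d_vec = nullptr, *d_eval = nullptr;
    CHECK(cudaMalloc(&d_mat, sizeof(h_mat)));
    CHECK(cudaMalloc(&d_vec, sizeof(h_vec)));
    CHECK(cudaMalloc(&d_eval, sizeof(h_eval)));
    CHECK(cudaMemcpy(d_mat, h_mat, sizeof(h_mat), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_vec, h_vec, sizeof(h_vec), cudaMemcpyHostToDevice));

    int grid = (m + BLOCKSIZE - 1) / BLOCKSIZE;  // ceiling division over rows
    matVecRowMajor<<<grid, BLOCKSIZE>>>(d_eval, d_mat, d_vec, m, n);
    CHECK(cudaGetLastError());  // reports launch configuration errors

    // The blocking copy also waits for the kernel to finish.
    CHECK(cudaMemcpy(h_eval, d_eval, sizeof(h_eval), cudaMemcpyDeviceToHost));
    printf("%g %g\n", h_eval[0], h_eval[1]);

    CHECK(cudaFree(d_mat));
    CHECK(cudaFree(d_vec));
    CHECK(cudaFree(d_eval));
    return 0;
}
```

For the example data the two entries should come out as the row sums 21 and 27. Accumulating into a local variable and writing once avoids both the data race on a shared output element and the need for any synchronization inside the loop.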