Hi everyone,

I’m trying to run a kernel that multiplies a matrix by a vector.

```
__global__
void mult_matrix( float* a, float *b, float *c, int N )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    int index = i + j*N;                            // a is row-major: row j, column i
    if ( i < N && j < N )
        c[j] = c[j] + a[index]*b[i];                // accumulate into output element j
}
```
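For comparison, here is a sequential version of the same computation. It assumes the kernel's layout, where `a[i + j*N]` is row `j`, column `i`. The function name `mult_matrix_cpu` is hypothetical, and the actual sequential code isn't shown in the post. Each output element `c[j]` is the dot product of row `j` with `b`.

```cuda
// Hypothetical host-side reference, matching the kernel's indexing:
// a is N x N, row-major (a[i + j*N] = row j, column i); b and c have N elements.
// Like the kernel, it accumulates into c, so c must be zeroed beforehand.
void mult_matrix_cpu(const float* a, const float* b, float* c, int N)
{
    for (int j = 0; j < N; ++j) {        // one output element per row
        for (int i = 0; i < N; ++i)      // dot product of row j with b
            c[j] = c[j] + a[i + j * N] * b[i];
    }
}
```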

I’m not getting the right numbers, though. I suspect the kernel is the problem, because the same computation gives correct results in my sequential code.
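One likely cause is a race condition. With a 2D grid, every thread that shares the same row `j` (but has a different `i`) executes `c[j] = c[j] + ...` at the same time. That read-modify-write is not atomic, so concurrent updates overwrite each other and some products are lost. The sequential loop never has this problem, because it performs the updates one at a time.

Below is a sketch of two common fixes, written under the same row-major layout assumption:

- **Option 1** keeps the 2D grid and uses `atomicAdd` for the accumulation.
- **Option 2** uses one thread per row, so each `c[j]` has a single writer.

The kernel names and the host wrapper `matvec_gpu` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Option 1: same 2D grid, but the accumulation into c[j] is atomic,
// so threads with the same j no longer lose each other's updates.
// c must be zeroed (e.g. with cudaMemset) before the launch.
__global__
void mult_matrix_atomic(const float* a, const float* b, float* c, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i < N && j < N)
        atomicAdd(&c[j], a[i + j * N] * b[i]);
}

// Option 2: one thread per row. Each thread computes the full dot product
// in a register and writes c[j] exactly once, so there is no race.
__global__
void mult_matrix_rows(const float* a, const float* b, float* c, int N)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // row
    if (j < N) {
        float sum = 0.0f;
        for (int i = 0; i < N; ++i)
            sum += a[i + j * N] * b[i];
        c[j] = sum;
    }
}

// Hypothetical host wrapper: copies a (N*N, row-major) and b (N) to the
// device, runs the row-per-thread kernel, and copies c (N) back.
void matvec_gpu(const float* a, const float* b, float* c, int N)
{
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, N * N * sizeof(float));
    cudaMalloc(&d_b, N * sizeof(float));
    cudaMalloc(&d_c, N * sizeof(float));
    cudaMemcpy(d_a, a, N * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, N * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (N + threads - 1) / threads;  // enough blocks to cover N rows
    mult_matrix_rows<<<blocks, threads>>>(d_a, d_b, d_c, N);

    // cudaMemcpy on the default stream waits for the kernel to finish
    cudaMemcpy(c, d_c, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
}
```

Option 2 avoids atomics, but it isn't tuned. Neighbouring threads read elements of `a` that are `N` floats apart, so the loads are not coalesced. For large matrices, a transposed layout or an optimized library routine is usually faster.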