I’m trying to run a kernel that will multiply a matrix by an array.
__global__
void mult_matrix( float* a, float *b, float *c, int N )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int index = i + j*N;
    if ( i < N && j < N )
        c[j] = c[j] + a[index]*b[i];
}
I am not getting the right numbers though, and I suspect it’s the
kernel that’s not working because when I do this in my sequential
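You’ve got a race condition: every thread with the same j value reads and writes the same c[j]. The update c[j] = c[j] + a[index]*b[i] is a read-modify-write that is not atomic, and the order in which threads run is undefined, so one thread can overwrite another thread’s partial sum and the final value comes out wrong.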
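One way to avoid the race is to give each thread a whole row, so that only one thread ever writes a given c[j]. The sketch below is a minimal example of that idea, not necessarily the fastest approach. It assumes the matrix is N x N and row-major, as the indexing in the question suggests. The names mult_matrix_rows and matvec are hypothetical helpers made up for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per output row: each thread accumulates its own sum in a
// register and writes c[row] exactly once, so no two threads share a target.
__global__ void mult_matrix_rows(const float* a, const float* b, float* c, int N)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N) {
        float sum = 0.0f;
        for (int col = 0; col < N; ++col)
            sum += a[col + row * N] * b[col];   // same row-major layout as the question
        c[row] = sum;
    }
}

// Hypothetical host wrapper (not from the original post): copies the inputs
// to the device, launches the kernel, and copies the result back.
std::vector<float> matvec(const std::vector<float>& a,
                          const std::vector<float>& b, int N)
{
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, N * N * sizeof(float));
    cudaMalloc(&d_b, N * sizeof(float));
    cudaMalloc(&d_c, N * sizeof(float));
    cudaMemcpy(d_a, a.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;                          // assumed block size
    int blocks = (N + threads - 1) / threads;   // round up to cover all rows
    mult_matrix_rows<<<blocks, threads>>>(d_a, d_b, d_c, N);

    std::vector<float> c(N);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(c.data(), d_c, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}

int main()
{
    const int N = 2;
    std::vector<float> a = {1, 2, 3, 4};        // rows: [1 2], [3 4]
    std::vector<float> b = {1, 1};
    std::vector<float> c = matvec(a, b, N);
    printf("%g %g\n", c[0], c[1]);              // prints "3 7"
    return 0;
}
```

If you would rather keep the original 2D launch, another option is to zero c before the launch and replace the update with atomicAdd(&c[j], a[index]*b[i]). That removes the race, but all the threads in a row then contend for the same address.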