Why won't my kernel work?

Hi everyone,

I’m trying to run a kernel that will multiply a matrix by an array.

__global__
void mult_matrix( float* a, float *b, float *c, int N )
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	int j = blockIdx.y * blockDim.y + threadIdx.y;
	int index = i + j*N;
	if ( i < N && j < N )
		c[j] = c[j] + a[index]*b[i];
}

I am not getting the right numbers though, and I suspect it’s the kernel that’s not working because when I do this in my sequential code, it works.

You’ve got a race condition: all threads with the same j value (one for each value of i) read and write the same c[j]. Since the order in which those threads run is undefined, one thread can overwrite another's update, and the wrong value ends up in c[j].

You can try using atomicAdd to avoid this. Note that atomicAdd on float needs a device of compute capability 2.0 or higher.
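For illustration, here is a minimal sketch of what the atomicAdd version could look like. The kernel name mult_matrix_atomic and the host helper run_mult_matrix_atomic are made up for this example, the 16x16 block size is an arbitrary choice, and error checking is left out. It assumes c should start from zero.

```cuda
#include <cuda_runtime.h>

// Same indexing as the original kernel; only the update of c[j] changes.
// Assumes c was zeroed before launch.
__global__
void mult_matrix_atomic( const float *a, const float *b, float *c, int N )
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;	// column index
	int j = blockIdx.y * blockDim.y + threadIdx.y;	// row index

	if ( i < N && j < N )
		atomicAdd( &c[j], a[i + j*N] * b[i] );	// read-modify-write done as one atomic step
}

// Hypothetical host helper (not from the original post): copies the inputs
// to the device, zeroes c, launches the kernel and copies the result back.
// Error checking is omitted to keep the sketch short.
void run_mult_matrix_atomic( const float *a, const float *b, float *c, int N )
{
	float *d_a, *d_b, *d_c;
	cudaMalloc( &d_a, N * N * sizeof(float) );
	cudaMalloc( &d_b, N * sizeof(float) );
	cudaMalloc( &d_c, N * sizeof(float) );
	cudaMemcpy( d_a, a, N * N * sizeof(float), cudaMemcpyHostToDevice );
	cudaMemcpy( d_b, b, N * sizeof(float), cudaMemcpyHostToDevice );
	cudaMemset( d_c, 0, N * sizeof(float) );	// the sums must start from zero

	dim3 block( 16, 16 );	// arbitrary block size for this sketch
	dim3 grid( (N + block.x - 1) / block.x, (N + block.y - 1) / block.y );
	mult_matrix_atomic<<<grid, block>>>( d_a, d_b, d_c, N );

	// cudaMemcpy waits for the kernel to finish before copying.
	cudaMemcpy( c, d_c, N * sizeof(float), cudaMemcpyDeviceToHost );
	cudaFree( d_a );
	cudaFree( d_b );
	cudaFree( d_c );
}
```

Keep in mind that the atomic adds on the same c[j] are serialised, and that the order of the floating-point additions can change from run to run, so the last digits of the result may vary slightly.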

Obviously, fixing the race condition itself is better, both for code quality and for performance.

atomicAdd should only be used in cases where it is absolutely needed, and not as a way to patch bugs in the code.
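As a rough sketch of one way to remove the race altogether: give each thread a whole row j, accumulate the sum in a local variable, and write c[j] once. The names mult_matrix_rows and run_mult_matrix_rows are invented for this example, the block size of 256 is arbitrary, and error checking is left out. Unlike the original kernel, this version overwrites c[j] instead of adding to it.

```cuda
#include <cuda_runtime.h>

// One thread per row: no two threads ever touch the same c[j],
// so no atomics are needed.
__global__
void mult_matrix_rows( const float *a, const float *b, float *c, int N )
{
	int j = blockIdx.x * blockDim.x + threadIdx.x;	// row handled by this thread

	if ( j < N )
	{
		float sum = 0.0f;	// private accumulator, kept in a register
		for ( int i = 0; i < N; ++i )
			sum += a[i + j*N] * b[i];
		c[j] = sum;	// single write per row
	}
}

// Hypothetical host helper (not from the original post), without error checking.
void run_mult_matrix_rows( const float *a, const float *b, float *c, int N )
{
	float *d_a, *d_b, *d_c;
	cudaMalloc( &d_a, N * N * sizeof(float) );
	cudaMalloc( &d_b, N * sizeof(float) );
	cudaMalloc( &d_c, N * sizeof(float) );
	cudaMemcpy( d_a, a, N * N * sizeof(float), cudaMemcpyHostToDevice );
	cudaMemcpy( d_b, b, N * sizeof(float), cudaMemcpyHostToDevice );

	int threads = 256;	// arbitrary block size for this sketch
	int blocks = (N + threads - 1) / threads;
	mult_matrix_rows<<<blocks, threads>>>( d_a, d_b, d_c, N );

	// cudaMemcpy waits for the kernel to finish before copying.
	cudaMemcpy( c, d_c, N * sizeof(float), cudaMemcpyDeviceToHost );
	cudaFree( d_a );
	cudaFree( d_b );
	cudaFree( d_c );
}
```

This is only the simplest race-free layout. Neighbouring threads read a with a stride of N, so the loads are not coalesced; a version where each block cooperates on one row and reduces in shared memory would likely be faster for large N.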

Take a look at the matrixMul or other similar samples in the SDK.

eyal