I have this small portion of code that is looking at a large sparse matrix and finding how many non-zero values per row there are. This array, called d_mesh->nnzPerRow is initialized to all zeros before running this function.
Below is the code. When I run it in serial with just 1 thread (and 1 block) it works and gives the correct output for the first 5 rows : 3 4 3 4 5. When I try to run it in parallel by passing a numBlocks and numThreads parameters into the kernel function, it does not work and gives me an output of : 1 1 1 1 1.
Is there any way to run this in parallel where each thread can see the most recent value of d_mesh->nzxPerRow before updating it? Thanks.
__global__ void implicitNZPR(mesh * d_mesh)
{
int index = blockIdx.x * blockDim.x + threadIdx.x;
int J;
int Nwidth = (d_mesh->Nx-2)*(d_mesh->Ny-2); //Matrix width/height
//PARALLEL (NOT WORKING)
/*
if(index < d_mesh->Ni)
{
int *A = d_getIJK3D(index, Nwidth, Nwidth, 0); //Get the I,J indices and offset by 1 since CPP starts at 0
J = A[1] + 1;
if(d_mesh->implicitA[index] != 0)
{
d_mesh->nnzPerRow[J] += 1;
}
}
*/
//SERIAL (WORKS AS INTENDED)
for(int i=0; i<d_mesh->Ni; i++)
{
int *A = d_getIJK3D(i, Nwidth, Nwidth, 0);
J = A[1] + 1;
if(d_mesh->implicitA[i] != 0)
{
d_mesh->nnzPerRow[J] += 1;
}
}
}