I want to parallelize two nested for loops using CUDA. My simple C++ code is:
index = 0;
for (int m = 0; m < N; m++)
{
    x[m] = (2 * m + 1) / (2 * N);
    for (int n = 0; n < N; n++)
    {
        Z1[n] = (n + 1) + x[m];
        Z[index] = Z1[n];
        index++;
    }
}
The CUDA version I wrote is:
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;
    int tidy = blockIdx.y * blockDim.y + threadIdx.y;
    if (tidx < N && tidy < N)
    {
        x[tidx] = (2 * tidx + 1) / (2 * N);
        Z1[tidy] = (tidy + 1) * 100 + x[tidx];
    }
    Z[tidx] = Z1[tidy];
}
When I run the code, the first row of Z is correct and the others are equal to zero. Z has dimensions [N x N]. I don't understand where the mistake is. Can you help me? Thanks.
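For reference, here is a minimal self-contained sketch of the serial loop, written to make the output layout explicit. It checks that the running `index` counter is the same as the flattened row-major index `m * N + n`. The function name `fill_flat` is hypothetical, the intermediate `Z1` array is dropped for brevity, and the floating-point division is an assumption about the intended result: with `int` operands, `(2 * m + 1) / (2 * N)` evaluates to 0 for every `m < N`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference: fills an N*N array in row-major order.
// fill_flat is a hypothetical helper name, not taken from the post above.
std::vector<double> fill_flat(int N)
{
    std::vector<double> x(N);
    std::vector<double> Z(static_cast<std::size_t>(N) * N);
    int index = 0;
    for (int m = 0; m < N; m++)
    {
        // Assumption: a fractional value is intended, so divide in floating point.
        x[m] = (2.0 * m + 1.0) / (2.0 * N);
        for (int n = 0; n < N; n++)
        {
            // The running counter always equals the flattened 2D index.
            assert(index == m * N + n);
            Z[m * N + n] = (n + 1) + x[m];
            index++;
        }
    }
    return Z;
}
```

If this reading of the serial code is right, a kernel that assigns one thread to each (m, n) pair would probably need to write to `Z[m * N + n]`, since an index that depends on only one of the two thread coordinates can reach at most N of the N*N elements.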