I am trying to parallelize some code with CUDA, but I am not getting the right results.
The code generates the data for an image.
The serial C code looks like this:
J = 0;
Constants = 0;
for ( RowIdx = 0; RowIdx < Rows; RowIdx++ )
{
    *(theRe + J) = 0.0;
    *(theIm + J) = 1.0;
    ++Constants;
    ++J;
    P = J + 1;
    for ( ColIdx = 1; ColIdx < Cols; ColIdx++ )
    {
        *(theRe + J) = *(thePh + J) * ....
        ++J;
    } //ColIdx
    ....
} //RowIdx
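A useful property of the serial loop: J is never reset. It advances once for the first element of each row and once per inner iteration, so at every write J equals RowIdx * Cols + ColIdx. The C sketch below replays only the loop structure and checks that invariant. It assumes the elided "...." code does not modify J, and check_index_invariant is a hypothetical helper, not part of the original code.

```c
#include <assert.h>

/* Hypothetical helper: replays the serial loop structure (ignoring the
   elided "...." computations, assumed not to touch J) and checks that the
   running counter J always equals RowIdx * Cols + ColIdx.
   Returns 1 if the invariant holds and every element is visited once. */
static int check_index_invariant(int Rows, int Cols)
{
    /* the serial code writes the first element of every row, so Cols >= 1 */
    assert(Rows >= 0 && Cols >= 1);

    int J = 0;
    for (int RowIdx = 0; RowIdx < Rows; RowIdx++) {
        /* first element of the row: ColIdx == 0 */
        if (J != RowIdx * Cols) return 0;
        ++J;
        for (int ColIdx = 1; ColIdx < Cols; ColIdx++) {
            if (J != RowIdx * Cols + ColIdx) return 0;
            ++J;
        }
    }
    /* after the last row, J has advanced exactly Rows * Cols times */
    return J == Rows * Cols;
}
```

Because J can be computed from (RowIdx, ColIdx), the running counter is only a convenience of the serial version; a parallel version can compute it directly once each thread knows which row (or element) it owns.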
Since the loop body never uses RowIdx or ColIdx directly, only J, I tried:
J = threadIdx.y + blockDim.y * blockIdx.y;
Constants = 0;
for ( RowIdx = 0; RowIdx < Rows; RowIdx++ )
{
    *(theRe + J) = 0.0;
    *(theIm + J) = 1.0;
    ++Constants;
    J += gridDim.y * blockDim.y;
    P = J + 1;
    for ( ColIdx = 1; ColIdx < Cols; ColIdx++ )
    ....
but this does not give the right results.
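One way to see what goes wrong in the attempt: every thread still runs the full RowIdx loop, and J advances by the total number of threads (gridDim.y * blockDim.y) instead of by Cols. The C sketch below models only the two row-start writes, *(theRe + J) = 0.0 and *(theIm + J) = 1.0, because the inner loop was cut off, and counts how many of them land on a real row start (an index divisible by Cols). The helper name and the sequential loop over threads are illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical helper: emulates the row-start writes of the attempted
   kernel for totalThreads = gridDim.y * blockDim.y threads.
   Stores the number of writes in *writes and returns how many of them
   hit a real row start (J % Cols == 0). The truncated inner ColIdx loop
   is not modelled. */
static int attempted_row_start_hits(int Rows, int Cols, int totalThreads,
                                    int *writes)
{
    assert(Cols > 0 && totalThreads > 0);
    int hits = 0;
    *writes = 0;
    for (int tid = 0; tid < totalThreads; tid++) {       /* every thread ...        */
        int J = tid;                                      /* J = thread index        */
        for (int RowIdx = 0; RowIdx < Rows; RowIdx++) {   /* ... walks ALL the rows  */
            (*writes)++;                                  /* theRe[J] = 0.0, etc.    */
            if (J % Cols == 0) hits++;
            J += totalThreads;                            /* stride is not Cols      */
        }
    }
    return hits;
}
```

With Rows = 3, Cols = 4 and 2 threads in total, the attempt makes 6 row-start writes (J = 0 to 5), but only 2 of them (J = 0 and J = 4) are real row starts. The third row start, J = 8, is never written, and J = 1, 2, 3, 5 receive 0.0 and 1.0 although they belong to column values.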
Also, I do not understand whether I must use:
RowIdx = threadIdx.y + blockDim.y * blockIdx.y;
for ( ; RowIdx < Rows; RowIdx += gridDim.y * blockDim.y)
(and the same for the column index).
I thought that, since RowIdx and ColIdx are not used anywhere in the loop body, it was not necessary to express them in terms of thread indices.
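The thread index has to decide which rows (or elements) each thread owns; J can then be computed from the row instead of being accumulated. A minimal sketch, written in plain C so it can run without a GPU: row_thread is the body one CUDA thread would execute, with tid standing for threadIdx.y + blockDim.y * blockIdx.y and stride for gridDim.y * blockDim.y, and fill_emulated_grid loops over all (block, thread) pairs the way a launch would. The multiplication by 2.0 in column_value is a hypothetical stand-in for the elided "...." computation, and all function names are made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the elided "...." computation. */
static double column_value(const double *thePh, int J)
{
    return thePh[J] * 2.0;
}

/* Serial reference: J is a running counter, as in the question.
   Returns the final value of Constants. */
static int fill_serial(double *theRe, double *theIm, const double *thePh,
                       int Rows, int Cols)
{
    int J = 0, Constants = 0;
    for (int RowIdx = 0; RowIdx < Rows; RowIdx++) {
        theRe[J] = 0.0;
        theIm[J] = 1.0;
        ++Constants;
        ++J;
        for (int ColIdx = 1; ColIdx < Cols; ColIdx++) {
            theRe[J] = column_value(thePh, J);
            ++J;
        }
    }
    return Constants;
}

/* What one CUDA thread would run: a grid-stride loop over rows.
   tid    ~ threadIdx.y + blockDim.y * blockIdx.y
   stride ~ gridDim.y * blockDim.y
   J is computed from RowIdx, so rows can be processed in any order. */
static void row_thread(double *theRe, double *theIm, const double *thePh,
                       int Rows, int Cols, int tid, int stride)
{
    for (int RowIdx = tid; RowIdx < Rows; RowIdx += stride) {
        int J = RowIdx * Cols;              /* first element of this row */
        theRe[J] = 0.0;
        theIm[J] = 1.0;
        for (int ColIdx = 1; ColIdx < Cols; ColIdx++)
            theRe[J + ColIdx] = column_value(thePh, J + ColIdx);
    }
}

/* Sequential emulation of a launch with gridDimY blocks of blockDimY
   threads: every (block, thread) pair runs row_thread once. */
static void fill_emulated_grid(double *theRe, double *theIm,
                               const double *thePh, int Rows, int Cols,
                               int gridDimY, int blockDimY)
{
    assert(gridDimY > 0 && blockDimY > 0);
    int stride = gridDimY * blockDimY;
    for (int b = 0; b < gridDimY; b++)
        for (int t = 0; t < blockDimY; t++)
            row_thread(theRe, theIm, thePh, Rows, Cols,
                       t + blockDimY * b, stride);
}
```

On a GPU the rows run concurrently, which is safe here because each row writes only the disjoint range J = RowIdx * Cols to RowIdx * Cols + Cols - 1. Assuming the elided code does not change them further, two other variables follow the same rule: Constants is incremented once per row, so its final value is simply Rows and can be set once instead of being incremented by many threads (a counter shared between threads would need atomicAdd), and P, which the serial code sets to RowIdx * Cols + 2, should also be computed from the row. If you also map columns to threads (ColIdx from the x index), each thread handles one element at J = RowIdx * Cols + ColIdx and writes the 0.0/1.0 pair when ColIdx == 0.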