Parallelizing a C loop with CUDA: can't make it work right

I am trying to parallelize some C code with CUDA, but I am not getting the right results.
The code generates the data for an image.

The C code looks like this:

J = 0;
Constants = 0;
for ( RowIdx = 0; RowIdx < Rows; RowIdx++ )
{
    *(theRe + J) = 0.0;
    *(theIm + J) = 1.0;

    ++Constants;

    ++J;
    P = J + 1;

    for ( ColIdx = 1; ColIdx < Cols; ColIdx++ )
    {
        *(theRe + J) = *(thePh + J) * ....
        ++J;
    } //ColIdx

    ....

} //RowIdx
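Tracing how J, P and Constants evolve makes their dependence on RowIdx and ColIdx explicit. The plain-C sketch below replays only the index bookkeeping of the loop above (the real per-element formula is elided in the question, so it is left out) and records which (RowIdx, ColIdx) touched each J. It assumes the elided "...." code never modifies J or P; trace_serial and its output arrays are hypothetical names.

```c
#include <assert.h>

/* Replays only the index bookkeeping of the serial loop. For every
   element J it records the (RowIdx, ColIdx) that touched it, plus the
   per-row values of P and Constants. Assumes the elided "...." code
   never modifies J or P. All names here are hypothetical. */
void trace_serial(int Rows, int Cols, int *rowOf, int *colOf,
                  int *pOfRow, int *constOfRow)
{
    int J = 0, P = 0, Constants = 0;
    assert(Rows >= 0 && Cols >= 1);
    for (int RowIdx = 0; RowIdx < Rows; RowIdx++) {
        rowOf[J] = RowIdx;              /* theRe[J] = 0.0; theIm[J] = 1.0; */
        colOf[J] = 0;
        ++Constants;
        ++J;
        P = J + 1;
        pOfRow[RowIdx] = P;
        constOfRow[RowIdx] = Constants;
        for (int ColIdx = 1; ColIdx < Cols; ColIdx++) {
            rowOf[J] = RowIdx;          /* theRe[J] = thePh[J] * .... */
            colOf[J] = ColIdx;
            ++J;
        }
    }
}
```

With Rows = 3 and Cols = 4 the trace gives J = RowIdx * Cols + ColIdx for every element, P = RowIdx * Cols + 2 in each row, and Constants = RowIdx + 1 once a row's first element is done. So RowIdx and ColIdx are in fact used, just implicitly through J.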

Since RowIdx and ColIdx are never used inside the loops (only J is), I tried:

J = threadIdx.y + blockDim.y * blockIdx.y;
Constants = 0;
for ( RowIdx = 0; RowIdx < Rows; RowIdx++ )
{
    *(theRe + J) = 0.0;
    *(theIm + J) = 1.0;

    ++Constants;

    J += gridDim.y * blockDim.y;
    P = J + 1;

    for ( ColIdx = 1; ColIdx < Cols; ColIdx++ )

but I am not getting the right results.
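One way to see why this goes wrong is to replay the attempted index arithmetic on the CPU. In the sketch below every emulated thread runs the whole RowIdx loop, as in the attempt, and the column loop is assumed to keep the original ++J (that part of the attempt is cut off above). emulate_attempt and the writes counter are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* CPU replay of the attempted kernel's index arithmetic. Every emulated
   thread t (t = threadIdx.y + blockDim.y * blockIdx.y) runs the whole
   RowIdx loop, and the column loop is assumed to keep the original ++J.
   writes[k] counts how often element k is written; the return value
   counts writes that would land outside the Rows * Cols elements. */
int emulate_attempt(int Rows, int Cols, int nThreads, int *writes)
{
    int total = Rows * Cols, outOfRange = 0;
    assert(nThreads > 0);
    memset(writes, 0, (size_t)total * sizeof *writes);
    for (int t = 0; t < nThreads; t++) {
        int J = t;
        for (int RowIdx = 0; RowIdx < Rows; RowIdx++) {
            if (J < total) writes[J]++; else outOfRange++;   /* theRe/theIm */
            J += nThreads;                    /* gridDim.y * blockDim.y */
            for (int ColIdx = 1; ColIdx < Cols; ColIdx++) {
                if (J < total) writes[J]++; else outOfRange++;
                ++J;
            }
        }
    }
    return outOfRange;
}
```

With Rows = 3, Cols = 4 and four threads, element 7 is written by all four threads and 27 of the 48 writes fall past the end of the 12-element array. Each thread starts at a different J but still walks through Rows * Cols positions, so the threads overlap each other and run off the end.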

Also, I don't understand whether I must use:

RowIdx = threadIdx.y + blockDim.y * blockIdx.y;
for ( ; RowIdx < Rows; RowIdx += gridDim.y * blockDim.y )

(and the same for the columns).

I thought that since RowIdx and ColIdx aren't used anywhere, there was no need to express them in terms of the thread indices.
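If rows are distributed over threads, RowIdx does have to come from the thread index, because it is the only thing that tells a thread which row it owns. J then cannot be carried over from a previous iteration; it has to be recomputed as RowIdx * Cols at the top of each row. The plain-C sketch below emulates such a grid-stride loop over rows on the CPU: tid and nThreads stand in for threadIdx.y + blockDim.y * blockIdx.y and gridDim.y * blockDim.y, row_worker and launch_rows are hypothetical names, and it assumes the elided code does not change J.

```c
#include <assert.h>
#include <string.h>

/* Body one CUDA thread would run in a grid-stride loop over rows. */
static void row_worker(int tid, int nThreads, int Rows, int Cols,
                       double *theRe, double *theIm, int *writes)
{
    for (int RowIdx = tid; RowIdx < Rows; RowIdx += nThreads) {
        int J = RowIdx * Cols;            /* recomputed, not carried over */
        theRe[J] = 0.0;                   /* first element of the row */
        theIm[J] = 1.0;
        writes[J]++;
        /* Serial values at this point: Constants = RowIdx + 1, and after
           ++J below, P = J + 1. Both are per-row, so a thread can compute
           them locally if the elided code needs them. */
        ++J;
        for (int ColIdx = 1; ColIdx < Cols; ColIdx++) {
            /* theRe[J] = thePh[J] * .... (elided in the question) */
            writes[J]++;
            ++J;
        }
    }
}

/* CPU stand-in for a kernel launch: run every "thread" in turn. */
void launch_rows(int nThreads, int Rows, int Cols,
                 double *theRe, double *theIm, int *writes)
{
    assert(nThreads > 0 && Cols >= 1);
    memset(writes, 0, (size_t)Rows * Cols * sizeof *writes);
    for (int tid = 0; tid < nThreads; tid++)   /* on a GPU these run concurrently */
        row_worker(tid, nThreads, Rows, Cols, theRe, theIm, writes);
}
```

Each element is written exactly once however many threads are used. The column loop stays serial inside each thread, which keeps the change small but limits the parallelism to Rows threads.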

Try eliminating both for loops; they should no longer be necessary.

Each element (each index) will be processed by its own CUDA thread.
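Both loops can be removed by giving each thread one flat index and recovering the row and column from it. Because each row occupies Cols consecutive elements, J itself serves as the flat index, with RowIdx = J / Cols and ColIdx = J % Cols. The plain-C sketch below emulates this; ColIdx selects between the two parts of the original row body. The value 1.0 is only a placeholder, since the real formula (thePh[J] * ....) is elided, and element and launch_flat are hypothetical names.

```c
#include <assert.h>

/* Work for one element J. In a real kernel this would be the body of
   the thread that owns J; here it is plain C so it runs on the CPU. */
static void element(int J, int Cols, double *theRe, double *theIm)
{
    int ColIdx = J % Cols;           /* RowIdx would be J / Cols */
    if (ColIdx == 0) {               /* first element of a row */
        theRe[J] = 0.0;
        theIm[J] = 1.0;
    } else {
        theRe[J] = 1.0;              /* placeholder for thePh[J] * .... */
    }
}

/* One flat index per element replaces both nested loops. Kernel
   equivalent: J = threadIdx.x + blockDim.x * blockIdx.x with stride
   gridDim.x * blockDim.x; nThreads is a hypothetical thread count. */
void launch_flat(int nThreads, int Rows, int Cols,
                 double *theRe, double *theIm)
{
    assert(Cols >= 1 && nThreads > 0);
    int total = Rows * Cols;
    for (int tid = 0; tid < nThreads; tid++)
        for (int J = tid; J < total; J += nThreads)
            element(J, Cols, theRe, theIm);
}
```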

The problem is that these two loops are nested, and they access the same variables.

If I leave them as they are, the result is wrong, because

J += gridDim.y * blockDim.y;

will also end up inside a loop.

But how can I remove them? How would the updated code look?

Wouldn't it be best to use a 2D grid of blocks in which every thread does the calculations for a single pixel (or a tile of pixels) of the image? Could you elaborate on what exactly you want to do, if it is not a secret? :)
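For a 2D grid with one thread per pixel, the grid is usually sized by rounding up so that the blocks cover the whole image, and threads that land outside the image do nothing. The sketch below emulates such a launch on the CPU and counts how often each pixel is visited. In CUDA the four loops would be replaced by the launch itself, for example dim3 block(bx, by); dim3 grid(ceil_div(Cols, bx), ceil_div(Rows, by)); followed by a hypothetical kernel<<<grid, block>>>(...). ceil_div and emulate_2d are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Round-up division, so that the blocks cover the whole image. */
static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* CPU emulation of a 2D launch with one thread per pixel. In CUDA the
   four loops are replaced by the launch and each thread reads its own
   blockIdx/threadIdx. hits[J] counts visits to pixel J. */
void emulate_2d(int Rows, int Cols, int bx, int by, int *hits)
{
    assert(bx > 0 && by > 0);
    int gx = ceil_div(Cols, bx), gy = ceil_div(Rows, by);
    memset(hits, 0, (size_t)Rows * Cols * sizeof *hits);
    for (int blockY = 0; blockY < gy; blockY++)
        for (int blockX = 0; blockX < gx; blockX++)
            for (int ty = 0; ty < by; ty++)
                for (int tx = 0; tx < bx; tx++) {
                    int RowIdx = ty + by * blockY;  /* threadIdx.y + blockDim.y * blockIdx.y */
                    int ColIdx = tx + bx * blockX;  /* threadIdx.x + blockDim.x * blockIdx.x */
                    if (RowIdx >= Rows || ColIdx >= Cols)
                        continue;                   /* threads past the edge do nothing */
                    hits[RowIdx * Cols + ColIdx]++;
                }
}
```

With Rows = 5, Cols = 7 and 4 x 2 blocks, the grid is 2 x 3 blocks (48 threads for 35 pixels). Every pixel is visited exactly once, and the 13 surplus threads are caught by the bounds check.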

Cheers,
MK

I tried using

RowIdx = threadIdx.y + blockDim.y * blockIdx.y;
ColIdx = threadIdx.x + blockDim.x * blockIdx.x;

but then I don't know how to handle J and P (see the C code).
J and P refer to neighboring elements.
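Once a thread knows its RowIdx and ColIdx, J and P follow in closed form from the serial loop: row RowIdx starts at element RowIdx * Cols, J advances by one per column, and P is set once per row, right after the first element. The hypothetical helper below returns the values the serial loop held while processing column ColIdx >= 1, assuming the elided "...." code never modifies J or P.

```c
#include <assert.h>

/* Per-thread index values (hypothetical helper). For ColIdx >= 1 these
   match what the serial loop held inside the column loop; the first
   element of a row (ColIdx == 0) only needs J. */
typedef struct { int J; int P; int Constants; } PixelIdx;

PixelIdx pixel_indices(int RowIdx, int ColIdx, int Cols)
{
    assert(ColIdx >= 0 && ColIdx < Cols);
    PixelIdx ix;
    ix.J = RowIdx * Cols + ColIdx;   /* element this thread writes */
    ix.P = RowIdx * Cols + 2;        /* set once per row, independent of ColIdx */
    ix.Constants = RowIdx + 1;       /* ++Constants ran once per row */
    return ix;
}
```

Since each thread writes only theRe and theIm at its own J, P being a neighboring index is harmless as long as whatever is read at P (for example thePh) is not written by other threads in the same launch; CUDA gives no ordering guarantee between threads in different blocks of one launch.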

It is a little bit of secret indeed! :)

The code generates binary data that I can read as an image.
The same pattern repeats elsewhere in the C code I have.

Thanks