Question about DO LOOP

Hello.

I have two simple questions.

If I have a do loop like the following:

DO ir = cuRotRay%PinRayIdxSt(k), cuRotRay%PinRayIdxSt(k + 1) - 1

ENDDO

The lower bound is cuRotRay%PinRayIdxSt(k) and the upper bound is cuRotRay%PinRayIdxSt(k + 1) - 1.

Are they read only once, before the loop starts, or are they read on every iteration of the loop?

This could matter when optimizing CUDA global memory accesses.

Does CUDA ILP (instruction-level parallelism) also apply to a do loop within a thread? The loop has no data dependency between iterations.

Hi CNJ,

The lower bound is read only once. The upper bound will be read each time through the loop; however, it may be kept in a local register rather than reloaded from memory. Still, you might want to store the upper bound in a local variable and use that variable in the DO statement.
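A minimal sketch of that advice, written as plain host Fortran so it compiles without a CUDA toolchain. The derived type `RotRay_t`, its use here, and the routine name `sum_pin_rays` are hypothetical stand-ins for the poster's `cuRotRay` structure, and the loop body is only a placeholder. In a real `attributes(global)` kernel the same two local scalars would typically be held in registers.

```fortran
module pin_ray_loop
  implicit none

  ! Hypothetical stand-in for the type of cuRotRay in the question.
  type :: RotRay_t
    integer, allocatable :: PinRayIdxSt(:)
  end type RotRay_t

contains

  ! Sums the ray indices belonging to pin segment k.
  ! Both bounds are copied into local scalars once, so the loop
  ! control never needs to touch the derived-type array again.
  integer function sum_pin_rays(ray, k) result(total)
    type(RotRay_t), intent(in) :: ray
    integer, intent(in) :: k
    integer :: ir, irBeg, irEnd

    irBeg = ray%PinRayIdxSt(k)          ! lower bound, read once
    irEnd = ray%PinRayIdxSt(k + 1) - 1  ! upper bound, read once

    total = 0
    do ir = irBeg, irEnd
      total = total + ir                ! placeholder loop body
    end do
  end function sum_pin_rays

end module pin_ray_loop
```

If `PinRayIdxSt(k + 1)` equals `PinRayIdxSt(k)`, the range is empty and the loop body simply does not execute.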

ILP applies to the instructions within the loop body inside the kernel, but not necessarily across iterations of the loop.
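One generic way to give the hardware independent instructions inside a single iteration is to split a dependent accumulation into separate partial sums. This is an illustration of the technique, not something from the thread: the routine `dot2` is hypothetical, and it assumes that reordering the floating-point sum is acceptable (the result can differ in the last bits from a strict left-to-right sum).

```fortran
module ilp_dot
  implicit none
contains

  ! Dot product with two independent accumulators.
  ! s0 and s1 do not depend on each other, so the two
  ! multiply-adds in each iteration can overlap.
  real(8) function dot2(x, y, n) result(s)
    integer, intent(in) :: n
    real(8), intent(in) :: x(n), y(n)
    real(8) :: s0, s1
    integer :: i

    s0 = 0.0d0
    s1 = 0.0d0
    do i = 1, n - 1, 2
      s0 = s0 + x(i)     * y(i)
      s1 = s1 + x(i + 1) * y(i + 1)
    end do
    ! An odd n leaves one element that the stride-2 loop skipped.
    if (mod(n, 2) == 1) s0 = s0 + x(n) * y(n)

    s = s0 + s1
  end function dot2

end module ilp_dot
```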

  • Mat

So if I have a do loop

DO i = 1, 5
a(i) = i
ENDDO

do I have to write it in the following form, or otherwise unroll it (the compiler may do the unrolling), to take advantage of ILP?

a(1) = 1
a(2) = 2
a(3) = 3
a(4) = 4
a(5) = 5

The compiler may do some unrolling, depending on the code and the optimization level used. You can try “-Munroll” with its various sub-options to see if it helps. Note that “-Munroll=c:1” is enabled by default when using “-fast”.
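If you do want to unroll by hand when the trip count is not a small constant, a common pattern is a main loop unrolled by a fixed factor plus a remainder loop. This is only a sketch: the routine name `fill_unrolled` and the unroll factor of 4 are arbitrary choices for illustration, and it makes no claim about what code the compiler itself generates. Trying the compiler's unrolling first is usually simpler, since hand-unrolled code is harder to maintain.

```fortran
module unroll_fill
  implicit none
contains

  ! Fills a(i) = i, unrolled by hand by a factor of 4.
  ! The four assignments in the main loop are independent
  ! of each other, similar in spirit to compiler unrolling.
  subroutine fill_unrolled(a, n)
    integer, intent(in) :: n
    integer, intent(out) :: a(n)
    integer :: i, nMain

    nMain = n - mod(n, 4)           ! largest multiple of 4 <= n
    do i = 1, nMain, 4
      a(i)     = i
      a(i + 1) = i + 1
      a(i + 2) = i + 2
      a(i + 3) = i + 3
    end do
    ! Remainder loop: at most 3 leftover elements.
    do i = nMain + 1, n
      a(i) = i
    end do
  end subroutine fill_unrolled

end module unroll_fill
```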

  • Mat