I am about to start optimizing a piece of code using CUDA. I am a newbie to it, so I am stuck on a simple question at the moment. In nearly all the CUDA examples I have seen, the limits of the for loops are passed to the kernel from the host code, so the number of iterations of a loop inside the kernel is known before the kernel is launched. What if the iteration count can only be determined at runtime, inside the kernel itself?
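Something like this minimal sketch is what I mean (the kernel name, the data layout, and how `loopLimit` is obtained are just illustrative assumptions; the point is that the bound is computed per thread at runtime):

```cuda
// Illustrative sketch: each thread reads its own loop bound at runtime,
// so different threads in the same warp may iterate a different number
// of times.
__global__ void variableLoopKernel(const int *data, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // loopLimit is only known here, inside the kernel,
    // and can differ from thread to thread.
    int loopLimit = data[tid];

    float sum = 0.0f;
    for (int i = 0; i < loopLimit; ++i) {
        sum += i * 0.5f;   // placeholder per-iteration work
    }
    out[tid] = sum;
}
```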
Since the variable loopLimit can take a different value in each thread of a block, the threads would diverge because they would iterate the for loop a different number of times. How does CUDA handle that, or does it even support it?
Thanks in advance.