Scheduling Question Again


Suppose that I have this array in global memory on the device, and I launch a kernel with a single block of 512 threads:

__device__ int arrayLengths[512];

doLoop<<<1, 512>>>();

and I have this kernel:

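__global__ void doLoop()
{
    // Each thread loops a different number of times, read from global memory
    for (int i = 0; i < arrayLengths[threadIdx.x]; i++)
    {
        // ... loop body ...
    }
}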
Here is my question:

With this code, each thread runs its loop a different number of times. Suppose that:

a. Thread 1 gets 100 iterations,

b. Thread 2 gets 2 iterations,

c. Thread 1 and thread 2 run in the same warp.

What is the behavior of the scheduler?

Will the scheduler make thread 2 wait for thread 1, so that the streaming processor running thread 2 sits idle?

Or will thread 2 be swapped out for the corresponding thread (thread 2 + 32) in the next warp?

I’m sorry if I’ve posted a vague question.

Thank you!

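Threads are scheduled in and out in units of warps. If control flow diverges within a warp, some threads sit idle. Conditional statements often cause divergence, and every loop contains one at its core: think of a loop as an if-statement plus a jump back to the top. In your example, thread 2's lane is masked off (idle) for the remaining 98 iterations while thread 1 finishes. It is not refilled with a thread from another warp.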