Hi,
Suppose that I have this array in device global memory, and I launch the kernel with a single block of 512 threads:
__device__ int arrayLengths[512];
doLoop<<<1, 512>>>();
This is the kernel:
__global__ void doLoop()
{
    for (int i = 0; i < arrayLengths[threadIdx.x]; i++)
    {
        // ... loop body ...
    }
}
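A minimal, self-contained sketch of this setup follows. It assumes arrayLengths is declared `__device__` and filled from the host with cudaMemcpyToSymbol. The trip counts (thread 1 gets 100, thread 2 gets 2, every other thread gets 0), the `iterationsDone` output array and the `runDoLoop` helper are hypothetical, added only so the host can check how many iterations each thread actually ran.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Per-thread loop trip counts, stored in device global memory.
__device__ int arrayLengths[512];
// Hypothetical output array: iterations each thread actually completed.
__device__ int iterationsDone[512];

__global__ void doLoop()
{
    int count = 0;
    for (int i = 0; i < arrayLengths[threadIdx.x]; i++)
    {
        ++count;  // stand-in for the real loop body
    }
    iterationsDone[threadIdx.x] = count;
}

// Hypothetical helper: uploads the trip counts, launches one block of
// 512 threads, and copies back the per-thread iteration counts.
bool runDoLoop(const int* lengths, int* done)
{
    if (cudaMemcpyToSymbol(arrayLengths, lengths, 512 * sizeof(int)) != cudaSuccess)
        return false;
    doLoop<<<1, 512>>>();
    if (cudaGetLastError() != cudaSuccess)        // launch configuration errors
        return false;
    if (cudaDeviceSynchronize() != cudaSuccess)   // errors during execution
        return false;
    return cudaMemcpyFromSymbol(done, iterationsDone, 512 * sizeof(int)) == cudaSuccess;
}

int main()
{
    int lengths[512] = {0};
    lengths[1] = 100;  // thread 1: 100 iterations
    lengths[2] = 2;    // thread 2: 2 iterations, same warp (warp 0) as thread 1

    int done[512];
    if (!runDoLoop(lengths, done))
    {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("thread 1: %d iterations, thread 2: %d iterations\n", done[1], done[2]);
    return 0;
}
```

Each thread still completes exactly its own trip count; divergence inside a warp changes how long the warp runs, not what each thread computes.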
Here is my question:
With this code, each thread runs its loop a different number of times. Suppose that:
a. thread 1 gets 100 iterations,
b. thread 2 gets 2 iterations, and
c. threads 1 and 2 are in the same warp.
How does the scheduler behave in this case?
Will it make thread 2 wait for thread 1 (so the stream processor running thread 2 sits idle)?
Or will thread 2's slot be given to a thread from the next warp (for example thread 34, 32 positions later)?
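One way to make the situation concrete is to model it explicitly. The sketch below assumes the usual SIMT behaviour of NVIDIA GPUs: a warp keeps executing the loop until its longest-running lane is done, and lanes whose own trip count is exhausted are masked off. It does not measure the hardware scheduler; instead every lane iterates up to the warp-wide maximum (found with a `__shfl_xor_sync` reduction), and `__ballot_sync` records how many lanes of warp 0 still have work on each iteration. The kernel `warpLoop`, the `activeLanes` array, the `runWarpLoop` helper and the bound `kMaxIters` are hypothetical names; the `_sync` intrinsics need CUDA 9 or later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 512;
constexpr int kWarp = 32;
constexpr int kMaxIters = 128;  // assumed upper bound on any trip count

__device__ int arrayLengths[kThreads];
// activeLanes[i] = number of lanes in warp 0 that still have work at iteration i.
__device__ int activeLanes[kMaxIters];

__global__ void warpLoop()
{
    const unsigned full = 0xffffffffu;
    int n = arrayLengths[threadIdx.x];

    // Warp-wide maximum trip count via a butterfly shuffle reduction.
    int warpMax = n;
    for (int offset = kWarp / 2; offset > 0; offset /= 2)
        warpMax = max(warpMax, __shfl_xor_sync(full, warpMax, offset));

    // warpMax is uniform across the warp, so all 32 lanes reach __ballot_sync.
    for (int i = 0; i < warpMax; i++)
    {
        bool hasWork = i < n;
        unsigned mask = __ballot_sync(full, hasWork);  // bit set = lane still busy
        if (threadIdx.x == 0 && i < kMaxIters)         // lane 0 of warp 0 records it
            activeLanes[i] = __popc(mask);
        if (hasWork)
        {
            // real loop body would go here
        }
    }
}

// Runs the kernel and copies back warp 0's per-iteration active-lane counts.
bool runWarpLoop(const int* lengths, int* lanesOut)
{
    int zeros[kMaxIters] = {0};
    if (cudaMemcpyToSymbol(arrayLengths, lengths, kThreads * sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMemcpyToSymbol(activeLanes, zeros, sizeof(zeros)) != cudaSuccess)
        return false;
    warpLoop<<<1, kThreads>>>();
    if (cudaGetLastError() != cudaSuccess || cudaDeviceSynchronize() != cudaSuccess)
        return false;
    return cudaMemcpyFromSymbol(lanesOut, activeLanes, kMaxIters * sizeof(int)) == cudaSuccess;
}

int main()
{
    int lengths[kThreads] = {0};
    lengths[1] = 100;  // thread 1
    lengths[2] = 2;    // thread 2, same warp
    int lanes[kMaxIters];
    if (!runWarpLoop(lengths, lanes))
    {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    int busy = 0;
    for (int i = 0; i < 100; i++)
        busy += lanes[i];
    printf("active lanes: iter 0 = %d, iter 2 = %d, iter 99 = %d\n",
           lanes[0], lanes[2], lanes[99]);
    printf("warp 0 lane utilization: %.2f%%\n", 100.0 * busy / (kWarp * 100));
    return 0;
}
```

In this model, iterations 0 and 1 have two busy lanes (threads 1 and 2), and iterations 2 to 99 have only thread 1's lane busy, so warp 0 runs for 100 iterations while using at most 2 of its 32 lanes. Warp membership is fixed (threads 0 to 31 form warp 0), so the idle lanes are not handed to threads of another warp; the SM can issue instructions from other resident warps in between, which hides latency but does not recover the masked lanes.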
I’m sorry if I’ve posted a vague question.
Thank you!