newbie question: variable loop length

I haven’t got a CUDA-enabled card yet, so I can’t try this out myself, but I had this question in mind while reading the manual:

Suppose there’s a loop within each thread and the bounds of the loop depend on the thread ID. How does the runtime of each warp depend on its threads? Is it dominated by the thread with the most iterations, or are there other dependencies?


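As far as I know, each warp keeps getting scheduled until its longest-running thread has finished the loop, so the warp’s runtime is set by the thread with the most iterations. Threads that leave the loop early are masked off and sit idle until the rest of the warp catches up. Each warp diverges independently of the others.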
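
Since you can’t run anything on a card yet, here is a rough CPU-side sketch of that rule in plain C++. It is only a toy cost model, not CUDA code and not an NVIDIA API: the function names `warp_iterations` and `lane_utilization` are made up for illustration, the warp size of 32 is assumed, and memory latency, the scheduler and compiler predication are all ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cost model: the threads of a warp run in lockstep, so a warp
// keeps issuing loop iterations until its slowest thread is done.
// trip_counts[i] is the number of loop iterations thread i wants to run.
// Returns, for each warp, the number of loop iterations that warp issues.
std::vector<int> warp_iterations(const std::vector<int>& trip_counts,
                                 std::size_t warp_size = 32) {
    assert(warp_size > 0);
    std::vector<int> per_warp;
    for (std::size_t start = 0; start < trip_counts.size(); start += warp_size) {
        std::size_t end = std::min(start + warp_size, trip_counts.size());
        int longest = 0;
        for (std::size_t t = start; t < end; ++t)
            longest = std::max(longest, trip_counts[t]);
        per_warp.push_back(longest);  // the warp pays for its longest thread
    }
    return per_warp;
}

// Fraction of issued lane-iterations that do useful work under the same
// model. A partially filled last warp is still charged for all its lanes.
double lane_utilization(const std::vector<int>& trip_counts,
                        std::size_t warp_size = 32) {
    long long useful = 0;
    for (int n : trip_counts) useful += n;
    long long issued = 0;
    for (int longest : warp_iterations(trip_counts, warp_size))
        issued += static_cast<long long>(longest) * static_cast<long long>(warp_size);
    return issued == 0 ? 1.0 : static_cast<double>(useful) / static_cast<double>(issued);
}
```

For example, with 64 threads where thread i runs i iterations, this model charges the first warp 31 iterations and the second 63, and about 67% of the issued lane-iterations do useful work. If every thread runs the same number of iterations, utilization is 100%.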