Question about divergence and loops

Hi everyone.
I have just started learning CUDA. As I understand it, a warp (32 threads) is divergent if the control flow differs for any of its threads, for example an “if” statement that executes different code depending on the thread index.

My question is:
Suppose I have a “for” loop in my kernel whose number of iterations depends on the input data (i.e. the number of iterations is not the same for all threads in the warp). Would this be considered divergent?
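To make the question concrete, here is a minimal sketch of the kind of kernel I mean. The names `variableLoop`, `runVariableLoop`, and `iters` are made up for illustration, and error checking is left out; it only assumes that each thread reads its own trip count from the input.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the loop's trip count comes from the input data,
// so threads of the same warp may run a different number of iterations.
__global__ void variableLoop(const int *iters, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int sum = 0;
    for (int k = 0; k < iters[i]; ++k)   // data-dependent bound
        sum += k;
    out[i] = sum;
}

// Hypothetical host wrapper: copies the inputs to the device, launches
// the kernel with 32 threads per block, and copies the results back.
void runVariableLoop(const int *h_iters, int *h_out, int n)
{
    int *d_iters = nullptr, *d_out = nullptr;
    cudaMalloc(&d_iters, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_iters, h_iters, n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 32;
    int blocks = (n + threads - 1) / threads;
    variableLoop<<<blocks, threads>>>(d_iters, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_iters);
    cudaFree(d_out);
}
```

Filling `h_iters` with uneven values (say `i % 7`) gives neighbouring threads in one warp different loop lengths, which is exactly the situation I am asking about.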

Would each thread have to wait, or would they run in parallel?

Thank you

(I am sorry if this question was already posted; I couldn’t find it.)

Did I just ask a question so stupid that no one would even try to answer it?

Yes, a loop that terminates at different times for threads in the same warp creates divergence. I’m not sure whether the threads that finish the loop first have to wait at the end of the loop for the rest of the threads in their warp. Different warps can get out of sync with no speed penalty.
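As a rough way to see what that divergence could cost, the sketch below compares the work a warp needs with the work it would issue under a simplified lockstep model. The model is an assumption, not a measurement: it supposes the warp keeps issuing the loop body until its slowest lane is done, with finished lanes sitting idle, and newer GPUs may schedule threads more flexibly. The names `warpLoopCost` and `runWarpLoopCost` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Simplified cost model (an assumption): a warp issues the loop body
// max(iters) times, while only sum(iters) lane-iterations are useful.
// Launch with blockDim.x a multiple of 32 and one trip count per thread.
__global__ void warpLoopCost(const int *iters, int *warpMax, int *warpSum)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int mx = iters[i];
    int sm = iters[i];

    // Butterfly reduction across the 32 lanes of the warp.
    for (int offset = 16; offset > 0; offset >>= 1) {
        mx = max(mx, __shfl_xor_sync(0xffffffffu, mx, offset));
        sm += __shfl_xor_sync(0xffffffffu, sm, offset);
    }

    // Lane 0 of each warp records that warp's result.
    if ((threadIdx.x & 31) == 0) {
        warpMax[i / 32] = mx;
        warpSum[i / 32] = sm;
    }
}

// Hypothetical host wrapper for a single warp of 32 trip counts.
void runWarpLoopCost(const int *h_iters, int *h_max, int *h_sum)
{
    int *d_iters = nullptr, *d_max = nullptr, *d_sum = nullptr;
    cudaMalloc(&d_iters, 32 * sizeof(int));
    cudaMalloc(&d_max, sizeof(int));
    cudaMalloc(&d_sum, sizeof(int));
    cudaMemcpy(d_iters, h_iters, 32 * sizeof(int), cudaMemcpyHostToDevice);

    warpLoopCost<<<1, 32>>>(d_iters, d_max, d_sum);

    cudaMemcpy(h_max, d_max, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_iters);
    cudaFree(d_max);
    cudaFree(d_sum);
}
```

Under this model the fraction of useful lane slots is `sum / (32 * max)`. If 31 lanes need one iteration and one lane needs 32, that is 63 useful iterations out of 1024 issued slots, so a single slow thread can dominate the warp's time. Different warps are unaffected by each other's trip counts.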

So the loops will be serialized? Or is it just a matter of some loops finishing before others?

If it’s the latter, then there’s no problem, is there?

I could just place a barrier and wait, right?

Thank you for your answer.
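Here is a small sketch of what I mean by placing a barrier, assuming a single block of 64 threads (two warps) and made-up names `loopThenBarrier` and `runLoopThenBarrier`. The `__syncthreads()` call sits after the loop, outside any divergent code, so every thread of the block reaches it. As far as I can tell it is only needed because threads then read results written by other threads.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: assumes it is launched as one block of 64 threads.
__global__ void loopThenBarrier(const int *iters, int *out)
{
    __shared__ int partial[64];
    int t = threadIdx.x;

    int sum = 0;
    for (int k = 0; k < iters[t]; ++k)   // data-dependent trip count
        sum += k;
    partial[t] = sum;

    // Block-wide barrier after the loop: all 64 threads reach it,
    // however many iterations each of them ran.
    __syncthreads();

    // Safe to read a value written by a thread in the other warp.
    out[t] = partial[t] + partial[(t + 32) % 64];
}

// Hypothetical host wrapper for the 64-thread case above.
void runLoopThenBarrier(const int *h_iters, int *h_out)
{
    int *d_iters = nullptr, *d_out = nullptr;
    cudaMalloc(&d_iters, 64 * sizeof(int));
    cudaMalloc(&d_out, 64 * sizeof(int));
    cudaMemcpy(d_iters, h_iters, 64 * sizeof(int), cudaMemcpyHostToDevice);

    loopThenBarrier<<<1, 64>>>(d_iters, d_out);

    cudaMemcpy(h_out, d_out, 64 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_iters);
    cudaFree(d_out);
}
```

If each thread only wrote its own `out[t]` from its own `sum`, I don't think the barrier would be needed at all.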
