WHILE LOOP

Hello, everyone.

I am wondering whether it is possible to write a while loop in CUDA. I think there are many cases where a while loop is useful.

What confuses me is the following code:

void function( )
{
    while( condition1 )
    {
        // body of loop 1
    }

    while( condition2 )
    {
        // body of loop 2
    }
}

Suppose that I have 10 threads running “function”. Thread 1 may leave loop 1 while thread 2 is still inside it, and thread 1 would then go on to loop 2. So some threads would be in loop 1 and some threads would be in loop 2?

Then the performance would be reduced.

Does anyone know how to use a while loop in CUDA appropriately?

Either make sure, or make it very likely, that all threads of a warp follow the same path.

I can’t be more specific unless you describe the problem you’re trying to solve with CUDA.
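
To make the point about warps concrete, here is a minimal sketch, not a SQUFOF implementation. The kernel `twoLoops`, the helper `runTwoLoopsDemo`, and the trip-count arrays `n1` and `n2` are all made-up names standing in for "condition1" and "condition2". Threads share no data, so no barrier is needed for correctness. The cost of divergence is paid per warp: a warp typically stays busy in each loop for as long as its slowest thread does, while other warps proceed on their own.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the two stages: each thread runs two
// data-dependent while loops and shares nothing with other threads,
// so no __syncthreads() is required for correctness.
__global__ void twoLoops(const int *n1, const int *n2, int *out, int count)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= count) return;

    int acc = 0;
    int i = n1[tid];
    while (i > 0) { acc += 1; --i; }   // loop 1 ("condition1")
    int j = n2[tid];
    while (j > 0) { acc += 2; --j; }   // loop 2 ("condition2")
    out[tid] = acc;
}

// Host helper (hypothetical name): returns true if every thread produced
// n1 + 2 * n2, i.e. both loops ran to completion without any barrier.
bool runTwoLoopsDemo()
{
    const int count = 64;              // two warps of 32 threads
    std::vector<int> h1(count), h2(count), hout(count, -1);
    for (int t = 0; t < count; ++t) {
        bool uniformWarp = (t < 32);
        // Warp 0: identical trip counts, so the warp never diverges.
        // Warp 1: trip counts differ per thread, so the warp diverges.
        h1[t] = uniformWarp ? 100 : (t % 32) * 10;
        h2[t] = uniformWarp ? 50  : (31 - t % 32) * 10;
    }

    int *d1 = nullptr, *d2 = nullptr, *dout = nullptr;
    size_t bytes = count * sizeof(int);
    cudaMalloc(&d1, bytes);
    cudaMalloc(&d2, bytes);
    cudaMalloc(&dout, bytes);
    cudaMemcpy(d1, h1.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d2, h2.data(), bytes, cudaMemcpyHostToDevice);

    twoLoops<<<1, count>>>(d1, d2, dout, count);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    bool ok = cudaMemcpy(hout.data(), dout, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d1);
    cudaFree(d2);
    cudaFree(dout);

    for (int t = 0; t < count; ++t)
        ok = ok && (hout[t] == h1[t] + 2 * h2[t]);
    return ok;
}
```

A `main` that simply returns `runTwoLoopsDemo() ? 0 : 1` is enough to try this with `nvcc`. In this setup the first warp has uniform trip counts and the second does not, so only the second warp should lose time to divergence.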

Thanks for your reply. I am trying to implement the SQUFOF algorithm.

It is for integer factorization.

It has two stages, corresponding to the two while loops.

Once the two stages are finished, it might factor N.

I have thought about synchronizing the threads so that they work together, like this:

while( condition1 )
{
    // stage 1
}
__syncthreads( );   // CUDA's block-wide barrier; every thread in the block waits here

while( condition2 )
{
    // stage 2
}

But what I really want is for the threads to work independently, so it would be better without __syncthreads( ).

I don’t want the threads to wait for other threads, because once a thread no longer satisfies condition 2, it would have found a factor of N.

So I am wondering what would happen if I just wrote the code as above (without __syncthreads( )). How would it perform?

Wikipedia has a brief description: Shanks's square forms factorization - Wikipedia, http://en.wikipedia.org/wiki/Shanks’_squ...s_factorization

Hmm, I guess it depends on how many threads are going to pass into condition 2, and how quickly.

A strategy that you might want to consider is to regroup your threads periodically (not too often, as this means overhead). Maybe you can exchange thread state via shared memory for best efficiency, then sort the state array by the “condition” they’re in, before all threads resume their work.
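
A rough sketch of that regrouping idea, under several assumptions: one block of `BLOCK` threads, a made-up `Item` struct holding each thread's state, and simple countdown loops standing in for the real SQUFOF stages. Every `CHUNK` steps the threads publish their state to shared memory, compute a stable rank ordered by stage, and pick up the item at their new position, so threads in the same warp are mostly in the same loop afterwards. The rank computation here is a naive O(BLOCK) scan per thread, chosen for brevity; a parallel scan would be the usual replacement.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 64   // threads per block (assumed; two warps)
#define CHUNK 16   // loop steps between regroupings (assumed tuning knob)

// Hypothetical per-thread state: stage 0 = loop 1, 1 = loop 2, 2 = done.
struct Item { int stage; int left1; int left2; int id; int acc; };

__global__ void regroupKernel(const int *n1, const int *n2, int *out)
{
    __shared__ Item state[BLOCK];
    __shared__ Item sorted[BLOCK];
    int tid = threadIdx.x;
    Item it = {0, n1[tid], n2[tid], tid, 0};

    while (true) {
        // Run at most CHUNK steps of the loop this item is currently in.
        int steps = 0;
        if (it.stage == 0) {
            while (it.left1 > 0 && steps < CHUNK) { it.acc += 1; --it.left1; ++steps; }
            if (it.left1 == 0) it.stage = 1;
        } else if (it.stage == 1) {
            while (it.left2 > 0 && steps < CHUNK) { it.acc += 2; --it.left2; ++steps; }
            if (it.left2 == 0) it.stage = 2;
        }

        // Publish state, then compute a stable rank ordered by stage.
        state[tid] = it;
        __syncthreads();
        int rank = 0;
        for (int k = 0; k < BLOCK; ++k) {
            int sk = state[k].stage;
            if (sk < it.stage || (sk == it.stage && k < tid)) ++rank;
        }
        sorted[rank] = it;

        // Barrier plus vote: leave only when every item in the block is done.
        if (__syncthreads_and(it.stage == 2)) break;
        it = sorted[tid];   // resume with the item at the regrouped position
    }
    out[it.id] = it.acc;    // results are written back by original item id
}

// Host helper (hypothetical name): returns true if every item ends up
// with n1 + 2 * n2 despite being moved between threads.
bool runRegroupDemo()
{
    std::vector<int> h1(BLOCK), h2(BLOCK), hout(BLOCK, -1);
    for (int t = 0; t < BLOCK; ++t) {
        h1[t] = (t * 37) % 101;   // arbitrary, uneven trip counts
        h2[t] = (t * 53) % 89;
    }
    int *d1 = nullptr, *d2 = nullptr, *dout = nullptr;
    size_t bytes = BLOCK * sizeof(int);
    cudaMalloc(&d1, bytes);
    cudaMalloc(&d2, bytes);
    cudaMalloc(&dout, bytes);
    cudaMemcpy(d1, h1.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d2, h2.data(), bytes, cudaMemcpyHostToDevice);

    regroupKernel<<<1, BLOCK>>>(d1, d2, dout);

    bool ok = cudaMemcpy(hout.data(), dout, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d1);
    cudaFree(d2);
    cudaFree(dout);
    for (int t = 0; t < BLOCK; ++t)
        ok = ok && (hout[t] == h1[t] + 2 * h2[t]);
    return ok;
}
```

Whether this pays off depends on how expensive a loop step is compared with the regrouping itself. With loop bodies this cheap the barriers will likely dominate, so `CHUNK` would need tuning against the real stage code.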
