What is the optimum way to handle these loops?

Hi,

I have written a piece of C++ code as follows:

for (int i = 0; i < m; ++i)
{
    if (/* some condition */)
    {
        for (int j = 0; j < n; ++j)
        {
            for (int k = 0; k < h; ++k)
            {
                // some body here...
            }
        }
    }
    else
    {
        for (int j = 0; j < n; ++j)
        {
            // some body here...
        }
    }
}

I want to write this code in CUDA, but loops like this lead to THREAD DIVERGENCE and degrade the performance of the CUDA program.

So, my question is: what is the optimum way of handling this code?

Thanks
Manjunath G

If m, n and h are the same for every thread, then this will limit part of the divergence.

To make it perfect, every thread within a warp should agree on which branch of the condition is taken.

The key is that every thread in a warp needs to execute the same instruction, so if your condition is warp-dependent (uniform across each warp) rather than thread-dependent, then this should be fine.

If not, then the warp will effectively be split into multiple “sub-warps” that execute one after another, which leaves some of the scalar processors idle and hurts performance.

I would first try it as is, then try it with a fake “optimal” case to see whether it is actually worth the trouble of possibly rearranging things to make the condition warp-dependent.
It could very well be that this divergence will not be the limiting factor of your algorithm.
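One common rearrangement, sketched below under the assumption that the condition can be evaluated cheaply per element up front (the predicate `takesHeavyBranch` is a hypothetical stand-in for "some condition"), is to partition the work indices by the branch they take, so that consecutive threads, and therefore whole warps, fall on the same side of the `if`:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical predicate standing in for "some condition" on element i.
bool takesHeavyBranch(int i) { return i % 3 == 0; }

// Reorder the indices 0..m-1 so that all elements taking the heavy
// (triple-loop) branch come first. Each thread then processes idx[tid]
// instead of tid directly, and at most one warp -- the one straddling
// the boundary -- still sees both branches.
std::vector<int> partitionByBranch(int m) {
    std::vector<int> idx(m);
    std::iota(idx.begin(), idx.end(), 0);           // 0, 1, ..., m-1
    std::stable_partition(idx.begin(), idx.end(), takesHeavyBranch);
    return idx;
}
```

The index list can be built on the host (or on the device with a primitive such as `thrust::partition`) and passed to the kernel alongside the data.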

My m, n and h are not the same.

So are sub-warps the solution for this?