loop unrolling

small_potato · March 31, 2011, 5:25am

I have the following code using loop unrolling:

#pragma unroll
for (int i = 0; i < n; i++)
{
    ....
}

Here, if n is a defined constant, everything works fine. However, if n is a variable, performance drops dramatically: I noticed that roughly 3 times as many instructions are issued and executed. Can anyone explain this? Alternative solutions are welcome as well.

avidday · March 31, 2011, 6:28am

With n constant, is it possible that the compiler can pre-compute some or all of the result of the loop? That sort of optimization would explain the performance difference.

LSChien · March 31, 2011, 6:35am

What is the occupancy?

Could you list the register usage with “nvcc -Xptxas -v -arch=sm_20 [source file]”?

Usually, the compiler uses more registers when it unrolls a loop.
By the way, if n is a constant, the compiler will unroll the loop automatically.

small_potato · March 31, 2011, 8:39am

My intention is to use loop unrolling to improve performance. I guess if it’s a compiler-level technique, then I won’t be able to change the iteration count at run time.

small_potato · March 31, 2011, 8:42am

Occupancy is fine. I checked with the Visual Profiler. The number of instructions issued is 3 times higher than in the unrolled case. I am just looking for ways to do loop unrolling at run time. Maybe that’s just not feasible.

Sarnath · March 31, 2011, 8:47am

Hi There, Small Potato!

The compiler unrolls even if “n” is not a compile-time constant…
But you need to give the unroll factor as a parameter, I think…
Try “#pragma unroll 5”… The compiler will generate code that divides “n” by 5, and then unrolls appropriately.
You can check the PTX… That should tell you clearly what the compiler is doing…
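To make the “divide n and unroll” idea concrete, here is a hand-written sketch of what a partial unroll of a loop with a run-time trip count roughly looks like. It is plain host C++ for illustration only. The function name sum_unrolled4 and the factor of 4 are made up for this example, and the code the compiler actually emits for “#pragma unroll 5” may be structured differently.

```cpp
#include <cstddef>

// Hypothetical example: sums n floats with the loop body replicated 4 times.
// The unroll factor (4) is an assumption chosen for illustration.
float sum_unrolled4(const float* x, int n)
{
    float acc = 0.0f;
    int i = 0;

    // Main loop: runs while at least 4 elements remain, so the loop
    // counter is tested and incremented once per 4 elements.
    for (; i + 3 < n; i += 4) {
        acc += x[i];
        acc += x[i + 1];
        acc += x[i + 2];
        acc += x[i + 3];
    }

    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}
```

The remainder loop is the price of not knowing n at compile time: with a constant n the compiler can drop it, and often the loop counter as well.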

igtrnt · April 4, 2011, 2:33pm

In my finite difference kernels for wave equations (very similar to FDTD3d from the SDK), unrolling increases performance.
However, manual loop unrolling produces even faster code: ptxas reports lower register usage if I unroll manually than with #pragma unroll.

hamster143 · April 4, 2011, 6:29pm

Loop unrolling happens at compile time.
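Since unrolling is fixed at compile time, one possible workaround when n takes only a few values at run time is to compile one specialization per value and choose among them at run time. The sketch below is plain host C++ with made-up names (dot_fixed, dot_generic, dot) and an assumed set of trip counts (4, 8, 16). In a CUDA kernel the same pattern would presumably use __device__ functions or a templated kernel, which is not shown here.

```cpp
// Trip count N is a template parameter, so it is a compile-time constant
// and the compiler is free to unroll this loop completely.
template <int N>
float dot_fixed(const float* a, const float* b)
{
    float acc = 0.0f;
    for (int i = 0; i < N; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Fallback for trip counts that have no specialization.
float dot_generic(const float* a, const float* b, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Run-time dispatch: picks a compile-time specialization when one exists.
// The listed sizes are assumptions for illustration.
float dot(const float* a, const float* b, int n)
{
    switch (n) {
        case 4:  return dot_fixed<4>(a, b);
        case 8:  return dot_fixed<8>(a, b);
        case 16: return dot_fixed<16>(a, b);
        default: return dot_generic(a, b, n);
    }
}
```

The trade-off is code size: every specialization is compiled separately, so this only pays off when the set of likely values of n is small.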
