compiler directive

Admirer4 · June 11, 2008, 6:07am

Does anyone know what this directive does?
#pragma unroll

Sarnath · June 11, 2008, 6:36am

It unrolls the FOR loop that immediately follows it.

Unrolling the FOR loop increases the number of useful instructions executed per FOR branch.

For example, consider:

for(i=0; i<100; i++)
{
    cough_once();
}

There are 100 FOR branches involved in this case…

Unrolling it by 2 makes the compiler generate code that is equivalent to:

for(i=0; i<50; i++)
{
    cough_once();
    cough_once();
}

There are only 50 FOR branches involved in this case, for the same 100 coughs…

Got it? Stop coughing now :-)
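
To make the bookkeeping concrete, here is a minimal sketch in plain C of the two loops above, unrolled by hand rather than by the compiler. The counters `coughs` and `branches` and the function names `run_rolled` and `run_unrolled_by_2` are hypothetical, added only for illustration; what a real compiler emits may differ.

```c
static int coughs;    /* how many times the loop body work ran (illustrative) */
static int branches;  /* how many loop passes, i.e. loop-back branches, ran */

static void cough_once(void) { coughs++; }

/* Original loop: one cough per pass through the loop. */
static void run_rolled(void)
{
    coughs = 0;
    branches = 0;
    for (int i = 0; i < 100; i++) {
        cough_once();
        branches++;
    }
}

/* Hand-unrolled by 2: two coughs per pass, half as many passes. */
static void run_unrolled_by_2(void)
{
    coughs = 0;
    branches = 0;
    for (int i = 0; i < 50; i++) {
        cough_once();
        cough_once();
        branches++;
    }
}
```

Both functions cough 100 times; only the number of passes through the loop changes.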

Admirer4 · June 11, 2008, 7:21am

Thanks for your response, but I still don't understand what that does for performance.

For example, if I wrote:

A: for(i=0;i<10;i++) test();

or :

#pragma unroll 10

for(i=0;i<10;i++) test();

which would expand to something like this:

B: test(); test(); … test(); (10 times)

So what's the difference between A and B, and why should I use this?

Moreover, most of the time I don't know how many times the loop will execute. For example:

for(i=0; i<x; i++) …   (where x is a variable)

How can I use this directive in that case?

Sarnath · June 11, 2008, 7:32am

Hi,

As I already mentioned, unrolling minimizes the number of branches in the FOR loop, thereby increasing performance.

A branch instruction does NOTHING useful, right? So, by saying #pragma unroll 10, you are eliminating 10 branch instructions. That helps reduce pipeline stalls too, doesn't it?

That's the use of unrolling, and it usually increases performance.

What I say from here on is tentative:

When the loop iterates over a variable (say "n") and an unroll directive for "k" is provided,

usually the FOR loop is split into
a) A FOR loop that runs "n/k" times, with the body unrolled "k" times inside
b) A remainder part in which the loop body runs the leftover n%k times

Check the manual OR check out the PTX assembly output.
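
The a)/b) split above can be sketched by hand in plain C for k = 4. This is only an illustration of the idea under stated assumptions, not what the compiler is guaranteed to generate; the counter `calls` and the name `run_unrolled_by_4` are hypothetical.

```c
/* Hypothetical loop body: counts how many times it ran. */
static int calls;
static void test(void) { calls++; }

/* Runs test() n times, hand-unrolled by k = 4:
   a main loop of n/4 passes plus the n%4 leftover calls. */
static void run_unrolled_by_4(int n)
{
    int i = 0;

    /* a) main loop: n/4 passes, four calls per pass */
    for (; i + 4 <= n; i += 4) {
        test();
        test();
        test();
        test();
    }

    /* b) remainder: the n%4 iterations that did not fill a full pass */
    for (; i < n; i++) {
        test();
    }
}
```

For n = 10 this gives two passes of the main loop (8 calls) and 2 remainder calls, so the body still runs exactly 10 times.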

Hongkong

Admirer4 · June 11, 2008, 10:46am

Thanks a lot for that, but another question: if that is the result, why doesn't the compiler do this every time we write a loop? Or are there cases where we shouldn't unroll?

Sarnath · June 11, 2008, 11:01am

Unrolling can inflate your code, so it is left as a programmer's choice. You can use it at your discretion.

Also note that #pragma unroll takes an optional argument that specifies how many times the loop should be unrolled.

It's totally up to the programmer to use this feature.

seibert · June 11, 2008, 12:20pm

I believe the compiler does automatically unroll all loops whose size is known at compile time. (There may be some size limit here.) The directive is used to either disable loop unrolling, or to give the compiler unrolling hints for loops where the size is not known at compile time, but you (as the programmer) do. If you know the loop limit is a multiple of 32, then you can tell the compiler this with:

#pragma unroll 32
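
As a rough sketch of such a hint on a loop whose trip count is only known at run time, written in plain C with a factor of 4 to keep it small. The function name `sum_multiple_of_4` and the promise that `n` is a multiple of 4 are assumptions made for illustration. nvcc and Clang accept `#pragma unroll N`, while GCC 8 and later spell it `#pragma GCC unroll N`; whether the compiler acts on the hint is still up to the compiler.

```c
/* Sums n floats. Assumption: the caller promises n is a multiple of 4. */
static float sum_multiple_of_4(const float *data, int n)
{
    float total = 0.0f;

    /* Unroll hint placed directly before the loop; it does not change the result. */
#if defined(__clang__) || defined(__CUDACC__)
#pragma unroll 4
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 4
#endif
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

The hint only affects how the loop is compiled, so the function returns the same sum with or without it.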

Sarnath · June 12, 2008, 7:24am

Not really, I think. The compiler can unroll a FOR loop even if it does NOT know how many times it will execute. It has that intelligence, I think. I remember doing some experiments to figure that out.

You could check it out, though! It does the division and remainder and unrolls accordingly…

Antartica & Hiroshima
