Does anyone know what this directive does?
#pragma unroll
It unrolls the for loop that immediately follows it.
Unrolling increases the number of useful instructions executed per loop branch.
For example, consider:
for(i=0; i<100; i++)
{
cough_once();
}
100 loop branches are involved in this case…
Unrolling it by 2 makes the compiler generate code that is equivalent to:
for(i=0; i<50; i++)
{
cough_once();
cough_once();
}
Only 50 loop branches are involved in this case, for the same 100 calls…
Got it? Stop coughing now :-)
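To make the branch counts above concrete, here is a minimal sketch in plain C. The counter `coughs`, the body given to `cough_once()`, and the two wrapper functions are made up for illustration; each wrapper returns how many times control went round its loop.

```c
/* Hypothetical counter standing in for the real work of the loop body. */
static int coughs;

static void cough_once(void) { coughs++; }

/* Original loop: returns the number of trips round the loop. */
static int run_rolled(void)
{
    int i, trips = 0;
    for (i = 0; i < 100; i++, trips++) {
        cough_once();
    }
    return trips;
}

/* Same work, unrolled by 2 by hand: half as many trips. */
static int run_unrolled_by_2(void)
{
    int i, trips = 0;
    for (i = 0; i < 50; i++, trips++) {
        cough_once();
        cough_once();
    }
    return trips;
}
```

Both versions call cough_once() 100 times, but the second goes round the loop only 50 times, so half of the loop branches are gone.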
Thanks for your response, but I still don't understand what this does for performance.
For example, if I wrote:
A: for(i=0;i<10;i++) test();
or:
#pragma unroll 10
for(i=0;i<10;i++) test();
which will become:
B: test(); test(); … test(); (10 times)
So what's the difference between A and B, and why should I use this?
Moreover, most of the time I don't know how many times the loop will execute, for example:
for(i=0; i<x (variable) ; i++) …
How can I use this directive in that case?
Hi,
As I already mentioned, unrolling minimizes the number of branches in the for loop, which increases performance.
A branch instruction does no useful work by itself, right? So, by saying #pragma unroll 10 on your 10-iteration loop, you eliminate 10 branch instructions. That helps reduce pipeline stalls too, doesn't it?
That's the use of unrolling, and it usually increases performance.
What I say from here on is tentative:
When the loop iterates up to a variable (say "n") and an unroll directive with factor "k" is provided,
the for loop is usually split into:
a) a for loop that runs "n/k" times, with the body unrolled "k" times inside it
b) the loop body executed a further "n%k" times to cover the remainder
Check the manual or look at the PTX assembly output to confirm.
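The split described above can be written out by hand. This is only a sketch in plain C of what the unrolled code might be equivalent to for k = 4, not what the compiler actually emits; the counter `calls` and the body given to `test()` are made up for illustration.

```c
/* Hypothetical loop body: counts how many times it has run. */
static int calls;

static void test(void) { calls++; }

/* Hand-written equivalent of unrolling
 *     for (i = 0; i < n; i++) test();
 * by k = 4. */
static void run_unrolled_by_4(int n)
{
    int i;

    /* a) Main part: n / 4 trips, body repeated 4 times per trip. */
    for (i = 0; i < n / 4; i++) {
        test();
        test();
        test();
        test();
    }

    /* b) Remainder: the n % 4 iterations left over. */
    for (i = 0; i < n % 4; i++) {
        test();
    }
}
```

With n = 10 the main part runs twice (8 calls) and the remainder runs twice more, so test() is still called 10 times in total.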
Thanks a lot for that. Another question: if that is the result, why doesn't the compiler do this every time we write a loop? Or are there cases where we shouldn't unroll?
Unrolling can inflate your code size, so it is left as a programmer choice. You can use it at your discretion.
Also note that #pragma unroll takes an optional argument that specifies how many times the loop should be unrolled.
It's totally up to the programmer to use this feature.
I believe the compiler does automatically unroll all loops whose size is known at compile time. (There may be some size limit here.) The directive is used to either disable loop unrolling, or to give the compiler unrolling hints for loops where the size is not known at compile time, but you (as the programmer) do. If you know the loop limit is a multiple of 32, then you can tell the compiler this with:
#pragma unroll 32
Not really, I think. The compiler can unroll a loop even if it does not know how many times the for loop will execute. It has that intelligence, I think. I remember doing some experiments to figure that out.
You could check it yourself, though! It does the division and remainder and unrolls accordingly…