forcing loop unrolls

minidrive · October 10, 2018, 11:27pm

Hello,

Since the CUDA compiler often unrolls a loop fewer times than I ask for, is there a way I can force it?
My loop length, N, is really large: 25,000,000.

I use the following:

int i;
#pragma loop unroll
for(int j=0;j<N;j++)
i=j;

It is shortened to just 17 runs instead of N. How can I better understand why the compiler shortens it by so much?

Thank you.

njuffa · October 11, 2018, 3:29am

The CUDA compiler compiles with full optimizations by default. This includes dead code elimination. If you have a sequence of assignments (as you have in your loop):

i = 0;
i = 1;
...
i = N-1;

this is equivalent to simply

i = N-1;

BTW, the pragma as you wrote it doesn’t look correct to me. It should be

#pragma unroll <unrolling-factor>
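To illustrate both points above, here is a minimal sketch (not code from the original question). The directive is written as #pragma unroll, and the loop's result is stored to global memory so that dead code elimination cannot discard the loop. The names sum_unrolled, run_sum_unrolled and kTrip, and the trip count of 8, are assumptions chosen for illustration.

```cuda
#include <cuda_runtime.h>

constexpr int kTrip = 8;  // illustrative compile-time trip count (assumption)

// Sums kTrip inputs. The sum is written to global memory, so the loop has
// an observable effect and cannot be removed as dead code.
__global__ void sum_unrolled(const int* in, int* out)
{
    int acc = 0;
#pragma unroll
    for (int j = 0; j < kTrip; ++j) {
        acc += in[j];
    }
    *out = acc;
}

// Hypothetical host helper: fills in[j] = j, launches one thread, and
// returns the sum computed on the device. Error checking is omitted.
int run_sum_unrolled()
{
    int h_in[kTrip];
    for (int j = 0; j < kTrip; ++j) {
        h_in[j] = j;
    }

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    sum_unrolled<<<1, 1>>>(d_in, d_out);

    int h_out = -1;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

On a machine with a CUDA device, run_sum_unrolled() should return 28 (0 + 1 + ... + 7). By contrast, a loop body like i = j with i never read afterwards leaves nothing for the compiler to keep.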

saulocpp · October 11, 2018, 8:37am

If I am not mistaken, unrolling is limited when the compiler doesn’t know the value of N at compile time?
That is not really particular to CUDA, though; it applies to C/C++ in general.
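One common way to make the trip count known at compile time is to pass it as a template parameter. The sketch below assumes that approach; scale_fixed and run_scale_fixed are hypothetical names, and the trip count of 4 is chosen only to keep the example small.

```cuda
#include <cuda_runtime.h>

// N is a template parameter, so the trip count is a compile-time constant
// in every instantiation of this kernel.
template <int N>
__global__ void scale_fixed(float* data, float factor)
{
#pragma unroll
    for (int j = 0; j < N; ++j) {
        data[j] *= factor;
    }
}

// Hypothetical host helper: scales {1, 2, 3, 4} by 2 on the device and
// returns the sum of the scaled values. Error checking is omitted.
float run_scale_fixed()
{
    float h[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float* d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    scale_fixed<4><<<1, 1>>>(d, 2.0f);

    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[0] + h[1] + h[2] + h[3];
}
```

With a CUDA device present, run_scale_fixed() should return 20 (2 + 4 + 6 + 8). If N were instead an ordinary kernel argument, the compiler would only learn its value at run time.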

minidrive · October 11, 2018, 1:36pm

I am posting this from the NVIDIA programming guide:

“If no number is specified after #pragma unroll, the loop is completely unrolled if its trip count is constant, otherwise it is not unrolled at all.”

So if N is set to 25,000,000, why would it not completely unroll?

Robert_Crovella · October 11, 2018, 2:15pm

If you are using

#pragma loop unroll

that is incorrect. There is no "loop" in the directive.

Furthermore, there is almost certainly some unstated limit to loop unrolling.
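Fully unrolling 25,000,000 iterations would produce an enormous amount of code, so for a large trip count it is usually more realistic to request a modest unrolling factor. The sketch below assumes a factor of 4, which is arbitrary; accumulate and run_accumulate are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Sums n floats with a requested unrolling factor of 4. The trip count n
// is a run-time value, and the result is stored so the loop stays live.
__global__ void accumulate(const float* in, float* out, int n)
{
    float acc = 0.0f;
#pragma unroll 4
    for (int j = 0; j < n; ++j) {
        acc += in[j];
    }
    *out = acc;
}

// Hypothetical host helper: sums ten 1.0f values on the device.
// Ten is deliberately not a multiple of 4. Error checking is omitted.
float run_accumulate()
{
    const int n = 10;
    float h_in[n];
    for (int j = 0; j < n; ++j) {
        h_in[j] = 1.0f;
    }

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    accumulate<<<1, 1>>>(d_in, d_out, n);

    float h_out = -1.0f;
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

With a CUDA device present, run_accumulate() should return 10. To see what the compiler actually generated, you can inspect the PTX from nvcc -ptx or the machine code from cuobjdump -sass.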
