#pragma unroll

Deus · July 20, 2010, 1:20pm

Hello,

I have a problem with #pragma unroll in this part of my code:

__device__ void function(…,const int m_StrideDiv4 /*(StrideDiv4>=6)*/,…){

#pragma unroll 6

for(int i=0; i<m_StrideDiv4; ++i)

{
float	deltaA = f(...) * s[i];//f-device function

v[v2 + i] -= deltaA;
}

}

The compiler returns a warning:

But why?

I use CUDA 3.1 and GTX480.

anfaenger · July 21, 2010, 9:01am

I noticed that too. Loops that were unrolled under 2.3 are no longer unrolled?

Jimmy_Pettersson · July 21, 2010, 10:30am

template <int m_StrideDiv4>
__device__ void function(…)

Deus · July 22, 2010, 9:57am

Thank you!

But I have a compiler error:

Why?

Jimmy_Pettersson · July 23, 2010, 7:10am

You probably aren’t calling the function with the right arguments. Also make sure your header file is updated according to your recent change.

const int m_StrideDiv4 = 6;
function<m_StrideDiv4 >(…);
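
A minimal sketch of the template idea, assuming the stride can only take a few values. The names `update`, `updateDispatch`, and `scale` are hypothetical, and `scale` merely stands in for f, whose real code is not shown in the thread. When the stride is only known at run time, a switch can map it onto a fixed set of instantiations, each with a compile-time trip count. The small macro shim at the top is there only so the sketch also builds with a plain C++ compiler.

```cpp
// Lets this sketch build with a plain C++ compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical stand-in for f(); the real decoder is not shown in the thread.
__host__ __device__ inline float scale(int i)
{
    return 0.5f * static_cast<float>(i);
}

// STRIDE is a template argument, so the trip count is a compile-time constant.
template <int STRIDE>
__host__ __device__ void update(float* v, const float* s, int v2)
{
#pragma unroll
    for (int i = 0; i < STRIDE; ++i) {
        v[v2 + i] -= scale(i) * s[i];
    }
}

// Maps a run-time stride onto one of a few instantiations (6..8 is an
// arbitrary range chosen for illustration). Returns false when no
// instantiation matches, so the caller can fall back to a plain loop.
__host__ __device__ inline bool updateDispatch(int stride, float* v,
                                               const float* s, int v2)
{
    bool handled = true;
    switch (stride) {
    case 6: update<6>(v, s, v2); break;
    case 7: update<7>(v, s, v2); break;
    case 8: update<8>(v, s, v2); break;
    default: handled = false; break;
    }
    return handled;
}
```

Each case adds one more copy of the loop body to the binary, so this only makes sense for a small set of expected strides.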

Deus · July 23, 2010, 10:39am

The problem is that m_StrideDiv4 (>= 6) is calculated on the host and is constant only within this function.
Therefore, I cannot define it as a constant outside the function.
What can I do to solve this problem?

tera · July 23, 2010, 1:17pm

What does the function f look like?

Deus · July 23, 2010, 4:15pm

The function f is an arithmetic decoder…
Without f, the “#pragma unroll” succeeds, but why?

Jimmy_Pettersson · July 24, 2010, 1:36am

OK, so you know that m_StrideDiv4 >= 6, right?

Maybe:

__device__ void function(...,const int m_StrideDiv4 /*(StrideDiv4>=6)*/,...){

#pragma unroll

for(int i=0; i<6; ++i)

{

float	deltaA = f(...) * s[i];//f-device function

v[v2 + i] -= deltaA;

}

for(int i=6; i<m_StrideDiv4; ++i)

{

float	deltaA = f(...) * s[i];//f-device function

v[v2 + i] -= deltaA;

}

}

Would that solve your problem?

EDIT:

OK, I missed this post.

The reason is that f(…) takes in some arguments that are not known at compile time.

tera · July 24, 2010, 8:30am

Make sure f() has exactly one return statement.
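
To illustrate the single-return advice, here is a sketch with a hypothetical helper, `clampStep`, which is not the poster's f(). The first version exits from three places; the second computes the same value and returns once at the end. Whether multiple returns were really what blocked unrolling in the original code cannot be confirmed from the thread.

```cpp
// Lets this sketch build with a plain C++ compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Early-exit form: three return statements.
__host__ __device__ inline int clampStepEarly(int x, int lo, int hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

// Single-exit form: same result, one return statement at the end.
__host__ __device__ inline int clampStep(int x, int lo, int hi)
{
    int result = x;
    if (x < lo) {
        result = lo;
    } else if (x > hi) {
        result = hi;
    }
    return result;
}
```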

Deus · July 24, 2010, 10:14am

OK, I did that, but now I have a new error

tera · July 24, 2010, 10:36am

Do you use any pointers within f()?

Deus · July 24, 2010, 11:47am

Yes, I use pointers in f.
f is a decoder and it takes a pointer to the coded data.

tera · July 24, 2010, 11:57am

Try marking them as __restrict__ (see appendix E.3 of the Programming Guide) to indicate to the compiler that they don’t interfere with the loop counter.

I’m not sure this is the problem, though. The compiler should still be able to see that the address of i is never taken.

Another question is, if f() is an expensive function, why would you want the loop to be unrolled?
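
For the __restrict__ suggestion, a minimal sketch of what the qualifier looks like on a function signature. `DecoderState` and `decodeInto` are hypothetical names, since the real struct and decoder are not shown in the thread. The macro shim is only there so the sketch also builds with a plain C++ compiler.

```cpp
// Lets this sketch build with a plain C++ compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical decoder state; the real struct is not shown in the thread.
struct DecoderState {
    const unsigned char* data;  // coded input
    int pos;                    // index of the next byte to read
};

// __restrict__ is a promise from the programmer that, inside this function,
// the objects pointed to by st, s, and v are not reached through any other
// pointer. The compiler does not check the promise.
__host__ __device__ inline void decodeInto(DecoderState* __restrict__ st,
                                           const float* __restrict__ s,
                                           float* __restrict__ v,
                                           int n)
{
    for (int i = 0; i < n; ++i) {
        const float sym = static_cast<float>(st->data[st->pos]);
        st->pos += 1;
        v[i] -= sym * s[i];
    }
}
```

If the pointers do overlap at run time, the behavior is undefined, so the qualifier should only be added where the caller can guarantee distinct buffers.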

Deus · July 24, 2010, 1:10pm

__restrict__ was not the problem…

Aren’t unrolled loops better for performance?

I don’t understand it. f() is independent of m_StrideDiv4, so why is f() a problem for the compiler?

tera · July 24, 2010, 1:35pm

You save an increment and a branch (and potentially a comparison), i.e. two or three instructions. That’s significant if the loop body itself has only one or a few instructions, but the benefit soon diminishes as the loop body gets larger.

Unrolling might open up possibilities for other optimizations, but that does not seem to be the case here.

The compiler has to make sure to produce code that is equivalent under any circumstances. That requires a lot of analysis. Can you post the code of f()?

The problem probably is not related to m_StrideDiv4 at all.
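
To make the savings concrete, here is a rough sketch of what unrolling by four amounts to when written by hand. The function names are hypothetical and the loop body is deliberately tiny. The code a compiler actually generates will differ in detail.

```cpp
// Lets this sketch build with a plain C++ compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Rolled form: one increment, one compare, and one branch per element.
__host__ __device__ inline void subtractRolled(float* v, const float* s, int n)
{
    for (int i = 0; i < n; ++i) {
        v[i] -= s[i];
    }
}

// Hand-unrolled by 4: the loop overhead is paid once per four elements.
// A remainder loop handles the last n % 4 elements.
__host__ __device__ inline void subtractUnrolled4(float* v, const float* s, int n)
{
    int i = 0;
    for (; i + 3 < n; i += 4) {
        v[i]     -= s[i];
        v[i + 1] -= s[i + 1];
        v[i + 2] -= s[i + 2];
        v[i + 3] -= s[i + 3];
    }
    for (; i < n; ++i) {
        v[i] -= s[i];
    }
}
```

With a one-instruction body the overhead removed is a large share of the work. With a body as large as an arithmetic decoder, the same two or three saved instructions are a small fraction of each iteration.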

Jimmy_Pettersson · July 25, 2010, 3:26am

Maybe you could show us at least a code snippet.

tera · July 25, 2010, 8:49am

That’s not a problem - as long as f() has no other constructs preventing unrolling, the compiler will happily inline f() and still unroll the loop.

Deus · July 25, 2010, 11:57am

The code of f() is very large, with calls to other functions and loops. Here is a simplified representation of f, with simple instructions replaced by “…”:

__device__ unsigned int f(pointer of struct)

{	

	...

	decode(pointer of struct);

	...

	decode(pointer of struct);

	...

	decode(pointer of struct);

	...

	return ...;

}

__device__ unsigned int decode(pointer of struct)

{

	...

	for(int i=0; i<8; ++i)

	{

		...

	}

	...

	if (...){

		do {										  

			...

		} while (...);	   

	}		

	return ...;

}

Jimmy_Pettersson · July 25, 2010, 4:07pm

If those parameters affect the addressing, it most definitely should. But that doesn’t have to be the actual reason :)

if (...){  // <------- Problem ?

		do {										  

			...

		} while (...);	// <-------- Problem ? 

	}

I think those conditionals will be a problem if they depend on dynamic variables, which means the compiler doesn’t know which path to take.
