BUG? nvcc fails to unroll the loop

sergeyn · May 24, 2009, 10:33pm

Hi,

I can’t get the loop in the following code to unroll:

__global__ void func(float4* _o)
{
  const int BLOCK_DIM_X = 512;
  #pragma unroll
  for (int i = BLOCK_DIM_X/2; i > 1; i /= 2)
  {
    _o[i] = make_float4(0,0,0,0);
  }
}

Any idea how to get it unrolled? This looks like another compiler bug.

The compiler also fails to hoist make_float4(0,0,0,0) out of the loop, which generates inefficient code.

Fugl · May 25, 2009, 12:27pm

You could unroll it by hand with the local-iteration macros of Boost.Preprocessor (BOOST_PP_LOCAL_ITERATE). You can do some clever tricks with them. See http://forums.nvidia.com/index.php?showtopic=88814&hl=

seibert · May 26, 2009, 3:55pm

According to the programming guide, the #pragma unroll directive will only unroll the loop if the compiler can figure out how many iterations it has. It is quite possible that the form of your loop counter is too complex for the compiler to infer the number of iterations. You can also put an explicit number after unroll, if you know the number of iterations will be a multiple of the unroll value.

Does this work?

__global__ void func(float4* _o)
{
  const int BLOCK_DIM_X = 512;
  #pragma unroll 8
  for (int i = BLOCK_DIM_X/2; i > 1; i /= 2)
  {
    _o[i] = make_float4(0,0,0,0);
  }
}
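The trip count the unroller has to derive is small here: with BLOCK_DIM_X = 512 the body runs for i = 256, 128, 64, 32, 16, 8, 4, 2, i.e. 8 times, so an explicit unroll factor of 8 divides it exactly. A minimal sketch that checks this at compile time, assuming a C++14 compiler (loops inside constexpr functions, which a 2009-era nvcc does not accept); halving_trip_count is a hypothetical helper, not a CUDA API:

```cpp
// Hypothetical helper: counts how many times the body of
//   for (int i = start; i > 1; i /= 2) { ... }
// executes. Requires C++14 (loops inside constexpr functions).
constexpr int halving_trip_count(int start) {
  int trips = 0;
  for (int i = start; i > 1; i /= 2) {
    ++trips;
  }
  return trips;
}

constexpr int BLOCK_DIM_X = 512;

// The loop in the kernel above visits i = 256, 128, 64, 32, 16, 8, 4, 2.
static_assert(halving_trip_count(BLOCK_DIM_X / 2) == 8,
              "the halving loop runs 8 times, a multiple of the unroll factor 8");
```

This count equals log2(BLOCK_DIM_X) - 1, so the same stores can also be expressed as a simple counted loop over the exponent.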

sergeyn · May 26, 2009, 4:18pm

Well, IMHO, the compiler in general is just in a pretty weak state right now. Constant propagation is one of the most basic optimization techniques in compilers these days.

Hopefully it won’t take long to fix that.

seibert · May 26, 2009, 4:35pm

While I can agree with your general statement, I don’t see how this particular problem is a constant propagation issue. I’m not familiar with the innards of nvcc (which is based on the Open64 compiler), but I assumed that the loop unroller cannot figure out how many iterations this loop will have because the loop counter is advanced by repeated integer division (or hopefully bit shifting) rather than a simple increment/decrement operation. Do more mature compilers know how to unroll a loop like this?
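The division-versus-shift point can be made concrete. For a signed int, i / 2 truncates toward zero while an arithmetic right shift rounds toward negative infinity, so a compiler may only replace i /= 2 with a plain i >>= 1 once it knows i is non-negative (here the i > 1 condition guarantees that). A small host-side sketch of the difference; div2 and shr1 are hypothetical names used only for illustration:

```cpp
// Signed division by 2 truncates toward zero.
constexpr int div2(int i) { return i / 2; }

// Right shift by 1. For negative operands the result is implementation-defined
// before C++20 (an arithmetic shift, rounding toward -infinity, on common
// two's-complement targets), so it is only evaluated on non-negative values here.
constexpr int shr1(int i) { return i >> 1; }

// For the values the loop actually sees (i > 1) the two agree, so the
// compiler may emit a shift once it has proved i is positive.
static_assert(div2(256) == shr1(256), "equal for positive i");
static_assert(div2(3) == shr1(3), "both give 1 for i = 3");

// For negative odd values they differ: -3 / 2 == -1, while an arithmetic
// shift gives -3 >> 1 == -2.
static_assert(div2(-3) == -1, "division truncates toward zero");
```

Writing the update as i >>= 1, or using an unsigned counter, removes the need for that proof; whether nvcc then unrolls the loop was not tested in this thread.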

sergeyn · May 26, 2009, 5:36pm

Microsoft’s shader compiler easily unrolls it.

seibert · May 26, 2009, 6:16pm

OK, I did some experimenting with the compiler, and discovered that the loop as written above is never unrolled, even if you give #pragma unroll an explicit unroll parameter. (It also does convert the integer division to bit shifting, as you would hope.) Something about that form of the loop is disabling the entire loop unroller, which I think is a definite bug in the case of the explicit unroll parameter, and a good feature request in the case of the generic #pragma unroll (especially given that the MS shader compiler can do it).

This code (while uglier) does unroll completely, and the compiler is smart enough to precompute the 1 << i values:

__global__ void func(float4* _o)
{
  const int BLOCK_DIM_X_LOG2 = 9;
  const int BLOCK_DIM_X = 1 << BLOCK_DIM_X_LOG2;
  #pragma unroll
  for (int i = BLOCK_DIM_X_LOG2 - 1; i > 0; i-=1)
  {
    _o[1 << i] = make_float4(0,0,0,0);
  }
}

This would have been shorter, but I couldn’t find a way to get the compiler to compute log2(X) at compile time.
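log2 of a compile-time constant can be computed without constexpr using template recursion, and the same trick can replace the loop entirely so that nothing depends on #pragma unroll. A sketch under those assumptions: Log2, StorePowersOfTwo and the HD macro are hypothetical names, the HD shim exists only so the file also compiles as plain C++ for checking, and the kernel itself is compiled only under nvcc:

```cpp
// Portability shim (hypothetical): mark helpers host+device under nvcc,
// and leave them as ordinary functions when compiled as plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// floor(log2(N)) for N >= 1, evaluated entirely at compile time.
template <int N>
struct Log2 {
  enum { value = 1 + Log2<N / 2>::value };
};
template <>
struct Log2<1> {
  enum { value = 0 };
};

// Stores v at o[1 << I], o[1 << (I - 1)], ..., o[2]. Each step is a separate
// template instantiation, so there is no loop left for the unroller to handle.
template <int I>
struct StorePowersOfTwo {
  template <typename T>
  static HD void run(T* o, const T& v) {
    o[1 << I] = v;
    StorePowersOfTwo<I - 1>::run(o, v);
  }
};
template <>
struct StorePowersOfTwo<0> {
  template <typename T>
  static HD void run(T*, const T&) {}  // stop before writing o[1]
};

const int BLOCK_DIM_X = 512;
const int BLOCK_DIM_X_LOG2 = Log2<BLOCK_DIM_X>::value;  // 9

#ifdef __CUDACC__
__global__ void func(float4* _o) {
  // Same stores as the original loop: _o[256], _o[128], ..., _o[2],
  // with make_float4 evaluated once and passed down by reference.
  StorePowersOfTwo<BLOCK_DIM_X_LOG2 - 1>::run(_o, make_float4(0, 0, 0, 0));
}
#endif
```

Whether every level of the recursion is inlined is still the compiler's decision; small __device__ functions are normally inlined, and __forceinline__ can be added under nvcc if they are not.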
