Why does using a break inside a loop save so many registers?

I’ve got a loop inside a kernel. If the condition check fails during the loop, I use break to quit the loop. Using break unexpectedly saves me 6 registers per thread. I don’t know why…

It will probably be the result of a compiler optimisation, although it is impossible to say more than that without seeing the code. The obvious candidates are dead code removal or some sort of result or function substitution, but that is pure speculation.

Break disables loop unrolling, thus fewer registers.

The relevant code is as follows. I’m trying to stop the loop if a negative is found. Actually the break is optional. The reason I want to add it is that as soon as a negative is found during the early iterations, we can stop and save time. But this version also shows the 6-register saving when I compile.

__shared__ unsigned char bv[16384];
__shared__ int negtive_flag[512];

for (l = 0; l < 6; l++) {
    if (!(bv[h[l] >> 3] & (1 << (7 - (h[l] & 0x7))))) {
        negtive_flag[threadIdx.x] = 1;
        break;
    }
}

But what if the break is never reached during the loop? I mean the condition check is never true through all the iterations. Then I guess it should use as many registers as the case without a break inside the loop. How does the compiler know, before the code actually runs, whether there is really a break taken that should disable the loop unrolling?

Yep, Sergey nailed it. Without the break statement, that loop has a small, known trip count and nvcc will automatically unroll it. With the break in place, the loop trip count is unknown and the loop unroll won’t happen. That is where the register savings are coming from.
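For instance, here is a minimal sketch of the two cases (the kernel names and data layout are made up purely for illustration); compiling each with --ptxas-options=-v should show the difference in reported register counts:

// Hypothetical kernels, just to contrast the two cases.
__global__ void sum_no_break(const int *in, int *out)
{
    int sum = 0;
    // Known trip count and no early exit: nvcc will typically unroll this.
    for (int l = 0; l < 6; l++)
        sum += in[threadIdx.x * 6 + l];
    out[threadIdx.x] = sum;
}

__global__ void sum_with_break(const int *in, int *out)
{
    int sum = 0;
    // The data-dependent break makes the trip count unknown,
    // so the loop stays rolled and fewer values are live at once.
    for (int l = 0; l < 6; l++) {
        int v = in[threadIdx.x * 6 + l];
        if (v < 0)
            break;
        sum += v;
    }
    out[threadIdx.x] = sum;
}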

On a side note, I still don’t understand why a break statement disables loop unrolling - wouldn’t it be trivial to implement with a jump out of the unrolled code?

Got ya~ So you mean the automatic unrolling of a loop with a known trip count is done by the optimizing compiler without us noticing. That’s why in this case it makes no difference whether we manually unroll or not in terms of register usage.

for (i = 0; i < 6; i++)
    A[i] = A[i] * 10;

seemingly uses the same number of registers as

A[0] = A[0] * 10;
A[1] = A[1] * 10;
A[2] = A[2] * 10;
A[3] = A[3] * 10;
A[4] = A[4] * 10;
A[5] = A[5] * 10;

because the loop has already been unrolled by the compiler. Is this a correct understanding?

Yep, that is pretty much it. You should be able to see the effect if you look at the compiled PTX from nvcc.
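If it helps, two ways to look at it (the file name here is just a placeholder): nvcc can write out the intermediate PTX, and ptxas in verbose mode prints the per-kernel register and shared memory usage at compile time:

nvcc -ptx kernel.cu -o kernel.ptx
nvcc -c --ptxas-options=-v kernel.cu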

Thanks avidday! But how can I see this difference quantitatively from nvcc?

Unrolling the loop actually increases register usage. What I want to see is the quantitative difference between the loop before unrolling and after unrolling.

You can disable loop unrolling by putting:

#pragma unroll 1

immediately before a loop you do not want to unroll. You can also do partial unrolling with the same pragma. See Appendix E.2 in the CUDA Programming Guide for details.
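For example, here is a sketch using the loop from the earlier post, with the break removed so the trip count stays at 6; compile it once with and once without the pragma and compare the register counts reported by --ptxas-options=-v:

#pragma unroll 1   // force the loop to stay rolled despite the known trip count
for (l = 0; l < 6; l++) {
    if (!(bv[h[l] >> 3] & (1 << (7 - (h[l] & 0x7)))))
        negtive_flag[threadIdx.x] = 1;   // no break here, so nvcc could otherwise unroll
}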