Use of register: An odd problem

athlonshi · August 12, 2010, 6:44am

Hi,

I encountered an odd problem with register usage in my kernel function.

My kernel function currently uses 46 registers, which seems large, as the occupancy will be low. So I wanted to reduce the number of registers.

I assumed that the number of registers comes from the local variables that I defined in the kernel function. But no matter how much I decreased or increased the number of locally defined variables, the total register count stayed the same. That is odd. Did I miss something here?

Then I tried a silly method to identify which part of the code used the registers, and surprisingly found that once I disabled one line, which assigns the value of a local variable to a global array element (the array resides in page-locked memory), the number of registers used became 0. This is even odder, isn't it?

I am totally confused by what I saw here. Is there any clue? Thanks!

P.S. My compile command to report the register count is: nvcc -c -O3 -arch sm_13 --ptxas-options="-v"
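For a rough sense of why 46 registers per thread limits occupancy, here is a minimal host-side sketch. The function name `maxThreadsByRegisters` is hypothetical, and the register-file size of 16384 per multiprocessor for compute capability 1.3 is a figure brought in from the CUDA Programming Guide, not from this thread. The result is only an upper bound: it ignores register allocation granularity, shared memory use, and block-size limits.

```cuda
// Upper bound on resident threads per multiprocessor imposed by the
// register file alone. Hypothetical helper for illustration only.
// Assumption: regsPerSM is the register-file size of one multiprocessor
// (16384 for compute capability 1.3, 8192 for 1.0/1.1).
int maxThreadsByRegisters(int regsPerSM, int regsPerThread)
{
    return regsPerSM / regsPerThread;
}
```

With 16384 registers and 46 registers per thread this gives 356 threads, well under the 1024 resident threads a compute 1.3 multiprocessor can hold, which is why the count matters.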

Deus · August 12, 2010, 9:22am

It is possible that the compiler's optimisations create extra local variables of their own.

cbuchner1 · August 12, 2010, 10:06am

You removed the code that “…assigns the value of a local variable to a global array element…”
Let me guess: that was the result of the computation.

There’s an optimization called dead code elimination: the compiler concluded that your entire kernel was now
performing work that was unnecessary (as the result was never stored), so the entire kernel
was optimized away. → 0 registers.
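The effect can be reproduced with a small sketch. The kernels `withStore` and `withoutStore` and the helper `runWithStore` are hypothetical names made up for illustration; they are not the original poster's code. The two kernels differ only in the final store to global memory.

```cuda
#include <cuda_runtime.h>

// Result is written to global memory, so the loop is live code.
__global__ void withStore(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = 0.0f;
        for (int k = 0; k < 16; ++k)
            acc += in[i] * (float)k;
        out[i] = acc;   // the one line that keeps the computation alive
    }
}

// Same computation, but nothing observable depends on acc, so the
// compiler is free to delete the loop and the load from in[].
__global__ void withoutStore(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = 0.0f;
        for (int k = 0; k < 16; ++k)
            acc += in[i] * (float)k;
        (void)acc;      // result discarded: no store
    }
}

// Hypothetical host helper: runs withStore on a single element.
float runWithStore(float x)
{
    float *dIn = 0, *dOut = 0, result = 0.0f;
    cudaMalloc(&dIn, sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, &x, sizeof(float), cudaMemcpyHostToDevice);
    withStore<<<1, 1>>>(dIn, dOut, 1);
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Compiling this file with `nvcc -c --ptxas-options=-v` (plus an `-arch` value your toolkit supports) prints a register count per kernel. The expectation is that `withoutStore` reports far fewer registers than `withStore`; the exact numbers depend on the compiler version and target architecture.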

Eric3918 · August 12, 2010, 12:20pm

I have not found anyone able to explain how the compiler decides to allocate registers. The majority of your registers are likely created by the compiler and do not correspond to variables which you explicitly created in the kernel. Even worse, it seems that, similar to what you’ve experienced, if I remove a variable I explicitly created, the compiler often decides it can create another just to drive me nuts. I would love to be able to know exactly what is going into each register, when and where. You can limit registers using a compile-time option, but it seems to just spill them into local memory, which is horrible.

Bottom line, there must be a way to write code such that the compiler does not feel the need to use so many registers. But I haven’t found anybody who knows how.

cbuchner1 · August 12, 2010, 12:39pm

1) --maxrregcount=N (trade registers against slow local memory access)
2) use of the “volatile trick” (search the forums)
3) limit the scope of local variables, recompute index variables within local scopes, and try to break up your algorithm into separate functional blocks, each within a separate local scope (e.g. curly brackets), even if it means more memory access to re-load data
4) use as much shared memory as is available (trade registers vs. shared memory)
5) split your algorithm into several smaller kernels

Points 2) to 4) helped me to get a critical and relatively complex algorithm for radio interference simulation below the 16 register limit so I can run 512 threads on Compute 1.1 hardware. Before doing this optimization I had around 20 registers.

Getting down from 46 registers to 32 may be NP-hard ;)

Christian
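The scoping and shared-memory suggestions in the list above might look roughly like this. It is a minimal sketch under assumptions: the kernel `twoPhase`, the block size `BLOCK`, and the helper `runTwoPhaseFirst` are all hypothetical and unrelated to the radio interference code mentioned. Whether the register count actually drops depends on the compiler, which does its own liveness analysis, so check the result with `--ptxas-options=-v`.

```cuda
#include <cuda_runtime.h>

#define BLOCK 128   // hypothetical block size; launch with blockDim.x == BLOCK

__global__ void twoPhase(const float *in, float *out, int n)
{
    // Intermediate result lives in shared memory instead of a register.
    __shared__ float stage[BLOCK];

    {   // Phase 1: temporaries go out of scope at the closing brace.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float a = (i < n) ? in[i] : 0.0f;
        stage[threadIdx.x] = a * a + 1.0f;
    }
    __syncthreads();

    {   // Phase 2: the index is recomputed rather than carried over.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockDim.x - 1 - threadIdx.x;   // mirrored thread's slot
        if (i < n)
            out[i] = stage[threadIdx.x] + stage[j];
    }
}

// Hypothetical host helper: one block, in[i] = i, returns out[0].
float runTwoPhaseFirst()
{
    float hIn[BLOCK], hOut[BLOCK];
    for (int i = 0; i < BLOCK; ++i) { hIn[i] = (float)i; hOut[i] = 0.0f; }

    float *dIn = 0, *dOut = 0;
    cudaMalloc(&dIn, sizeof(hIn));
    cudaMalloc(&dOut, sizeof(hOut));
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);
    twoPhase<<<1, BLOCK>>>(dIn, dOut, BLOCK);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return hOut[0];
}
```

The trade-off is the one named in point 4): each value parked in `stage` costs a shared-memory write and read, in exchange for not having to stay in a register across the two phases.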

athlonshi · August 12, 2010, 12:42pm

Thanks all!
Yes, it is driving me crazy, as I don’t have any way to control the registers that I use.
I want the locally defined variables to reside in registers, as they are frequently accessed.
The disabled line, which drastically reduces the total register count from 46 to 0, is the last line of my code, which writes out the value of the local variable. It looks like the compiler is smart enough to detect that my kernel function is just “dead code” when no result is written out, so none of the local variables need to reside in registers.
I tried to use decuda, but found the output hard to interpret. Too many tricks to figure out ~~~

cbuchner1 · August 12, 2010, 12:52pm

If you are at liberty to post your kernel code as a standalone, compilable .cu file, I may be able to hack it a bit and maybe lower the register use.
