Registers and locally declared variables
Variables declared in __global__ functions

Are variables that are declared inside a kernel’s __global__ function stored in local registers? For example, is the variable “int bx” inside the “multd” function of the CUDA guide’s matrix multiplication example stored in a local GPU register? Are there limits to the size of an array of floats that can be stored in register memory?

Yes, local variables are usually stored in registers.

As described in the programming guide, there are a total of 8192 32-bit registers per multiprocessor on G80, and these are shared between all the thread blocks executing on the multiprocessor.

So if you had just a single block of 256 threads, you could use up to 32 registers per thread (8192 / 256 = 32), each of which can hold one float.

The occupancy calculator is a useful tool here:
http://developer.download.nvidia.com/compu…_calculator.xls

The compiler generally tries to minimize the number of registers used. If you use too many registers, it will start spilling values to local memory (which is much slower because it is off-chip).

If you index into an array of floats with a value that isn’t known at compile time, the array has to be stored in local memory, because the hardware can’t dynamically index into the register file.
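A minimal sketch of the two cases (kernel names are made up for illustration): with only compile-time-constant indices the compiler can in principle keep the elements in registers, while a data-dependent index forces the array into local memory.

// Hypothetical kernels illustrating constant vs. dynamic indexing.

__global__ void static_index(float *out)
{
    float a[4];                    // every access below uses a constant index
    a[0] = 1.0f; a[1] = 2.0f; a[2] = 3.0f; a[3] = 4.0f;
    // No dynamic index anywhere, so a[] could live entirely in registers.
    out[threadIdx.x] = a[0] + a[1] + a[2] + a[3];
}

__global__ void dynamic_index(const int *idx, float *out)
{
    float a[4];
    a[0] = 1.0f; a[1] = 2.0f; a[2] = 3.0f; a[3] = 4.0f;
    // idx comes from memory, so a[i] cannot be resolved at compile time --
    // the array must be placed in (slow, off-chip) local memory.
    int i = idx[threadIdx.x];
    out[threadIdx.x] = a[i];
}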

In my experience, arrays of floats get dumped to local memory and aren’t stored in registers. The compiler is very dumb about such things. I had to turn the array into regular variables and manually unroll all the loops using preprocessor tricks. (The compiler should have unrolled the static loops itself and realized that the array isn’t being dynamically indexed.)
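The workaround described above looks roughly like this (a sketch under my own naming, not the poster’s actual code): replace the array with one named scalar per element, and unroll the loop by hand with a macro so every access is to a plain variable.

// Before: a small array and a loop -- often spilled to local memory.
//   float v[3];
//   for (int i = 0; i < 3; ++i) v[i] = f(i);
//
// After: one scalar per element plus manual unrolling, so each value
// can be register-allocated. f() is a stand-in for the real computation.

__device__ float f(int i) { return (float)i; }

#define BODY(i, var) var = f(i)

__device__ float unrolled(void)
{
    float v0, v1, v2;
    BODY(0, v0);
    BODY(1, v1);
    BODY(2, v2);
    return v0 + v1 + v2;
}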

Right now the compiler can’t be trusted. Always inspect the ptx/cubin it produces because it could be costing you an order of magnitude in performance. First-line optimization tricks such as trying to do blocking inside the kernel can actually lead to substantial performance losses as often as not if the compiler decides that what you really need is a trip to local memory every other instruction.
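To inspect what the compiler actually produced, the standard nvcc flags below are useful: look for ld.local/st.local instructions in the emitted PTX, and for a nonzero “lmem” figure in the ptxas resource report (the source filename here is just a placeholder).

```shell
# Emit PTX for inspection; grep for local-memory loads/stores.
nvcc -ptx kernel.cu -o kernel.ptx
grep -n "local" kernel.ptx

# Ask ptxas to report per-kernel register and local-memory usage.
nvcc --ptxas-options=-v -c kernel.cu
```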

Thanks for the information. Though I have plenty of threads for warp and block swapping to hide device memory access latency, it may be prudent for me to use shared memory for arrays.
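Moving a per-thread array into shared memory might look like the sketch below (BLOCK_SIZE, ARRAY_LEN, and the kernel name are assumptions): each thread gets its own slice, and the hardware does support dynamic indexing into shared memory, unlike the register file. At 256 threads x 8 floats this is 8 KB, which fits in G80’s 16 KB of shared memory per multiprocessor.

#define BLOCK_SIZE 256
#define ARRAY_LEN 8

__global__ void kernel_with_shared_array(float *out)
{
    // One ARRAY_LEN-element slice per thread, carved out of shared memory.
    __shared__ float arr[BLOCK_SIZE * ARRAY_LEN];
    float *my = &arr[threadIdx.x * ARRAY_LEN];

    for (int i = 0; i < ARRAY_LEN; ++i)
        my[i] = (float)i;           // dynamic indexing is fine here

    out[threadIdx.x] = my[threadIdx.x % ARRAY_LEN];
}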

I can confirm that something like

template <class T, unsigned int LEN>
__device__ void my_function( … )
{
    T my_array[LEN];

    // … compute something …
}

does in fact use local memory for my_array, even for LEN = 1 and basic types T such as float2. I was quite surprised to see this, as ‘T’ and ‘LEN’ are given at compile time and it should be fairly easy to assign registers instead.

I was particularly surprised because I chose this approach for my kernel based on the following statement in the CUDA programming guide v. 1.0, sec. 4.2.2.4:

“An automatic variable declared in device code without any of these qualifiers generally resides in a register. However in some cases the compiler might choose to place it in local memory. This is often the case for large structures or arrays that would consume too much register space, and arrays for which the compiler cannot determine that they are indexed with constant quantities.”

I guess the compiler doesn’t detect that much in its present state. So better read “often” as “always” above for now ;-)

– Thomas