Recycling of variables stored in registers

Dalibor_CZ · January 23, 2013, 2:17pm

Hi,
I have a newbie question. Suppose I have two arrays in a kernel, for example:

__global__ void Kernel(...){
    double a[3];
    double b[3];
    ...
}

I assume they are stored in registers. Now, the block of code that operates on array a is separate from the block of code that operates on b. Simply:

__global__ void Kernel(...){
    double a[3];
    double b[3];

    OperationsOnA(a);
    OperationsOnB(b);
}

Now the question is: is the register space allocated for both arrays equal to 3*sizeof(double) (the space of variables that are no longer used is recycled, so b[] occupies the space where a[] was), or to 6*sizeof(double) (the space is not recycled, so a and b are stored independently in registers until the kernel terminates)?

Thanks,

Dalibor

DrAnderson42 · January 23, 2013, 6:46pm

Either and/or both. The register allocator is a very complicated beast and is combined with a lot of other heuristics in the compiler. In general, it is fairly aggressive at minimizing register usage.

This type of array declaration works fine in registers, but only if you only ever index with compile-time constants. If you index with a variable (one that is not an unrolled loop index), the arrays will be dropped to local memory. I have one kernel with two 10-element arrays, and the sm_35 compiler puts both of them in registers. On sm_20, it spills them to local memory because the kernel goes over the 63-register limit.
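
To illustrate the indexing point, here is a minimal sketch, not taken from my actual kernel. The kernel names and the host helper runIndexingDemo are made up for the example. Compiling with nvcc -Xptxas -v prints the register count and any local memory use per kernel, so you can check what your compiler version actually does; newer compilers may still manage to keep a tiny array like this in registers even with a run-time index.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: after unrolling, every index into a[] is a
// compile-time constant, so the compiler is free to keep a[] in registers.
__global__ void staticIndex(const double* in, double* out)
{
    double a[3];
    #pragma unroll
    for (int i = 0; i < 3; ++i)
        a[i] = in[threadIdx.x * 3 + i] * 2.0;
    out[threadIdx.x] = a[0] + a[1] + a[2];
}

// Hypothetical kernel: k is only known at run time (caller must pass 0..2),
// so a[k] is the kind of access that can push a[] into local memory.
__global__ void dynamicIndex(const double* in, double* out, int k)
{
    double a[3];
    #pragma unroll
    for (int i = 0; i < 3; ++i)
        a[i] = in[threadIdx.x * 3 + i] * 2.0;
    out[threadIdx.x] = a[k];
}

// Hypothetical host helper: runs one thread of each kernel on three inputs.
// Writes staticIndex's result to *sum and dynamicIndex's result to *picked.
// Returns false if any CUDA runtime call fails.
bool runIndexingDemo(const double h_in[3], int k, double* sum, double* picked)
{
    double* d_in = nullptr;
    double* d_out = nullptr;
    if (cudaMalloc(&d_in, 3 * sizeof(double)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(double)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    bool ok = cudaMemcpy(d_in, h_in, 3 * sizeof(double),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        staticIndex<<<1, 1>>>(d_in, d_out);
        // The blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(sum, d_out, sizeof(double),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        dynamicIndex<<<1, 1>>>(d_in, d_out, k);
        ok = cudaMemcpy(picked, d_out, sizeof(double),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

Both kernels compute the same values; the only difference is how a[] is indexed, which is what the ptxas output lets you compare.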

Dalibor_CZ · January 24, 2013, 7:34am

I got it, thanks for the reply.
I am going to try Nsight Eclipse, for the first time, for further register optimization. Is it a good tool for this task? Or is there any other method for optimizing register usage?

I have a kernel where "recycling" the register space of unused variables is essential. That is why I ask.

DrAnderson42 · January 24, 2013, 6:30pm

You don’t need anything more than nvcc. It performs all of the register allocation with no input from the programmer. The only things you can do are cap the register count, which forces it to spill to local memory, using the -maxrregcount argument, or set __launch_bounds__ (see the programming guide for info on that).
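
A rough sketch of the two knobs, assuming a trivial kernel. The names scaleKernel, registersPerThread and scaleOnDevice, and the bounds 256 and 2, are made up for illustration. The per-file alternative is a compiler flag such as nvcc -maxrregcount=32, and nvcc -Xptxas -v reports the resulting register and spill counts at compile time.

```cuda
#include <cuda_runtime.h>

#define MAX_THREADS_PER_BLOCK 256  // assumed upper bound on the block size
#define MIN_BLOCKS_PER_SM 2        // assumed desired resident blocks per SM

// __launch_bounds__ tells the compiler the largest block size this kernel
// will be launched with, and optionally how many blocks should fit on one
// multiprocessor; the compiler limits register use accordingly.
__global__ void __launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_SM)
scaleKernel(double* data, double factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: asks the runtime how many registers per thread the
// compiled kernel uses. Returns -1 if the query fails.
int registersPerThread()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, scaleKernel) != cudaSuccess) return -1;
    return attr.numRegs;
}

// Hypothetical helper: scales n host values in place on the device.
// The block size must not exceed MAX_THREADS_PER_BLOCK, or the launch fails.
bool scaleOnDevice(double* h_data, int n, double factor)
{
    double* d_data = nullptr;
    size_t bytes = n * sizeof(double);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d_data, h_data, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int blocks = (n + MAX_THREADS_PER_BLOCK - 1) / MAX_THREADS_PER_BLOCK;
        scaleKernel<<<blocks, MAX_THREADS_PER_BLOCK>>>(d_data, factor, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_data, d_data, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_data);
    return ok;
}
```

Unlike -maxrregcount, which applies to every kernel in the file, __launch_bounds__ is set per kernel, so you can constrain only the kernel that needs it.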