Number of register variables

If I understand correctly, every variable I declare in a kernel is stored in a register. What happens if I define more variables than there are registers available?

The compiler quietly moves ("spills") some of those variables to local memory, which physically lives in device memory, and juggles them so that actively used values stay in registers and less frequently used ones live in local memory.
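To see whether this has happened for a particular kernel, you can compile with `nvcc -Xptxas -v` (ptxas then prints register and local memory usage at build time) or ask the runtime via `cudaFuncGetAttributes`. A minimal sketch; `scaleKernel` and `reportResourceUsage` are made-up names standing in for your own code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for your own kernel (hypothetical).
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Prints registers per thread and local memory bytes per thread for the
// compiled kernel. A non-zero localSizeBytes means some per-thread data lives
// in local memory (register spills, or arrays the compiler could not keep
// in registers).
cudaError_t reportResourceUsage(cudaFuncAttributes *attr)
{
    cudaError_t err = cudaFuncGetAttributes(attr, scaleKernel);
    if (err == cudaSuccess)
        printf("registers/thread: %d, local bytes/thread: %zu\n",
               attr->numRegs, attr->localSizeBytes);
    return err;
}
```

Capping registers with `nvcc -maxrregcount=N` on a larger kernel is a quick way to watch the local memory figure grow as the compiler is forced to spill.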

This has a pretty severe performance penalty on G80 and GT200, but GF100 (Fermi) has on-chip L1/L2 caches that also cache local memory, so the actual overhead isn't nearly as bad.

There are other subtleties, especially involving arrays indexed with values not known at compile time: these are forced into local memory even if enough registers are available, because registers cannot be addressed indirectly.
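A minimal sketch of the difference, with hypothetical kernel names. In the first kernel every index is a compile-time constant once the loops are unrolled, so `v` can usually live entirely in registers. In the second, the index is read from memory at run time, so `v` is typically placed in local memory (a very small array may instead be turned into a chain of selects, so the outcome can vary between compiler versions). Comparing the `-Xptxas -v` output for the two is an easy way to check.

```cuda
#include <cuda_runtime.h>

// All indices into v are compile-time constants after full unrolling,
// so the compiler can usually keep each element in its own register.
__global__ void constantIndexKernel(const int *in, int *out)
{
    int v[8];
#pragma unroll
    for (int k = 0; k < 8; ++k)
        v[k] = in[threadIdx.x] + k;

    int sum = 0;
#pragma unroll
    for (int k = 0; k < 8; ++k)
        sum += v[k];
    out[threadIdx.x] = sum;
}

// The index comes from memory, so it is unknown at compile time; since
// registers cannot be indexed, v typically ends up in local memory.
__global__ void runtimeIndexKernel(const int *in, const int *idx, int *out)
{
    int v[8];
#pragma unroll
    for (int k = 0; k < 8; ++k)
        v[k] = in[threadIdx.x] + k;

    out[threadIdx.x] = v[idx[threadIdx.x] & 7];  // runtime index, masked to 0..7
}
```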

If I have an array declared in a kernel, like array[10][2], does it use 20 registers? How can I find out the maximum number I can have in my kernel? Earlier I was seeing some issues that were solved by removing some variable declarations from my code. However, when I compiled with --ptxas-options=-v, the number of registers it reported was lower than I expected given my statically declared array, and that number multiplied by the number of threads in a block was still less than the registers per block that deviceQuery says I am allowed.

Do you know the proper way to figure out whether you have too many?
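One way to check this from code is to let the runtime do the arithmetic. `cudaFuncGetAttributes` reports the kernel's actual register count and a `maxThreadsPerBlock` value that already accounts for that register usage, and later CUDA releases add `cudaOccupancyMaxActiveBlocksPerMultiprocessor` for occupancy. A minimal sketch, assuming a newer toolkit; `myKernel` and `checkLaunchFits` are hypothetical names:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for the kernel being diagnosed (hypothetical).
__global__ void myKernel(float *data)
{
    data[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
}

// Returns true if myKernel can be launched with threadsPerBlock threads on
// the current device, given the registers it actually uses.
bool checkLaunchFits(int threadsPerBlock)
{
    int device = 0;
    cudaDeviceProp prop;
    cudaFuncAttributes attr;
    if (cudaGetDevice(&device) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, device) != cudaSuccess ||
        cudaFuncGetAttributes(&attr, myKernel) != cudaSuccess)
        return false;

    // Naive estimate: registers are allocated in chunks, so the real
    // requirement can be somewhat higher than this product.
    long long naive = (long long)attr.numRegs * threadsPerBlock;
    printf("regs/thread %d x %d threads = %lld (regsPerBlock %d)\n",
           attr.numRegs, threadsPerBlock, naive, prop.regsPerBlock);

    // This limit already includes the effect of the kernel's register usage.
    printf("max threads/block for this kernel: %d\n", attr.maxThreadsPerBlock);

    int blocksPerSM = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, myKernel, threadsPerBlock, 0) == cudaSuccess)
        printf("resident blocks per SM: %d\n", blocksPerSM);

    return threadsPerBlock <= attr.maxThreadsPerBlock;
}
```

Note that a launch requesting more registers than a block can have fails with `cudaErrorLaunchOutOfResources` (visible via `cudaGetLastError()` right after the launch) rather than silently producing wrong results.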

I found the answer to my problem:

There seems to be a compiler bug with small fixed-size local arrays. The compiler no doubt converts these to registers, but does so incorrectly. The following gives wrong results with -O2 on a GTX 580 (64-bit) but correct results with -g -G:

int a[9] = {0, 1, -1, -2, 2…};

sharedPtr[0] = b[a[0]];
sharedPtr[1] = b[a[1]];
.
.
.

I also confirmed that fixing the array problem above does NOT fix the atomics-with-shared-memory problem. I was hoping that I was writing into memory I shouldn't, in which case all bets would be off. Nope. Atomics on shared memory still fail with -O2.
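To test shared-memory atomics in isolation from the rest of the kernel, a tiny self-checking case can help; the GTX 580 (compute capability 2.0) supports shared-memory atomics, which need compute capability 1.2 or later. A minimal sketch; the histogram kernel, `NUM_BINS`, and the launch configuration are all made up for illustration:

```cuda
#include <cuda_runtime.h>

#define NUM_BINS 16  // hypothetical histogram size (power of two)

// Each block builds a histogram in shared memory with atomicAdd, then adds
// its partial counts into the global histogram.
__global__ void sharedHistogram(const int *values, int n, unsigned int *bins)
{
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local[b] = 0;
    __syncthreads();

    // Grid-stride loop over the input; the mask keeps the bin in range.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        atomicAdd(&local[values[i] & (NUM_BINS - 1)], 1u);
    __syncthreads();

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);
}
```

Filling the input with 0..n-1 makes every bin's expected count n / NUM_BINS, so building once optimized and once with -g -G gives two results that are easy to compare.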

I need to clarify the fixed-array bug above. The initialization is dynamic, but the size is fixed and therefore known at compile time. I do not know whether a constant initializer is OK (I bet it is); the failing case looks like this:

int a[9] = {0, 1=width, -1, -2 * width, 2…};

sharedPtr[0] = b[a[0]];
sharedPtr[1] = b[a[1]];

Wrong results with -O2, correct with -g -G.
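To narrow down a suspected miscompilation like this, a small repro that checks the GPU result against a CPU reference can be built once with the default optimization and once with -g -G. A minimal sketch: `gatherOffsets`, `gatherOffsetsHost`, and the 3x3 neighbour offsets are hypothetical and are not the offsets from the original code, but they follow the same pattern of a fixed-size table initialized from the runtime value `width`:

```cuda
#include <cuda_runtime.h>

// Fixed-size (9-entry) offset table whose values depend on the runtime
// argument width; used to gather the neighbours of `center` from b.
__global__ void gatherOffsets(const int *b, int width, int center, int *out)
{
    int a[9] = {0, 1, -1, width, -width,
                width + 1, width - 1, -width + 1, -width - 1};
#pragma unroll
    for (int k = 0; k < 9; ++k)
        out[k] = b[center + a[k]];
}

// Host reference performing the same gather, for comparison.
void gatherOffsetsHost(const int *b, int width, int center, int *out)
{
    int a[9] = {0, 1, -1, width, -width,
                width + 1, width - 1, -width + 1, -width - 1};
    for (int k = 0; k < 9; ++k)
        out[k] = b[center + a[k]];
}
```

If only the optimized build disagrees with the host reference, the repro is small enough to attach to a bug report as-is.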