I am curious how declaring a variable that maps to a register inside an if-statement compares with declaring it outside the if-statement:
code 1:
__global__ void kernel() {
    int x, y, z;
    // … do something, but do not use x
    if (y == 0) {
        // … do something using x
    }
}
code 2:
__global__ void kernel() {
    int y, z;
    // … do something
    if (y == 0) {
        int x;
        // … do something using x
    }
}
The question is: am I using 3 registers in both cases? Do I gain anything by using one version over the other?
Register allocation happens at a level below the C source code, so counting C variables tells you very little about the number of registers used on the hardware. You should answer your question directly by compiling a few test kernels with nvcc --ptxas-options=-v, which reports the register usage of each kernel.
In a simple kernel that I just made up, I didn’t see a difference in the number of registers. I’m not sure if my finding generalizes to all cases though.
There should be no difference, as any state-of-the-art compiler will automatically reduce the variable lifetime to the bare minimum before register allocation, independent of the scope in the source code.
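A minimal sketch of such a test, for anyone who wants to check this on their own compiler version. The file name regs_scope.cu, the kernel names kernel_outer and kernel_inner, and the arithmetic in the bodies are all made up for illustration; they are not taken from the code discussed in this thread. The two kernels differ only in where x is declared:

```cuda
// regs_scope.cu (hypothetical file name)
// Compile without linking so no main() is needed:
//   nvcc --ptxas-options=-v -c regs_scope.cu

// Variant 1: x is declared at function scope, but only used inside the branch.
__global__ void kernel_outer(int *out, const int *in)
{
    int x;
    int y = in[threadIdx.x];
    int z = y * 3 + 1;

    if (y == 0) {
        x = z * z + 7;          // first and only use of x
        out[threadIdx.x] = x;
    } else {
        out[threadIdx.x] = z;
    }
}

// Variant 2: x is declared inside the branch where it is used.
__global__ void kernel_inner(int *out, const int *in)
{
    int y = in[threadIdx.x];
    int z = y * 3 + 1;

    if (y == 0) {
        int x = z * z + 7;      // x exists only in this scope
        out[threadIdx.x] = x;
    } else {
        out[threadIdx.x] = z;
    }
}
```

ptxas prints a separate "Used N registers" line for each kernel, so the two counts can be compared side by side. The exact numbers depend on the target architecture and compiler version.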
I compiled it and got this:
nv:~> nvcc --ptxas-options=-v regs1.cu
ptxas info : Compiling entry function '_Z4testPi' for 'sm_10'
ptxas info : Used 2 registers, 8+16 bytes smem, 4 bytes cmem[1]
It looks like I am using 8+16=24 bytes of shared memory and 4 bytes of constant memory. Why is that? As can be seen in my code, I use neither shared nor constant memory, so the usage output I'm getting is quite confusing.
And finally, what is the meaning of cmem[1]? I ask because I have also seen cmem[0] and cmem[16].