I am curious about the difference between defining register-mapped variables inside an if-statement and defining them outside it:

code 1:
__global__ void kernel() {
    int x, y, z;

    // … do something, but do not use 'x'
    if (y == 0) {
        // … do something using 'x'
    }
}

code 2:
__global__ void kernel() {
    int y, z;

    // … do something
    if (y == 0) {
        int x;
        // … do something using 'x'
    }
}

The question is: am I using 3 registers in both cases? Do I gain anything by choosing one version over the other?


Register allocation happens at a level below the C source code, so counting C variables tells you very little about the number of registers used on the hardware. You can answer your question directly by compiling a few test kernels with nvcc --ptxas-options=-v, which prints the register count for each kernel.
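As a rough sketch of such a test (the kernel names, the arithmetic, and the regs_per_thread helper are made-up placeholders of mine, not taken from your code), the two scoping variants could look like this:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant 1: x is declared at kernel scope but only used inside the branch.
__global__ void kernel_outer(const int *in, int *out) {
    int x;
    int y = in[0];
    int z = in[1];
    out[1] = y + z;          // work that does not involve x
    if (y == 0) {
        x = z * 3 + 1;       // x is first written here
        out[0] = x;
    }
}

// Variant 2: identical work, but x is declared inside the branch.
__global__ void kernel_inner(const int *in, int *out) {
    int y = in[0];
    int z = in[1];
    out[1] = y + z;
    if (y == 0) {
        int x = z * 3 + 1;
        out[0] = x;
    }
}

// Hypothetical helper (an assumption, not from the thread): registers per
// thread as reported by the CUDA runtime, or -1 if the query fails
// (for example, when no usable GPU is present).
template <typename Kernel>
int regs_per_thread(Kernel k) {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, k) != cudaSuccess) return -1;
    return attr.numRegs;
}
```

Saved as, say, scope_test.cu (the file name is arbitrary), this can be compiled with nvcc -c --ptxas-options=-v scope_test.cu, and ptxas prints one "Used N registers" line per kernel. I would expect both kernels to report the same count, but that is only an expectation for this toy case, so check the output for your own code.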

In a simple kernel that I just made up, I didn’t see a difference in the number of registers. I’m not sure if my finding generalizes to all cases though.

There should be no difference, as any state-of-the-art compiler will automatically reduce the variable lifetime to the bare minimum before register allocation, independent of the scope in the source code.

Thanks guys, in fact there is no difference.

I wrote this small kernel to analyze register/memory usage:

#include <stdio.h>
#include <iostream>
#include <cuda.h>

using namespace std;

__global__ void test(int *par) {

    int y;          // note: y is read below without being initialized
    int x = *par;

    if (x == 1) {
        *par = y;
    }
}

int main() {

    int *par;       // device pointer filled in by cudaMalloc

    cudaMalloc((void **) &par, sizeof(int));
    return 0;
}


I compiled it and got this:
nv:~> nvcc --ptxas-options=-v
ptxas info : Compiling entry function '_Z4testPi' for 'sm_10'
ptxas info : Used 2 registers, 8+16 bytes smem, 4 bytes cmem[1]

It looks like I am using 8+16=24 bytes of shared memory and 4 bytes of constant memory. Why is that? As you can see, my code uses neither shared nor constant memory, so the usage output I'm getting is quite confusing.

And finally, what is the meaning of cmem[1]? I ask because I've also seen cmem[0] and cmem[16].