Weird register count

__global__ void emptyKernel()
{
}

__global__ void testKernel()
{
    int i = 0;
    int j = 2;
    i += j;
}
Why do the two kernels above both report the same register count, 2?
What registers does an empty kernel use? Do they hold threadIdx and blockIdx?
The second kernel also reports 2 registers. Is that because i and j are eliminated by the optimizer? If so, where does that optimization take place: when the PTX is generated, or when the machine code is generated?
Is there a way to view or edit the PTX code?

If you’re using the nvcc compiler, try the “--keep” option, which keeps all the temporary files, including the PTX.

As for registers: my nvcc reports 1 register for a kernel that does nothing but compute 1000000 sines in a loop. An empty kernel uses no registers.
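
To make the dead-code question concrete, here is a minimal sketch, not taken from anyone's actual project. The kernel names deadCodeKernel and liveCodeKernel and the host helper runLiveCodeKernel are made up for illustration. The first kernel's arithmetic has no observable effect, so the compiler is typically free to drop it and the kernel ends up looking like an empty one. The second kernel stores its result to global memory, so something has to survive. Optimization happens at both stages: nvcc optimizes while generating PTX, and ptxas optimizes again when it turns PTX into machine code. PTX only uses virtual registers, so the reported register count comes from ptxas.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The sum is never stored anywhere, so the compiler may remove all of it.
__global__ void deadCodeKernel()
{
    int i = 0;
    int j = 2;
    i += j;
}

// The sum is written to global memory, so it is observable and must be kept.
__global__ void liveCodeKernel(int *out)
{
    int i = threadIdx.x;
    int j = 2;
    i += j;
    out[threadIdx.x] = i;
}

// Hypothetical host helper: launches liveCodeKernel with n threads in one
// block and copies the results back. Returns an empty vector on a CUDA error.
std::vector<int> runLiveCodeKernel(int n)
{
    std::vector<int> host(n);
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return {};
    liveCodeKernel<<<1, n>>>(dev);
    cudaError_t err = cudaMemcpy(host.data(), dev, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}
```

Compiling this with “--keep” (or “-ptx”) and with “--ptxas-options=-v” should let you compare the PTX and the register counts of the two kernels. The exact numbers depend on the toolkit version and the target architecture.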

Hey umod, I’m using toolkit version 3.2 on a GF104. Which version are you on? How did you get the register count? From the Visual Profiler as well?

Is this a problem with the Visual Profiler? Here’s the PTX code generated for emptyKernel:

.entry _Z11emptyKernelv
{
.loc	16	17	0

$LDWbegin__Z11emptyKernelv:
.loc 16 20 0
exit;
$LDWend__Z11emptyKernelv:
} // _Z11emptyKernelv

Here’s the machine code generated in the cubin:
00000000: 2800440400005de4 mov b32 $r1 c1[0x100]
00000008: 8000000000001de7 exit

I can see only one register there, $r1. What is c1?

I’m working under Linux with the latest toolkit and drivers (3.2 and 260.19.21). I use “nvcc --ptxas-options=-v .cu” to see the register usage of each kernel during compilation:

ptxas info    : Compiling entry function '_Z4noopPi'
ptxas info    : Used 2 registers, 8+16 bytes smem

Kernel looks like this:

__global__ void noop(int *a)
{
    a[threadIdx.x] = threadIdx.x;
}

c1 and c2 are constant memory spaces
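
Besides the ptxas output and the profiler, the register count can also be read at run time with cudaFuncGetAttributes, which fills a cudaFuncAttributes struct whose numRegs field is the per-thread register count. The sketch below is only an illustration: the helper name registersUsed is made up, and the values it returns will vary with toolkit version and architecture. As for the empty kernel, the “mov b32 $r1 c1[0x100]” in the disassembly above appears to be the usual Fermi prologue that loads a stack pointer from constant bank 1 into $r1. That would be consistent with a count of 2 ($r0 and $r1), but treat it as an informed guess, not something confirmed in this thread.

```cuda
#include <cuda_runtime.h>

__global__ void emptyKernel()
{
}

__global__ void noop(int *a)
{
    a[threadIdx.x] = threadIdx.x;
}

// Hypothetical helper: returns the per-thread register count that the CUDA
// runtime reports for a kernel, or -1 if the query fails.
template <typename Kernel>
int registersUsed(Kernel kernel)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, kernel) != cudaSuccess) return -1;
    return attr.numRegs;
}
```

Calling registersUsed(emptyKernel) and registersUsed(noop) from host code and printing the results gives numbers you can compare against what ptxas and the profiler report.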
