Registers and threads

I have a question about the maximum number of registers and the number of threads.

My .cubin file contains the following:

code {
	name = raytracer_kernel
	lmem = 0
	smem = 44
	reg = 31
	bar = 0
	bincode {
		0x1000cc09 0x0423c780 0xa0004c05 0x04200780
		0x10000211 0x0403c780 0x20c3f003 0x00000780
		0x10008808 0x1100ee0c 0x40070811 0x00000780
		0x60060a11 0x00010780 0x30100811 0xc4100780
		0x60060809 0x00010780 0xa000000d 0x04000780
		0x2103f20c 0x20038418 0xa0000c09 0x44014780
		0xb000040d 0x03f00003 0x10002609 0x2400c780
		0xa0000409 0xc4004780 0x10000211 0x2400c780
		...
	}
}

My blocks are 16x16 threads (256), so I assume the total number of registers used per block is:

16*16*31 = 7936
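As a quick check of that arithmetic, here is a small C++ sketch. It assumes a compute capability 1.0/1.1 GPU, whose multiprocessors each have 8192 32-bit registers, and it ignores the rounding the hardware applies when allocating registers, so real usage can be slightly higher. The helper names `registersPerBlock` and `blocksPerSMByRegisters` are made up for illustration; the `static_assert`s evaluate the numbers from this post at compile time.

```cpp
// Assumption: compute capability 1.0/1.1 hardware with 8192 32-bit registers
// per multiprocessor. Allocation granularity is ignored, so real hardware may
// reserve slightly more than computed here. Helper names are hypothetical.
constexpr int kRegistersPerSM = 8192;

// Every thread in the block gets its own copy of the "reg" registers.
constexpr int registersPerBlock(int threadsX, int threadsY, int regsPerThread) {
    return threadsX * threadsY * regsPerThread;
}

// Number of blocks that fit on one multiprocessor when only the register
// file is taken into account (shared memory and thread limits ignored).
constexpr int blocksPerSMByRegisters(int regsPerBlock) {
    return regsPerBlock > 0 ? kRegistersPerSM / regsPerBlock : 0;
}

// The numbers from the .cubin above: 16x16 threads, reg = 31.
static_assert(registersPerBlock(16, 16, 31) == 7936, "256 threads * 31 regs");
static_assert(blocksPerSMByRegisters(7936) == 1, "only one such block fits");
```

By this estimate only one such block fits per multiprocessor; at reg = 32 the block would need exactly 8192 registers.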

Am I right? And where does the 31 come from? What gets put into the registers: all the parameters of the kernel, or something else?

Yep, you've got it right: each of the 256 threads gets its own 31 registers.

Local variables you declare in the kernel (e.g. int a) are put in registers. If the compiler can't get register usage down far enough, it spills into local memory (lmem), but that rarely happens.

Arguments to the kernel are stored in shared memory, along with blockIdx, blockDim, and gridDim (hence the non-zero smem in your example).

So if you use more than 32 variables inside your kernel, your program will automatically use local memory? It won't produce strange values?

You can certainly use more than 32 variables in your kernel. The compiler very aggressively reuses a register once the value in it is no longer needed.

As an example, one of my kernels has its outer loop unrolled 27 times. Each inner loop probably uses ~10 variables (I'm guessing a little here), plus all of the intermediate temporaries generated by the math and addressing. So the total number of variables in this kernel is at least 270, probably higher. But the compiler gets this down to reg=22.

Basically, what you can say is that all inputs to and outputs from operations (like sinf, powf, +, -, <, sqrtf) have to be in registers for the operation to be performed on them.

But if you have

int a = 10;
int b = 3;

int c = a * b;

int d = c * c;

then the register used for a can be reused for c, and the register used for b can be reused for d. I was once foolish enough to try doing this for the compiler myself by using
#define variable_used_later variable_used_in_the_beginning …
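The reuse described above can be modelled with live ranges. The C++ sketch below is a simplified illustration, not how nvcc's register allocator actually works: it numbers the four statements, records for each value the statement that writes it and the statement that last reads it, and counts the most values live at any one point. For straight-line code like this, that peak is also enough registers. The type `LiveRange`, the function `maxLiveValues`, and the extra statement 4 that reads d are hypothetical; a real compiler would also simply constant-fold this snippet.

```cpp
// Toy model of register reuse (hypothetical names). A value is live from the
// statement that writes it up to, but not including, the statement that last
// reads it, so a register freed by a last read can be handed to the value
// that same statement writes.
struct LiveRange {
    int def;      // index of the statement that writes the value
    int lastUse;  // index of the statement that last reads the value
};

// Largest number of values live at the same time.
template <int N>
constexpr int maxLiveValues(const LiveRange (&ranges)[N]) {
    int lastPoint = 0;
    for (int i = 0; i < N; ++i)
        if (ranges[i].lastUse > lastPoint) lastPoint = ranges[i].lastUse;
    int best = 0;
    for (int t = 0; t < lastPoint; ++t) {
        int live = 0;
        for (int i = 0; i < N; ++i)
            if (ranges[i].def <= t && t < ranges[i].lastUse) ++live;
        if (live > best) best = live;
    }
    return best;
}

// The example above, statements numbered 0..3, with d assumed to be read
// once more by a hypothetical statement 4:
//   0: a = 10   1: b = 3   2: c = a * b   3: d = c * c
constexpr LiveRange kExample[] = {
    {0, 2},  // a: last read by c = a * b
    {1, 2},  // b: last read by c = a * b
    {2, 3},  // c: last read by d = c * c
    {3, 4},  // d: read by statement 4
};

// Four variables, but at most two values are live at once.
static_assert(maxLiveValues(kExample) == 2, "only a and b overlap");
```

The same reasoning explains the unrolled loop mentioned earlier: temporaries from one iteration are dead before the next iteration's are created, so the peak stays far below the total variable count.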

So if I'm getting this right, registers are only used for computations and don't keep holding the same data for the whole kernel. Now that I know what it does, this makes sense :D Thanks

This way I can predict how many registers I will be using, or at least roughly :)
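One rough way to make such a prediction useful is to turn the earlier arithmetic around and ask how many registers per thread a given block size leaves room for. This C++ sketch again assumes 8192 registers per multiprocessor (compute capability 1.0/1.1) and ignores allocation granularity; `maxRegsPerThread` is a hypothetical helper.

```cpp
// Assumption: 8192 registers per multiprocessor (compute capability 1.0/1.1),
// allocation granularity ignored. Hypothetical helper: the largest per-thread
// register count that still lets `blocks` blocks of `threadsPerBlock` threads
// share one multiprocessor.
constexpr int kRegistersPerSM = 8192;

constexpr int maxRegsPerThread(int threadsPerBlock, int blocks) {
    return (threadsPerBlock > 0 && blocks > 0)
               ? kRegistersPerSM / (threadsPerBlock * blocks)
               : 0;
}

// With 16x16 = 256 threads per block:
static_assert(maxRegsPerThread(256, 1) == 32, "reg = 31 fits one block per SM");
static_assert(maxRegsPerThread(256, 2) == 16, "two blocks need reg <= 16");
```

A prediction made this way can then be compared with the reg = line the compiler writes into the .cubin.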