How do you calculate the number of registers per thread?

Ionica · September 13, 2010, 3:57pm

Hi!

As the number of registers per thread can become a limiting factor for occupancy, I want to know how many registers my kernel uses...

I've read the programming guide and the best practices guide, and what I've understood from them is that all automatic variables are placed in registers (unless they consume too much memory, in which case they are placed in local memory).

So I have the following kernel (simple matrix multiplication kernel):

__global__ void mulMatrixKernel(float* g_matrix_A, float* g_matrix_B, float* g_matrix_C, int rows, int cols)
{
  // access thread id
  const unsigned int row = blockIdx.y*TILE_DIM+threadIdx.y;
  const unsigned int col = blockIdx.x*TILE_DIM+threadIdx.x;
  float sum=0.0f;

  // perform computation
  if(row<rows && col<cols)
    for(int i=0;i<rows;i++)
      sum+=g_matrix_A[row*cols+i]*g_matrix_B[i*cols+col];

  g_matrix_C[row*cols+col]=sum;
}

I would say that I have three registers here: row, col, sum. But according to the visual profiler I have 9. Now that is a big difference.

I have another question related to this problem: where are the intermediate results of computations stored (for example, blockIdx.y * TILE_DIM + threadIdx.y in the kernel above)? And one last related question: where are the variables threadIdx, blockIdx, blockDim and gridDim stored, and what is the latency for reading them?

vvolkov · September 14, 2010, 11:31pm

Loop counter i and pointers g_matrix_A+row*cols+i and g_matrix_B+i*cols+col will also be in registers. Some intermediates too. My understanding is that variables like threadIdx are available in special-purpose registers.
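To make the point about loop counters and pointers concrete, here is a rough host-side C++ sketch of the work a single thread does, with the temporaries written out by hand. It is not compiler output: the function name dotForOneThread and the locals a_ptr and b_ptr are hypothetical, and the real register allocation depends on the compiler version and target architecture. It only shows that more values than row, col and sum are live at the same time.

```cpp
// Hypothetical host-side mirror of one thread's work in mulMatrixKernel.
// Each named local is a value the compiler may need to hold in a register.
float dotForOneThread(const float* g_matrix_A, const float* g_matrix_B,
                      int rows, int cols, unsigned int row, unsigned int col)
{
    float sum = 0.0f;                               // accumulator
    const float* a_ptr = g_matrix_A + row * cols;   // walks along one row of A
    const float* b_ptr = g_matrix_B + col;          // walks down one column of B

    for (int i = 0; i < rows; ++i) {                // loop counter, compared against rows
        const float a = *a_ptr;                     // loaded value (intermediate)
        const float b = *b_ptr;                     // loaded value (intermediate)
        sum += a * b;
        a_ptr += 1;                                 // next element in the row
        b_ptr += cols;                              // stride of one row, so cols stays live
    }
    return sum;
}
```

Counting sum, the two pointers, the loop counter, the loop bound, the stride and the two loaded values already gives more than three live values. On a 64-bit build each pointer may also take two 32-bit registers, so a figure like 9 is not surprising.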

If you really want to understand how registers are used in your kernel, use a disassembler such as decuda.

Vasily
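Since the original question was motivated by occupancy, here is a minimal sketch of how a register count translates into an occupancy limit. It is a simplified model, not the official occupancy calculator: it ignores register allocation granularity and shared memory. The hardware limits in kExampleSm (16384 registers, 1024 resident threads and 8 resident blocks per multiprocessor) are assumptions corresponding to compute capability 1.3 devices, and the block size of 256 threads is also an assumption, because TILE_DIM is not given in the thread. All names in the block are hypothetical.

```cpp
// Simplified occupancy estimate from registers per thread.
// Assumption: ignores register allocation granularity and shared memory.
struct SmLimits {
    int registers_per_sm;    // assumed per-multiprocessor register file size
    int max_threads_per_sm;  // assumed maximum resident threads
    int max_blocks_per_sm;   // assumed maximum resident blocks
};

constexpr int minInt(int a, int b) { return a < b ? a : b; }

// Blocks that can be resident at once on one multiprocessor.
constexpr int residentBlocks(SmLimits sm, int regs_per_thread, int threads_per_block)
{
    return minInt(sm.max_blocks_per_sm,
           minInt(sm.max_threads_per_sm / threads_per_block,
                  sm.registers_per_sm / (regs_per_thread * threads_per_block)));
}

// Occupancy as an integer percentage of the maximum resident threads.
constexpr int occupancyPercent(SmLimits sm, int regs_per_thread, int threads_per_block)
{
    return 100 * residentBlocks(sm, regs_per_thread, threads_per_block)
               * threads_per_block / sm.max_threads_per_sm;
}

// Assumed limits for a compute capability 1.3 device.
constexpr SmLimits kExampleSm{16384, 1024, 8};
```

Under these assumptions, occupancyPercent(kExampleSm, 9, 256) evaluates to 100, so the 9 registers reported by the profiler would not limit occupancy. In this model registers only start to matter above 16 per thread: 17 registers gives 75 and 32 registers gives 50.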

