Maxreg count and benchmark assembly code

Mhm · May 16, 2012, 7:52pm

Dear all,
I am trying to compile my code with a limit on the number of registers available to each thread, using -maxrregcount to impose this limit. After recompiling a simple benchmark, I noticed that the register numbers in the assembly code go beyond the maxrregcount value. For example, if I set -maxrregcount to 32, I still see instructions that use registers named %r47, %r60 and …

Also, if I divide the total number of registers by the total number of threads for a certain benchmark, the number of registers per thread is much less than the register numbers used in the assembly code. My concern is why the register numbers are larger than the limit.

Does anyone know why this is happening and how the register mapping is done?

Thanks

seibert · May 16, 2012, 8:33pm

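PTX is an assembly language for a virtual machine; ptxas compiles it down to the actual machine code of the GPU. Final register assignment is done by ptxas, so the compiler emits PTX in static single assignment form, where every new value gets its own virtual register. That is why the register numbers in PTX can exceed the -maxrregcount limit.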
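To make the virtual-to-physical mapping concrete, here is a minimal sketch of a linear-scan style register allocator. It is only an illustration of the general idea (a physical register is reused once the value it held is dead) and is not ptxas's actual algorithm. The function name `allocate` and the live ranges in `virtual` are made up for this example.

```python
def allocate(intervals, num_physical):
    """Map virtual registers to physical ones, reusing a physical
    register once the value it held is no longer needed.

    intervals: dict of name -> (first_def, last_use) instruction indices.
    Returns a dict of name -> physical register index.
    Raises RuntimeError when more values are live than registers exist
    (a real allocator would spill to local memory instead).
    """
    free = list(range(num_physical))
    active = []   # (last_use, name) for values currently in a register
    mapping = {}
    for name, (start, end) in sorted(intervals.items(),
                                     key=lambda kv: kv[1][0]):
        # Release registers whose value died before this definition.
        for item in [a for a in active if a[0] < start]:
            active.remove(item)
            free.append(mapping[item[1]])
        free.sort()
        if not free:
            raise RuntimeError("out of registers at " + name)
        mapping[name] = free.pop(0)
        active.append((end, name))
    return mapping


# Hypothetical live ranges: six virtual registers, SSA style.
virtual = {
    "%r1": (0, 2),
    "%r2": (1, 2),
    "%r3": (2, 4),
    "%r4": (3, 4),
    "%r5": (4, 6),
    "%r6": (5, 6),
}

mapping = allocate(virtual, num_physical=3)
physical_used = len(set(mapping.values()))
for name, phys in mapping.items():
    print(name, "-> R" + str(phys))
```

In this toy input, six virtual registers fit into three physical ones because at most three values are live at any point. The same effect explains why PTX can mention %r47 or %r60 while the final machine code stays under the limit.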

Mhm · May 16, 2012, 10:11pm

Thanks,
Actually I am trying to analyze the register file usage in GPUs, and I thought the assembly code could help, but it seems more complicated than I expected. So please guide me on the following concerns.

When I use the option “--ptxas-options=-v”, it gives me:

ptxas info : Compiling entry function ‘_Z9matrixMulPfS_S_ii’
ptxas info : Used 14 registers, 2068+16 bytes smem, 4 bytes cmem[1]

This means that the thread can run with just 14 registers, so if I gave it a register file with only 14 registers it would run to the end without any problem. It also means that the same hardware register will be used as register %r1 at one time and later as register %r47 (just random numbers). On the other hand, when I checked NVIDIA research papers, I found that the “warp id & register number” combination is used to access and address the register file.

I am using GPGPU-Sim, and every time I print the register number it gives me registers from %r1 up to %r90 in some cases. So my concern is how to get the hardware registers that are actually accessed.
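For analysis scripts, the per-kernel register count can be pulled out of the verbose ptxas log. The following is a small sketch that assumes the two-line log format quoted above; the helper name `registers_per_kernel` is made up for this example, and other toolkit versions may print additional lines or fields.

```python
import re

# Matches the kernel name in either straight or typographic quotes.
ENTRY = re.compile(r"Compiling entry function\s+['‘]([^'’]+)['’]")
USED = re.compile(r"Used\s+(\d+)\s+registers")


def registers_per_kernel(log):
    """Return {mangled_kernel_name: registers_per_thread} from ptxas -v text."""
    result = {}
    current = None
    for line in log.splitlines():
        m = ENTRY.search(line)
        if m:
            current = m.group(1)
            continue
        m = USED.search(line)
        if m and current is not None:
            result[current] = int(m.group(1))
            current = None
    return result


# The two log lines quoted in this thread.
log = (
    "ptxas info : Compiling entry function '_Z9matrixMulPfS_S_ii'\n"
    "ptxas info : Used 14 registers, 2068+16 bytes smem, 4 bytes cmem[1]\n"
)

usage = registers_per_kernel(log)
print(usage)
```

The count reported here is per thread and refers to hardware registers after allocation, not to the %r names that appear in PTX.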

Thanks

tera · May 16, 2012, 10:43pm

As seibert said, don’t look at PTX code. Look at the actual machine code as dumped by cuobjdump -sass.
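Once the SASS listing is saved to a text file, the hardware registers it names can be collected with a short script. This is a sketch only: the `sample` text below is hand-written to resemble a SASS listing and is not real cuobjdump output, and the helper name `sass_registers` is made up. It relies on SASS naming general-purpose registers R0, R1, and so on.

```python
import re

# General-purpose registers appear as R<number>; the word boundary
# keeps special registers such as SR_TID.X from matching.
SASS_REG = re.compile(r"\bR(\d+)\b")


def sass_registers(sass_text):
    """Return the sorted list of distinct register indices named in the text."""
    return sorted({int(n) for n in SASS_REG.findall(sass_text)})


# Illustrative, hand-written listing (assumption, not a real dump).
sample = """
/*0000*/  MOV R1, c[0x1][0x100];
/*0008*/  S2R R0, SR_CTAID.X;
/*0010*/  S2R R2, SR_TID.X;
/*0018*/  IMAD R0, R0, c[0x0][0x8], R2;
/*0020*/  ISCADD R2, R0, c[0x0][0x20], 0x2;
/*0028*/  LD R3, [R2];
/*0030*/  FADD R3, R3, R3;
/*0038*/  ST [R2], R3;
/*0040*/  EXIT;
"""

regs = sass_registers(sample)
highest = regs[-1]
print("distinct registers:", regs)
print("highest index:", highest)
```

The highest index plus one gives a rough estimate of the per-thread register count. It may differ slightly from what ptxas reports, for example because 64-bit operations can use a register pair while naming only the first register.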

Mhm · May 17, 2012, 6:34pm

Thanks, I generated the code and the register usage matches the “--ptxas-options=-v” result.
