-maxrregcount and register numbers in benchmark assembly code

Dear all,
I am trying to compile my code with a limit on the number of registers available to each thread, and I am using -maxrregcount to impose it. After recompiling a simple benchmark, I noticed that the register names in the assembly code go beyond the -maxrregcount value. For example, if I set -maxrregcount to 32, some instructions still use registers named %r47, %r60 and …

Also, if I divide the total number of registers by the total number of threads for a given benchmark, the number of registers per thread is much lower than the register numbers used in the assembly code. My question is why the register numbers are larger than the limit.

Does anyone know why this is happening and how the register mapping is done?


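PTX is an assembly language for a virtual machine; ptxas compiles it down to the actual machine code of the GPU. Final register assignment is done by ptxas, so the compiler emits PTX in static single assignment (SSA) form, where every computed value gets a new virtual register that is assigned exactly once. Register numbers such as %r47 in PTX therefore say nothing about how many hardware registers the kernel ends up using.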
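A toy illustration of why SSA numbering climbs so quickly: the sketch below renames straight-line code so that every assignment receives a fresh %rN, in the spirit of how PTX is generated. The tuple instruction format and the helper name to_ssa are hypothetical, not real PTX or compiler internals.

```python
def to_ssa(code):
    """Rename straight-line code into SSA form.

    code: list of (dest, op, src1, src2) tuples using variable names.
    Returns the same instructions with every destination replaced by a
    fresh virtual register %r1, %r2, ... (toy format, not real PTX).
    """
    current = {}  # variable name -> virtual register holding its latest value
    counter = 0
    out = []
    for dest, op, a, b in code:
        # Sources read the latest virtual register for each variable;
        # names never assigned here (e.g. kernel parameters) stay as-is.
        srcs = [current.get(s, s) for s in (a, b)]
        counter += 1
        reg = f"%r{counter}"
        current[dest] = reg  # later reads of dest now refer to the new register
        out.append((reg, op, srcs[0], srcs[1]))
    return out


prog = [
    ("x", "add", "a", "b"),
    ("x", "mul", "x", "x"),
    ("x", "add", "x", "a"),
    ("y", "mul", "x", "b"),
]
for ins in to_ssa(prog):
    print(ins)
```

Four virtual registers appear even though each %r value is dead as soon as the next one is computed; a real kernel with loops and unrolling easily reaches %r90 while needing only a handful of hardware registers.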

Actually, I am trying to analyze register file usage in GPUs, and I thought the assembly code could help, but it seems more complicated than I expected. Could you please help me with the following questions?

When I use the option --ptxas-options=-v, I get:

ptxas info : Compiling entry function ‘_Z9matrixMulPfS_S_ii’
ptxas info : Used 14 registers, 2068+16 bytes smem, 4 bytes cmem[1]
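To collect these numbers for many kernels, a small parser over the -v log can help. This is a sketch that assumes the log format shown above (ptxas message wording varies somewhat between CUDA versions, and newer releases append the target architecture after the kernel name); registers_per_kernel is a hypothetical helper name. Passing -Xptxas -v to nvcc produces the same log as --ptxas-options=-v.

```python
import re

# Assumed log format: "Compiling entry function '<name>'" followed by
# "Used N registers, ..." for that kernel. Accepts straight or curly quotes.
ENTRY_RE = re.compile(r"Compiling entry function\s+[‘'](.+?)[’']")
USED_RE = re.compile(r"Used\s+(\d+)\s+registers")


def registers_per_kernel(ptxas_log):
    """Map each entry function name to the register count ptxas reported."""
    result = {}
    current = None
    for line in ptxas_log.splitlines():
        m = ENTRY_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = USED_RE.search(line)
        if m and current is not None:
            result[current] = int(m.group(1))
    return result


log = """ptxas info : Compiling entry function '_Z9matrixMulPfS_S_ii'
ptxas info : Used 14 registers, 2068+16 bytes smem, 4 bytes cmem[1]"""
print(registers_per_kernel(log))  # {'_Z9matrixMulPfS_S_ii': 14}
```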

This means the thread can run with just 14 registers, so if I gave it a register file with only 14 registers per thread, it would run to the end without any problem. It also means that the same hardware register will be used as %r1 at one point and later as %r47 (just example numbers). On the other hand, when I checked NVIDIA research papers, I found that the combination of warp ID and register number is used to access and address the register file.
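The reuse described above is exactly what a register allocator does. Below is a minimal linear-scan allocator in Python, assuming each virtual register has a known live interval (first definition, last use). Linear scan is a textbook technique chosen only for illustration; the ptxas allocator is not documented here and is likely more involved, and unlike ptxas under a tight -maxrregcount this sketch never spills to local memory. The intervals are invented.

```python
import heapq


def linear_scan(intervals):
    """Assign physical registers to virtual ones by linear scan.

    intervals: dict vreg -> (start, end), where end is the index of the
    last instruction that reads the value.
    Returns (assignment dict vreg -> physical index, registers used).
    """
    assignment = {}
    active = []  # min-heap of (end, phys) for values still live
    free = []    # min-heap of physical registers available for reuse
    next_phys = 0
    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release registers whose value was last used before this definition.
        while active and active[0][0] < start:
            _, phys = heapq.heappop(active)
            heapq.heappush(free, phys)
        if free:
            phys = heapq.heappop(free)  # reuse a freed hardware register
        else:
            phys = next_phys            # otherwise take a new one
            next_phys += 1
        assignment[vreg] = phys
        heapq.heappush(active, (end, phys))
    return assignment, next_phys


intervals = {"%r1": (0, 3), "%r2": (1, 2), "%r47": (4, 6), "%r60": (5, 7)}
assignment, n_phys = linear_scan(intervals)
print(assignment, n_phys)  # {'%r1': 0, '%r2': 1, '%r47': 0, '%r60': 1} 2
```

Here %r1 and %r47 end up in the same physical register because their lifetimes do not overlap, so four virtual registers fit in two hardware ones.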

I am using GPGPU-Sim, and every time I print the register number it gives me registers from %r1 up to %r90 in some cases. So my question is: how do I get the hardware registers that are actually accessed?
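For intuition about the warp ID plus register number addressing, here is a toy model. Everything in it is an assumption for illustration: the contiguous per-warp layout, the bank count, the bank swizzle and the class name ToyRegisterFile are hypothetical, not the layout of any real GPU or of GPGPU-Sim. It does show why the index must be the post-allocation register number: a PTX-style %r47 simply has no slot when only 14 registers are allocated per thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRegisterFile:
    """Hypothetical register-file layout, for illustration only."""
    regs_per_thread: int  # per-thread count after allocation, e.g. from ptxas -v
    num_banks: int = 4    # assumed number of banks

    def row(self, warp_slot: int, reg: int) -> int:
        # Each resident warp owns a contiguous block of regs_per_thread rows;
        # one row holds register `reg` for all 32 lanes of that warp.
        if not 0 <= reg < self.regs_per_thread:
            raise ValueError(f"R{reg} is outside the {self.regs_per_thread} allocated registers")
        return warp_slot * self.regs_per_thread + reg

    def bank(self, warp_slot: int, reg: int) -> int:
        # Assumed swizzle: offset by warp slot so different warps reading the
        # same register number tend to land in different banks.
        return (warp_slot + reg) % self.num_banks


rf = ToyRegisterFile(regs_per_thread=14)
print(rf.row(3, 5), rf.bank(3, 5))  # 47 0
try:
    rf.row(0, 47)  # a virtual PTX number like %r47 does not fit
except ValueError as err:
    print(err)  # R47 is outside the 14 allocated registers
```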


As seibert said, don't look at PTX code. Look at the actual machine code (SASS) as dumped by cuobjdump -sass.
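To check the SASS against the ptxas count without reading the whole listing, you can scan it for the highest Rn index. A sketch under assumptions: general-purpose register operands look like R0, R13, while RZ (the zero register), predicates (P0) and uniform registers (UR4) are skipped; the sample lines are made up for illustration, not real compiler output, and highest_sass_register is a hypothetical name. Because 64-bit and vector operands (such as R2.64 or 128-bit loads) implicitly use the following registers as well, the highest index plus one is only an approximation of the count ptxas reports.

```python
import re

# Word boundary before "R" keeps RZ (no digits), UR4 and SR_CTAID.X from matching.
REG_RE = re.compile(r"\bR(\d+)\b")


def highest_sass_register(sass_text):
    """Return the highest Rn index referenced in SASS text, or -1 if none."""
    indices = [int(m.group(1)) for m in REG_RE.finditer(sass_text)]
    return max(indices, default=-1)


# Hypothetical SASS fragment for illustration only.
sample = """
/*0008*/  MOV R1, c[0x1][0x100];
/*0010*/  S2R R0, SR_CTAID.X;
/*0018*/  IMAD R13, R0, c[0x0][0x8], R2;
/*0020*/  ST.E [R12], RZ;
"""
print(highest_sass_register(sample) + 1)  # 14 (highest index is R13)
```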

Thanks. I generated the SASS code, and the register usage matches the --ptxas-options=-v result.