How to count registers used?

Hi, when I built my program, PTXAS info showed that 63 registers were used. Then I made some changes to my program, eliminating a few automatic variables that I assumed were held in registers, and rebuilt. PTXAS info still showed that 63 registers were used.

I wonder if there is any way to see a list of all 63 registers and what they hold, so that I can eliminate some of them.

Thanks a lot. Please help me.

You cannot obtain this list as there is no one-to-one correspondence between variables and registers. The best you can do is disassemble the resulting cubin file with nv50dis/nvc0dis and see what the registers are used for.
The information you are looking for may also be contained in the debugging info, but I'm not aware of that being documented anywhere (even the assembler language isn't).

As 63 registers is the maximum available per thread on compute capability 2.x devices, it is quite likely that eliminating automatic variables reduces not the number of registers used, but the amount of register spilling to local memory. Have a look at how much local memory your kernels use.
Also note that the compiler already is quite aggressive about eliminating variables and reusing registers, so trivial improvements to the source code quite likely won’t change the compiler output at all.

I have been struggling with register usage, since my kernel uses too many registers. In my program, I have a line: t = t/12.0, where t is an automatic variable of float type. I built my program and PTXAS info showed 89 registers used. However, when I changed the line to t = t/12, PTXAS info showed 83 registers used.

I wonder whether I missed something or whether this is a bug in the compiler. Can anyone help me? Thanks a lot in advance.

How many registers does the kernel use with t *= 1/12.0? Division is an expensive operation that compiles to multiple instructions.

t *= 1/12.0 and t = t/12 have the same effect.

12 is an integer literal, 12.0 is a double. The results might be the same, but depending on the type of t there could be type conversions involved in the expression, and a double occupies two 32-bit registers.

By C/C++ type promotion rules, if t is of type float, t / 12 maps to a single-precision division. The integer 12 is converted to float prior to the division. Since it is a literal constant, the conversion should happen at compile time, which can easily be verified by looking at the intermediate PTX.

By the same C/C++ type promotion rules, t / 12.0 maps to a double-precision division, since 12.0 is a double, and t thus is converted to the wider type prior to the division. The double-precision division is implemented as a software subroutine that requires more temporary registers than single-precision division, with the difference more pronounced if the single-precision division is approximate. An increase of six 32-bit registers as seen here seems perfectly plausible.

In order of increasing register pressure:

(1) approximate, limited range single-precision division, div.approx.f32
(2) approximate, full range single-precision division, div.full.f32
(3) IEEE-rounded single-precision division, div.rn.f32
(4) IEEE-rounded double-precision division, div.rn.f64