Register usage and .ptx files

I have been trying to reduce the number of registers used by my program. I thought perhaps I could figure out how the 42 registers reported in the .cubin file were being used by examining the .ptx file. However, when I examine the .ptx file I see significantly more than 42 registers reported at the beginning of the file. Is the .ptx file generated prior to register consolidation? Is there a better way to determine how the 42 registers are being used?

Thank you

The PTX generated by nvcc uses a new virtual register for each new value it produces. This PTX is then optimized, and its registers are allocated to physical registers, when it is converted to GPU-specific code.
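To illustrate why the PTX can list far more registers than the cubin reports, here is a toy Python sketch. It is not how nvcc or ptxas actually work. It counts the distinct virtual registers in a straight-line, PTX-like instruction sequence and compares that with the peak number of values live at the same time, which is roughly what a register allocator has to fit into physical registers. The instruction format, the `virtual_registers` and `max_live` helpers, and the register names are all hypothetical.

```python
from typing import List, Tuple

# Toy straight-line program: (destination register, source registers).
# Hypothetical PTX-like form where every result gets a fresh virtual register.
Program = List[Tuple[str, List[str]]]


def virtual_registers(program: Program) -> int:
    """Count distinct virtual registers named anywhere in the program."""
    names = set()
    for dest, srcs in program:
        names.add(dest)
        names.update(srcs)
    return len(names)


def max_live(program: Program, outputs: List[str]) -> int:
    """Peak number of virtual registers live at the same program point.

    Walks backwards: a value is live from its definition to its last use.
    """
    live = set(outputs)  # values still needed after the last instruction
    peak = len(live)
    for dest, srcs in reversed(program):
        live.discard(dest)  # a value is not live before it is defined
        live.update(srcs)   # operands must be live before this instruction
        peak = max(peak, len(live))
    return peak


# Sum of four loaded values, two orderings of the same computation.
# Loads are modeled as instructions with no register sources.
loads_first: Program = [
    ("r1", []), ("r2", []), ("r3", []), ("r4", []),
    ("r5", ["r1", "r2"]),
    ("r6", ["r5", "r3"]),
    ("r7", ["r6", "r4"]),
]
interleaved: Program = [
    ("r1", []), ("r2", []),
    ("r5", ["r1", "r2"]),
    ("r3", []),
    ("r6", ["r5", "r3"]),
    ("r4", []),
    ("r7", ["r6", "r4"]),
]

for name, prog in [("loads_first", loads_first), ("interleaved", interleaved)]:
    print(name, virtual_registers(prog), max_live(prog, ["r7"]))
# loads_first 7 4
# interleaved 7 2
```

Both orderings name seven virtual registers, but the interleaved one never needs more than two values live at once. The real back end schedules and allocates on its own, so this only shows why the PTX register count overstates actual usage; it does not predict what the compiler will report.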

Unfortunately, no: you currently can't find out exactly how the registers are being used. The cubin format is top secret. Until NVIDIA releases the specs or someone reverse engineers it, we're out of luck.

That's unfortunate… Are the transformations performed on the PTX significant enough that examining the PTX for performance bottlenecks is useless? In my case I am trying to reduce the register count, but my current strategy is to tweak one function after another in the .cu file, recompile, examine the .cubin, and hope.