Compiler option --ptxas-options=-v gives wrong register count?

Hello, I was trying to compare the resource allocation of two kernels using the --ptxas-options=-v compiler option and got some strange results. Kernel 1 is much larger than kernel 2, yet it is reported as using only 1 more register. I was suspicious of this, so I dumped out the PTX file, and things look very different.

--ptxas-options=-v -arch=sm_20 -O2: kernel 1 = 21 registers, 136 bytes cmem; kernel 2 = 20 registers, 128 bytes cmem

ptx file register allocations

kernel 1:
.reg .u32 %r<49>;
.reg .u64 %rd<56>;
.reg .f32 %f<81>;
.reg .f64 %fd<94>;
.reg .pred %p<8>;

kernel 2:
.reg .u32 %r<38>;
.reg .u64 %rd<42>;
.reg .f32 %f<62>;
.reg .f64 %fd<13>;
.reg .pred %p<5>;

I am no expert on this, but looking at the PTX white paper I assume this means kernel 1 declares a total of 288 ‘virtual’ registers and kernel 2 declares 160? Do the 64-bit values actually use 2 registers each, since CUDA hardware only has 32-bit registers? Is ptxas already accounting for register spilling? Any discussion on this topic would be most helpful. This is on a GTX 480 card.
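For anyone who wants to check the totals quoted above, here is a minimal Python sketch that sums the parameterized .reg declarations from a PTX listing. The function name count_virtual_registers and the regular expression are my own illustration, not part of any NVIDIA tool, and the sketch only counts declared virtual register names; it says nothing about how many hardware registers the final binary uses.

```python
import re

# Matches PTX declarations such as ".reg .u32 %r<49>;"
# (hypothetical helper pattern, written for the listings in this post)
REG_DECL = re.compile(r"\.reg\s+\.(\w+)\s+%\w+<(\d+)>;")

def count_virtual_registers(ptx_text):
    """Return {type: count} summed over every parameterized .reg declaration."""
    counts = {}
    for reg_type, n in REG_DECL.findall(ptx_text):
        counts[reg_type] = counts.get(reg_type, 0) + int(n)
    return counts

KERNEL_1 = """
.reg .u32 %r<49>;
.reg .u64 %rd<56>;
.reg .f32 %f<81>;
.reg .f64 %fd<94>;
.reg .pred %p<8>;
"""

KERNEL_2 = """
.reg .u32 %r<38>;
.reg .u64 %rd<42>;
.reg .f32 %f<62>;
.reg .f64 %fd<13>;
.reg .pred %p<5>;
"""

for name, ptx in (("kernel 1", KERNEL_1), ("kernel 2", KERNEL_2)):
    counts = count_virtual_registers(ptx)
    print(name, counts, "total:", sum(counts.values()))
```

With the declarations from this post, the totals come out to 288 for kernel 1 and 160 for kernel 2.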

Despite what the name suggests, ptxas is a full compiler that does its own register allocation and several optimizations on the PTX when compiling it into .cubin files. The register count reported by ptxas is for the .cubin file, i.e. after optimization and register allocation.

To check those register numbers, you would have to run the .cubin file through decuda.

Thank you, that makes much more sense.
