Hello, I was trying to compare the resource allocation of two kernels using the --ptxas-options=-v compiler option and got some strange results. Kernel 1 is much larger than kernel 2, yet it is reported as using only 1 more register. I was suspicious of this, so I dumped the PTX file, and the register declarations there look very different.
--ptxas-options=-v -arch=sm_20 -O2: kernel 1 = 21 registers, 136 bytes cmem; kernel 2 = 20 registers, 128 bytes cmem
PTX file register allocations:
kernel 1:
.reg .u32 %r<49>;
.reg .u64 %rd<56>;
.reg .f32 %f<81>;
.reg .f64 %fd<94>;
.reg .pred %p<8>;
kernel 2:
.reg .u32 %r<38>;
.reg .u64 %rd<42>;
.reg .f32 %f<62>;
.reg .f64 %fd<13>;
.reg .pred %p<5>;
I am no expert on this, but looking at the PTX white paper I assume this means kernel 1 is allocating a total of 288 ‘virtual’ registers and kernel 2 a total of 160? Are the 64-bit values actually using 2 registers each, since CUDA only has 32-bit registers? Is ptxas already accounting for a register spill? Any discussion on this topic would be most helpful. This is on a GTX 480 card.
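For reference, this is how I arrived at the 288 and 160 totals: a minimal Python sketch that sums the counts in the .reg declarations quoted above. The names KERNEL_1, KERNEL_2, REG_DECL, count_virtual_registers and total_virtual_registers are my own, made up for illustration, and the script assumes every declaration has the parameterized form `.reg .type %name<N>;`. It only reproduces the arithmetic on the PTX text; it says nothing about how ptxas maps these virtual registers onto hardware registers.

```python
import re

# Register declarations copied from the PTX dumps quoted in this post.
KERNEL_1 = """
.reg .u32 %r<49>;
.reg .u64 %rd<56>;
.reg .f32 %f<81>;
.reg .f64 %fd<94>;
.reg .pred %p<8>;
"""

KERNEL_2 = """
.reg .u32 %r<38>;
.reg .u64 %rd<42>;
.reg .f32 %f<62>;
.reg .f64 %fd<13>;
.reg .pred %p<5>;
"""

# Matches lines such as ".reg .u64 %rd<56>;" and captures the type and the count.
REG_DECL = re.compile(r"\.reg\s+\.(\w+)\s+%\w+<(\d+)>;")


def count_virtual_registers(ptx):
    """Return a {type: count} dict for each parameterized .reg declaration."""
    counts = {}
    for reg_type, n in REG_DECL.findall(ptx):
        counts[reg_type] = counts.get(reg_type, 0) + int(n)
    return counts


def total_virtual_registers(ptx):
    """Sum the declared counts across all register types."""
    return sum(count_virtual_registers(ptx).values())


if __name__ == "__main__":
    for name, ptx in (("kernel 1", KERNEL_1), ("kernel 2", KERNEL_2)):
        print(name, count_virtual_registers(ptx),
              "total =", total_virtual_registers(ptx))
```

Keeping the counts split by type makes it easy to see that most of the difference between the two kernels comes from the .f64 declarations (94 versus 13).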