Register usage and debug info in build

I’ve noticed that some kernels use more registers when built with debug information (as might be expected), but other kernels use fewer registers in a debug build.

For example, without debug information:

1> ptxas : info : Compiling entry function '_Z25cuPopulateaRecStackKernelP6float2S0_S0_PbPfS2_' for 'sm_35'
1> ptxas : info : Function properties for _Z25cuPopulateaRecStackKernelP6float2S0_S0_PbPfS2_
1> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1> ptxas : info : Used 41 registers, 368 bytes cmem[0]

The same kernel with debug information:

1> ptxas : info : Compiling entry function '_Z25cuPopulateaRecStackKernelP6float2S0_S0_PbPfS2_' for 'sm_35'
1> ptxas : info : Function properties for _Z25cuPopulateaRecStackKernelP6float2S0_S0_PbPfS2_
1> 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1> ptxas : info : Used 20 registers, 24 bytes cumulative stack size, 368 bytes cmem[0]
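If you want to read these figures from inside the program rather than from the build log, the CUDA runtime's cudaFuncGetAttributes reports the registers per thread (numRegs) and the per-thread local memory size (localSizeBytes), which is where a stack frame lives. This is only a sketch: standInKernel, KernelResources, queryKernelResources and printKernelResources are hypothetical names, and the kernel body is made up because the thread does not show the real cuPopulateaRecStackKernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the real cuPopulateaRecStackKernel body is not shown.
__global__ void standInKernel(const float2* a, const float2* b, float2* out,
                              const bool* mask, const float* w, float* acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || !mask[i]) return;
    float2 v = make_float2(a[i].x * w[i] + b[i].x, a[i].y * w[i] + b[i].y);
    out[i] = v;
    acc[i] += v.x * v.x + v.y * v.y;
}

// Hypothetical holder for the two statistics discussed in the thread.
struct KernelResources {
    int numRegs;            // registers per thread (ptxas "Used N registers")
    size_t localSizeBytes;  // per-thread local memory, where the stack frame lives
};

// Reads the resource usage of the kernel as it was actually compiled.
cudaError_t queryKernelResources(KernelResources* out)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, standInKernel);
    if (err != cudaSuccess) return err;
    out->numRegs = attr.numRegs;
    out->localSizeBytes = attr.localSizeBytes;
    return cudaSuccess;
}

// Prints the figures, or the runtime error if no usable device is present.
void printKernelResources()
{
    KernelResources r = {0, 0};
    cudaError_t err = queryKernelResources(&r);
    if (err != cudaSuccess) {
        std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return;
    }
    std::printf("registers/thread: %d, local bytes/thread: %zu\n",
                r.numRegs, r.localSizeBytes);
}
```

Calling printKernelResources() from main in one build compiled with -G and one without lets you compare the two at run time; the exact numbers depend on the kernel, target architecture and toolkit version.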

Why would a kernel use fewer registers just because debug information is built in?

Why isn’t this consistent across different kernels?

What is the relationship between registers, stack frame and cumulative stack size?

I assume “debug info” means compiling with -G.

Compiling with -G significantly alters code generation: besides generating debug information for device code, it turns off device code optimizations.

With code generation altered this much, it’s impossible to make blanket predictions about register usage. Compiling with -G usually generates more code, and more code could use more registers. But none of these are absolutes: -G may not generate more code, and code size is not always an accurate predictor of register usage.

There are simply no absolute relationships here, so expecting consistency is a flawed assumption.

Thanks for your quick response.

What is the relationship between using about 2x the registers (without debug information) and using fewer stack frame and cumulative stack bytes?

Do the stack frame and cumulative stack bytes affect occupancy like register usage does?

It seems to me that you are picking random code generation statistics, putting them next to each other, and asking “what’s the relationship?” I can’t answer those questions; perhaps someone else can. I don’t think there is a direct relationship between register usage and stack usage.

Stack usage shouldn’t affect occupancy the way register usage or shared memory usage does. Registers and shared memory are both shared resources provided by the SM (only). The stack is not a resource shared among threads (each thread has its own separate stack), and its storage is not limited to the SM: it lives in local memory, which is backed by device memory.
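To make the occupancy point concrete, here is a simplified, register-only occupancy estimate for the two builds above (41 vs. 20 registers). It assumes the sm_35 per-SM limits used by NVIDIA's occupancy calculator (65536 registers, 64 resident warps, 16 resident blocks, registers allocated per warp in units of 256), a hypothetical block size of 256 threads and no shared memory, and it skips some finer allocation-granularity rules. blocksPerSM and occupancy are hypothetical helper names. Note that per-thread stack size is not an input anywhere.

```cuda
#include <algorithm>

// Assumed per-SM limits for compute capability 3.5 (sm_35); check them for your GPU.
constexpr int kWarpSize       = 32;
constexpr int kRegsPerSM      = 65536;
constexpr int kMaxWarpsPerSM  = 64;
constexpr int kMaxBlocksPerSM = 16;
constexpr int kRegAllocUnit   = 256;  // registers are allocated per warp in chunks of 256

// Resident blocks per SM when only registers and the warp/block caps are considered.
// Shared memory and stack (local memory) usage play no part in this estimate.
int blocksPerSM(int regsPerThread, int threadsPerBlock)
{
    if (regsPerThread < 1 || threadsPerBlock < 1) return 0;
    int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    // Round each warp's register demand up to the allocation unit.
    int regsPerWarp = ((regsPerThread * kWarpSize + kRegAllocUnit - 1) / kRegAllocUnit)
                      * kRegAllocUnit;
    int warpsByRegs  = kRegsPerSM / regsPerWarp;      // warps that fit in the register file
    int blocksByRegs = warpsByRegs / warpsPerBlock;   // whole blocks only
    int blocksByWarps = kMaxWarpsPerSM / warpsPerBlock;
    return std::min(blocksByRegs, std::min(blocksByWarps, kMaxBlocksPerSM));
}

// Fraction of the SM's maximum resident warps that would be active.
double occupancy(int regsPerThread, int threadsPerBlock)
{
    int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    int activeWarps = blocksPerSM(regsPerThread, threadsPerBlock) * warpsPerBlock;
    return static_cast<double>(activeWarps) / kMaxWarpsPerSM;
}
```

Under these assumptions, blocksPerSM(41, 256) returns 5 (40 of 64 warps, 62.5% occupancy) while blocksPerSM(20, 256) returns 8 (the 64-warp limit, 100%). On real hardware, the runtime function cudaOccupancyMaxActiveBlocksPerMultiprocessor applies the exact rules for the current device.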

Thanks, txbob.