Hello all,

I just noticed something odd about register usage when passing the thread index into a device kernel. I have a kernel, let's call it main_kernel, whose register usage I am trying to reduce. main_kernel calls a device kernel, let's call it sub_kernel, to which I was originally passing the thread ID as an argument; that version used around 246 registers per thread. When I instead declare the thread ID inside the device kernel rather than passing it in, register usage drops to 240.

I can't work out what is really happening here. I'd be super grateful if anyone could explain this to me.



Hi Srikanth,

Just a guess, but either the compiler or possibly ptxas (which does the register allocation) may be able to apply more optimizations when the thread index is declared locally rather than passed in as an argument.
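
For anyone who wants to compare the two forms directly, here is a minimal sketch, assuming CUDA C++ built with nvcc (the thread does not say which language or compiler is in use). The kernel bodies are hypothetical stand-ins, since the original code was not posted; only the names main_kernel and sub_kernel come from the question. It asks the runtime for each kernel's register count through cudaFuncGetAttributes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant A: the caller computes the thread index and passes it in.
// (Hypothetical body; the real sub_kernel was not shown in the thread.)
__device__ float sub_kernel_arg(const float* in, int tid) {
    return in[tid] * 2.0f;
}

// Variant B: the device function declares the thread index itself.
__device__ float sub_kernel_local(const float* in) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    return in[tid] * 2.0f;
}

__global__ void main_kernel_arg(const float* in, float* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] = sub_kernel_arg(in, tid);
}

__global__ void main_kernel_local(const float* in, float* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] = sub_kernel_local(in);
}

// Returns the registers per thread reported for a compiled kernel,
// or -1 if the query fails (for example, when no device is available).
template <typename Kernel>
int regs_per_thread(Kernel kernel) {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, kernel) != cudaSuccess) return -1;
    return attr.numRegs;
}

int main() {
    printf("index passed as argument: %d registers/thread\n",
           regs_per_thread(main_kernel_arg));
    printf("index declared locally:   %d registers/thread\n",
           regs_per_thread(main_kernel_local));
    return 0;
}
```

Kernels this small will most likely report the same count for both variants; a difference like 246 versus 240 would only be expected to show up in a much larger kernel where register pressure is high. Compiling with nvcc -Xptxas -v prints the same per-kernel register counts at build time.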


That's interesting. Thanks for the reply, Mat.