The number of registers

I don’t quite understand how the .cubin file gets the number of registers.

Does it have any influence on performance?

If it does, which is supposed to give higher performance: more registers per block or fewer?


The more registers your kernel uses, the fewer threads (and, as a consequence, warps and blocks) can run concurrently on a multiprocessor. So it can affect performance. However, if your kernel already keeps all arithmetic units busy and no threads are waiting on memory transfers, there is no need to have more threads running.
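To make the trade-off concrete: with R registers per thread (the figure reported in the .cubin), and assuming 8192 registers per multiprocessor and 32 threads per warp, the register file alone caps the resident threads and warps at

```latex
T_{\max} = \left\lfloor \frac{8192}{R} \right\rfloor, \qquad
W_{\max} = \left\lfloor \frac{T_{\max}}{32} \right\rfloor
% R = 16:  T_max = 512,  W_max = 16
% R = 32:  T_max = 256,  W_max = 8
```

Doubling the register count halves the number of warps the scheduler can switch between while other warps wait on memory. That is why it hurts memory-bound kernels much more than kernels that already keep the arithmetic units busy.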

The .cubin file states the number of registers one thread uses. Divide 8192 (registers per multiprocessor) by this number and you get the maximum number of concurrent threads per multiprocessor. The same reasoning applies to shared memory, which is counted per block, but you additionally have to account for any dynamically allocated shared memory declared in your kernel.
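A minimal sketch of that arithmetic, written in Python purely for illustration since it does not depend on the GPU. The 8192-register figure comes from this thread; the 16384-byte shared memory size (typical of compute capability 1.0/1.1 parts), the function names, and the example kernel's numbers are assumptions. The sketch ignores register allocation granularity and other per-multiprocessor limits, so real figures can be somewhat lower.

```python
from typing import Optional

# Assumed per-multiprocessor resources: 8192 registers (as quoted in the thread)
# and 16384 bytes of shared memory (assumption: compute capability 1.0/1.1).
REGS_PER_MP = 8192
SMEM_BYTES_PER_MP = 16384


def max_threads_by_registers(regs_per_thread: int) -> int:
    """Upper bound on concurrent threads per multiprocessor from registers alone."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    return REGS_PER_MP // regs_per_thread


def max_blocks_by_shared_memory(static_smem: int, dynamic_smem: int = 0) -> Optional[int]:
    """Upper bound on concurrent blocks per multiprocessor from shared memory.

    static_smem:  per-block shared memory reported in the .cubin
    dynamic_smem: bytes of dynamically allocated shared memory per block
    Returns None when the block uses no shared memory (no limit from it).
    """
    per_block = static_smem + dynamic_smem
    if per_block <= 0:
        return None
    return SMEM_BYTES_PER_MP // per_block


# Hypothetical kernel: 16 registers per thread, 2048 bytes of static and
# 2048 bytes of dynamic shared memory per block.
print(max_threads_by_registers(16))             # 512
print(max_blocks_by_shared_memory(2048, 2048))  # 4
```

Whichever resource runs out first (registers or shared memory) sets the actual limit.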

Hope that helps


Thank you, it’s helpful, but please give me more information :) :(

I still don’t understand how it affects performance.

How should I arrange it per block/thread?

The register count you see is per thread. With 8192 registers per multiprocessor, this puts a limit on the number of threads per multiprocessor. And with N threads per block, it also limits the number of blocks per multiprocessor, because a block can only be scheduled if all N of its threads get their registers.
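A minimal sketch of the block limit, again in Python for illustration only. The 20-registers-per-thread kernel and the function names are hypothetical, and only the register limit is modeled. Other limits such as shared memory, the maximum threads per multiprocessor, and allocation granularity are ignored. It shows why the choice of N matters: only whole blocks fit, so some block sizes leave registers unused.

```python
REGS_PER_MP = 8192  # registers per multiprocessor, as quoted in the thread


def max_blocks_by_registers(regs_per_thread: int, threads_per_block: int) -> int:
    """Upper bound on resident blocks per multiprocessor from registers alone.

    A block needs regs_per_thread * threads_per_block registers at once,
    so only whole blocks count (simplification: no allocation granularity).
    """
    if regs_per_thread <= 0 or threads_per_block <= 0:
        raise ValueError("arguments must be positive")
    return REGS_PER_MP // (regs_per_thread * threads_per_block)


def resident_threads(regs_per_thread: int, threads_per_block: int) -> int:
    """Threads actually resident: whole blocks times threads per block."""
    return max_blocks_by_registers(regs_per_thread, threads_per_block) * threads_per_block


# Hypothetical kernel using 20 registers per thread: compare block sizes.
for n in (64, 128, 192, 256, 384):
    print(n, max_blocks_by_registers(20, n), resident_threads(20, n))
# 64 6 384
# 128 3 384
# 192 2 384
# 256 1 256
# 384 1 384
```

With 20 registers per thread, the register file could hold 409 threads. However, 256 threads per block leaves room for only one block (256 threads), while the other sizes reach 384.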