Hi All,
Can we tell which variables contribute to a kernel's register count? If so, how?
Are there only two factors that affect occupancy (the number of threads per block and the register count)? If not, what are the others?
Registers can be counted once the kernel has been compiled (for example, from the PTX code), but actual usage may vary depending on your CUDA driver.
And yes, register usage is usually the factor that limits how many threads can be launched per multiprocessor (MP). Each MP has 8192 registers on compute capability 1.0 and 1.1 devices and 16384 registers on compute capability 1.2 and 1.3 devices. You must optimize register usage to launch the maximum number of threads per MP.
PS: English is not my first language either, and I appreciate people who learn foreign languages and make the effort to express themselves in a forum's common language, even if it's not perfect. Don't be put off by the comment above!
Shared memory is also a limiting factor for occupancy. Each MP has 16 KB of shared memory, so all blocks resident on that MP must split it among themselves.