Hi,
I have some questions about the register count in the NCU output. I profiled a specific kernel, and the summary tab shows repeated invocations of it with two grid sizes.
This tab shows the maximum number of registers per thread is 168.
However, when I check the source tab, I see that the maximum live register count is about 68.
I am confused about that blue bar.
1- I assume that the source information is “per thread”. Am I right?
2- If the max registers per thread is 168, then 68 live registers is about 40% of the registers, so the blue bar should show 40%. Isn’t that true?
The blue bar shows the value of the current line in relation to the maximum value in this column. This is not in relation to another metric somewhere else in the report.
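As a rough illustration of that scaling, here is a minimal Python sketch. The per-line values and the variable names are hypothetical, not taken from the actual report; only 68 and 168 come from the question. It assumes the bar length is simply the line's value divided by the column maximum.

```python
# Hypothetical "Live Registers" values for four source lines (made up for illustration).
live_registers = [12, 34, 68, 51]

# Value from the summary tab in the question; it plays no role in the bar length.
launch_registers_per_thread = 168

# Assumed scaling: each bar is relative to the largest value in the same column.
column_max = max(live_registers)
bar_fill = [value / column_max for value in live_registers]

# The ratio the question expected, which the bar does not show.
ratio_to_allocated = column_max / launch_registers_per_thread

for value, fill in zip(live_registers, bar_fill):
    print(f"{value:3d} live registers -> bar at {fill:.0%}")
print(f"68 / 168 = {ratio_to_allocated:.1%} (not what the bar represents)")
```

Under this assumption, the line with 68 live registers gets a full bar because it is the column maximum, regardless of the 168 registers allocated per thread.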
The registers per thread allocated by the compiler for the function are not directly tied to the live registers computed per instruction in the source view. This is documented here:
The total number of registers reported as launch__registers_per_thread may be significantly higher than the maximum live registers. The compiler may need to allocate specific registers that can create holes in the allocation, thereby affecting launch__registers_per_thread, even if the maximum live registers is smaller. This may happen due to ABI restrictions, or restrictions enforced by particular hardware instructions. The compiler may not have a complete picture of which registers may be used in either callee or caller and has to obey ABI conventions, thereby allocating different registers even if some register could have theoretically been re-used.
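To make the "holes" idea concrete, here is a deliberately simplified Python sketch. It assumes the per-thread allocation has to span every register index up to the highest one in use, which is a toy model and not a description of how the compiler or hardware actually allocates. The register indices are invented so that the numbers match the question.

```python
# Toy model (an assumption): the allocation must cover indices 0..highest index in use.
# Hypothetical registers live at the peak instruction: R0-R59 plus R160-R167.
live_at_peak = set(range(0, 60)) | set(range(160, 168))

# What the source view would count at that instruction.
max_live_registers = len(live_at_peak)

# What the launch statistics would report under the toy model.
registers_per_thread = max(live_at_peak) + 1

# Indices inside the allocated range that hold no live value at that point.
holes = registers_per_thread - max_live_registers

print(f"live registers: {max_live_registers}")
print(f"allocated per thread: {registers_per_thread}")
print(f"unused indices (holes): {holes}")
```

In this made-up case only 68 registers are live, but a few values pinned to high register indices (for example by ABI conventions) push the allocation to 168.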