I have a kernel with the following parameters
Since it uses 135 registers per thread and an SM has 65536 registers (sm_80), I'd expect the maximum block size the kernel can run with to be 485 (65536 / 135, rounded down).
But the profiler says that the max block size it can take is 384.
I'd appreciate it if someone could help me understand why the block size cannot go beyond 384.
The granularity of register allocation is larger than 1. The details differ by architecture. There may also be factors other than the number of registers that lead to the limit of 384 threads.
I do not know whether one can coax the details of the occupancy computation out of Nsight Compute, but the old Excel spreadsheet-based occupancy calculator, while deprecated, is still around and should allow you to track the exact details of the occupancy calculation, including the granularity of register allocation.
Thanks, you are correct. Looking at the Excel sheet, I learned that a proper calculation has to take these parameters of the architecture into account:
Warp allocation granularity
Register allocation unit size
Register allocation granularity
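For anyone who finds this thread later, here is a rough sketch of how those three parameters appear to combine to give 384. It is plain Python arithmetic, not an NVIDIA API. The function name `max_block_threads` is made up for illustration, and the default values (per-warp register allocation, an allocation unit of 256 registers, a warp allocation granularity of 4) are my assumptions from reading the occupancy spreadsheet for compute capability 8.0, so please check them against the sheet for your own architecture.

```python
def max_block_threads(regs_per_thread,
                      regs_per_sm=65536,          # register file size per SM (sm_80)
                      warp_size=32,
                      reg_alloc_unit=256,         # assumed: registers are handed out in units of 256
                      warp_alloc_granularity=4,   # assumed: warps are counted in multiples of 4
                      max_threads_per_block=1024):
    """Hypothetical helper: largest block size the register file allows."""
    # Registers are allocated per warp (assumed granularity: warp),
    # rounded up to the next multiple of the allocation unit.
    regs_per_warp = regs_per_thread * warp_size
    regs_per_warp = -(-regs_per_warp // reg_alloc_unit) * reg_alloc_unit

    # How many such warps fit into the register file.
    warps = regs_per_sm // regs_per_warp

    # Round the warp count down to the warp allocation granularity.
    warps -= warps % warp_alloc_granularity

    return min(warps * warp_size, max_threads_per_block)


naive_estimate = 65536 // 135          # ignores every allocation granularity
print(naive_estimate)                  # 485
print(max_block_threads(135))          # 384
```

Step by step for 135 registers per thread: 135 * 32 = 4320 registers per warp, rounded up to 4352; 65536 / 4352 gives 15 warps; rounding 15 down to a multiple of 4 leaves 12 warps, i.e. 12 * 32 = 384 threads, which matches what the profiler reported.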