Max block size limiting factor


I have a kernel with the following parameters

Since it uses 135 registers per thread and an SM has 65536 registers (sm_80), I'd expect the maximum block size the kernel can run with to be 485 (65536 / 135 ≈ 485).
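The estimate above divides the register file by the per-thread register count. Here is a minimal sketch of that naive arithmetic. It assumes, as the post does, 65536 registers per SM on sm_80, and it ignores every allocation granularity:

```python
# Naive estimate: treats registers as if they could be handed out one at a time.
REGS_PER_SM = 65536    # register file size per SM on sm_80 (from the post)
regs_per_thread = 135  # per-thread register count reported for the kernel

naive_max_threads = REGS_PER_SM // regs_per_thread
print(naive_max_threads)  # 485
```

Note that 485 is not even a multiple of the warp size (32). That is a first hint that hardware allocation works in coarser units than single threads.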

But the profiler says that the max block size it can take is 384.

I'd appreciate it if someone could help me understand why the block size cannot go beyond 384.

The granularity of register allocation is larger than 1. The details differ by architecture. There may also be factors other than the number of registers that lead to the limit of 384 threads.

I do not know whether one can coax the details of the occupancy computation out of Nsight Compute. However, the old Excel spreadsheet-based occupancy calculator, while deprecated, is still around. It should let you trace the exact details of the occupancy calculation, including the granularity of register allocation.


@njuffa,

Thanks, you are correct. Looking at the Excel sheet, I learned that a correct calculation has to account for the architecture's warp allocation granularity, register allocation unit size, and register allocation granularity.
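The calculation described above can be sketched as follows. This is a minimal sketch and considers only the register limit. Shared memory and other limits are ignored. It assumes the sm_80 values listed in the occupancy calculator spreadsheet: registers are allocated per warp in units of 256, warps are allocated in groups of 4, there are 65536 registers per block, and there are 1024 threads per block. The helper names `round_up` and `max_block_size_by_regs` are hypothetical.

```python
# Assumed sm_80 parameters, as listed in the CUDA occupancy calculator spreadsheet.
WARP_SIZE = 32
MAX_REGS_PER_BLOCK = 65536
MAX_THREADS_PER_BLOCK = 1024
REG_ALLOC_UNIT_SIZE = 256   # registers are allocated per warp, in units of 256
WARP_ALLOC_GRANULARITY = 4  # warps are allocated in groups of 4


def round_up(x: int, unit: int) -> int:
    """Round x up to the next multiple of unit (hypothetical helper)."""
    return (x + unit - 1) // unit * unit


def max_block_size_by_regs(regs_per_thread: int) -> int:
    """Largest block size in threads allowed by the register limit alone (hypothetical helper)."""
    # 1. Registers consumed by one warp, rounded up to the allocation unit size.
    regs_per_warp = round_up(regs_per_thread * WARP_SIZE, REG_ALLOC_UNIT_SIZE)
    # 2. Number of such warps that fit into the per-block register budget.
    warps = MAX_REGS_PER_BLOCK // regs_per_warp
    # 3. Only whole groups of WARP_ALLOC_GRANULARITY warps count, so round down.
    warps = warps // WARP_ALLOC_GRANULARITY * WARP_ALLOC_GRANULARITY
    return min(warps * WARP_SIZE, MAX_THREADS_PER_BLOCK)


print(max_block_size_by_regs(135))  # 384
```

With 135 registers per thread, one warp needs 135 × 32 = 4320 registers. That rounds up to 4352, leaving room for 15 warps in 65536 registers. The warp allocation granularity of 4 then cuts this to 12 warps, i.e. 384 threads. This matches the profiler, whereas the naive 65536 / 135 gives 485.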
