Hello!
I recently found that the difference between theoretical occupancy and achieved occupancy can be quite large. Since occupancy describes how many thread blocks an SM is capable of holding, I wonder what could cause the gap between theoretical and achieved occupancy. It would be nice to learn the reasons for it.
I would appreciate a link to an article that discusses this problem.
The Achieved Occupancy rule provides guidance on differences between achieved and theoretical occupancy. It is executed and reported automatically when you collect the Occupancy section, or a set that includes this section (the default set of ncu does). Its output is shown on the Summary and Details pages, both in the CLI and the UI. You can use the PmSampling section (--section PmSampling or --set full) to understand the dynamic behavior of your workload, assuming your setup is supported by PM Sampling.
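To give a rough feel for why the two numbers can diverge, here is a toy Python model of a single SM. It is only a sketch under simplifying assumptions, not how Nsight Compute measures anything: every warp of a block is assumed to stay active for the block's whole duration, a new block starts the moment a slot frees up, and the function names and all hardware limits (64 warps per SM, 8 resident blocks, 8 warps per block) are hypothetical values picked for illustration. It shows two common effects: a partial last wave of blocks, and one block running much longer than the others.

```python
import heapq


def theoretical_occupancy(warps_per_block, max_blocks_per_sm, max_warps_per_sm):
    """Upper bound: warps that can be resident at once / maximum warps per SM."""
    resident_blocks = min(max_blocks_per_sm, max_warps_per_sm // warps_per_block)
    return resident_blocks * warps_per_block / max_warps_per_sm


def achieved_occupancy(block_durations, warps_per_block, max_blocks_per_sm, max_warps_per_sm):
    """Time-averaged active warps / maximum warps per SM (toy model).

    Assumes a non-empty list of positive durations and at least one block slot.
    """
    slots = min(max_blocks_per_sm, max_warps_per_sm // warps_per_block)
    finish_times = []  # min-heap of finish times of the resident blocks
    warp_time = 0.0    # integral of active warps over time
    end = 0.0          # time at which the last block finishes
    for duration in block_durations:
        # Start at t=0 while slots are free, else when the earliest block retires.
        start = 0.0 if len(finish_times) < slots else heapq.heappop(finish_times)
        finish = start + duration
        heapq.heappush(finish_times, finish)
        warp_time += duration * warps_per_block
        end = max(end, finish)
    return warp_time / (end * max_warps_per_sm)


if __name__ == "__main__":
    limits = dict(warps_per_block=8, max_blocks_per_sm=8, max_warps_per_sm=64)
    print("theoretical:", theoretical_occupancy(**limits))
    # Tail effect: 12 equal blocks, so the second wave only fills half the SM.
    print("partial last wave:", achieved_occupancy([1.0] * 12, **limits))
    # Imbalance: one block runs 4x longer and keeps the SM mostly empty.
    print("one slow block:", achieved_occupancy([4.0] + [1.0] * 7, **limits))
```

In this model the theoretical occupancy is 100% in both cases, while the achieved values come out at 75% and about 34%. On real hardware the gap also includes effects this sketch ignores, such as warp scheduling overheads and imbalance between warps inside a block, so treat it as intuition only and rely on the profiler output for actual numbers.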