Question about occupancy calculator data

The CUDA occupancy calculator shows the
“Active Thread Blocks per Multiprocessor” metric.
We can tweak the shared memory per block and see how it affects that metric.
Surprisingly, “Active Thread Blocks per Multiprocessor” never exceeds 3.
Does this mean the NVIDIA G80 can never have more than 3 active thread blocks per SM?

Has anyone seen this experimentally?

This also depends on the number of registers per thread and the number of threads per block.
If you have 256 threads per block, you can have at most 3 thread blocks, since each multiprocessor can have at most 768 threads active at one time. When using 512 threads per block, you can have only 1 block per multiprocessor.

It can have up to 8 blocks per SM, as long as you have the resources to call upon (total registers, total threads per SM, and shared memory, should you be using it).
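As a rough illustration of how those limits combine, here is a minimal Python sketch that takes the smallest of the per-resource caps. The constants are the G80 (compute capability 1.0) per-SM limits; the function name `active_blocks_per_sm` and its parameters are made up for this example, and it deliberately ignores register and shared memory allocation granularity, so the real occupancy calculator may report a slightly lower number in some cases.

```python
# G80 (compute capability 1.0) per-multiprocessor limits.
MAX_THREADS_PER_SM = 768
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 8192
SHARED_MEM_PER_SM = 16384  # bytes


def active_blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block):
    """Simplified estimate of blocks resident on one SM at the same time.

    Hypothetical helper: it ignores allocation granularity, so treat the
    result as an upper bound rather than the exact calculator output.
    """
    # Hard cap on blocks, and the cap implied by the 768-thread limit.
    limits = [MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block]
    # Cap implied by the register file, if the kernel uses registers.
    if regs_per_thread > 0:
        limits.append(REGISTERS_PER_SM // (regs_per_thread * threads_per_block))
    # Cap implied by shared memory, if the kernel uses any.
    if smem_per_block > 0:
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    return min(limits)


if __name__ == "__main__":
    print(active_blocks_per_sm(256, 10, 0))     # thread-limited
    print(active_blocks_per_sm(64, 10, 1024))   # hits the 8-block cap
    print(active_blocks_per_sm(256, 10, 8192))  # shared-memory-limited
```

With 256 threads per block the thread limit alone caps the result at 3 no matter how little shared memory is requested, which would explain why changing only the shared memory field never pushes the metric above 3.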

The moment the GPU hasn’t got the resources to keep every block in the SM, it’ll load as many as it can, hence ‘active thread blocks’ vs. the actual thread blocks generated, per SM.
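To put numbers on the active vs. generated distinction, here is a small Python sketch. The helper `resident_and_waves` is hypothetical, and the figures of 16 SMs and 3 active blocks per SM are assumptions chosen for illustration. The "waves" count is only a rough model: the hardware hands out a new block as soon as an old one finishes rather than running in strict rounds.

```python
import math


def resident_and_waves(total_blocks, active_blocks_per_sm, num_sms):
    """Return (blocks resident at once, approximate number of rounds).

    Hypothetical helper for illustration only; real scheduling is not
    done in strict rounds.
    """
    # Most blocks the whole GPU can hold at the same time.
    capacity = active_blocks_per_sm * num_sms
    # A small grid may not even fill the available capacity.
    resident = min(total_blocks, capacity)
    # Rounds needed if the blocks were processed one full batch at a time.
    waves = math.ceil(total_blocks / capacity)
    return resident, waves


if __name__ == "__main__":
    # Assumed example: a 1000-block grid, 3 active blocks per SM, 16 SMs.
    print(resident_and_waves(1000, 3, 16))
```

Under those assumptions only 48 of the 1000 generated blocks are active at any moment; the rest wait until resources free up.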

Never mind, problem solved.

Anyone?