The CUDA occupancy calculator shows the
“Active Thread Blocks per Multiprocessor” metric.
We can then vary the shared memory per block and watch how it affects this metric.
Surprisingly, "Active Thread Blocks per Multiprocessor" never exceeds 3.
Does this mean the NVIDIA G80 can never have more than 3 active thread blocks per SM?
Has anyone observed this experimentally?
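One way the calculator could arrive at a cap like this: it takes the minimum over several per-SM resource limits, and shared memory is only one of them. The minimal Python sketch below estimates that minimum using the per-SM limits of compute capability 1.0 (G80): at most 8 resident blocks, 768 resident threads, 8192 registers, and 16 KB of shared memory. The function name `active_blocks_per_sm` and its parameters are hypothetical. The sketch also ignores the warp, register, and shared-memory allocation granularities the real calculator applies, so treat it as an approximation, not the calculator's exact formula.

```python
# Per-SM limits for compute capability 1.0 (G80).
MAX_BLOCKS_PER_SM = 8
MAX_THREADS_PER_SM = 768
REGISTERS_PER_SM = 8192
SHARED_MEM_PER_SM = 16 * 1024  # bytes


def active_blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block):
    """Hypothetical helper: estimate resident blocks per SM as the tightest
    of the per-SM limits. Allocation granularity is deliberately ignored."""
    limits = [
        MAX_BLOCKS_PER_SM,                        # hardware cap on blocks
        MAX_THREADS_PER_SM // threads_per_block,  # thread-count limit
    ]
    if regs_per_thread > 0:
        # Register-file limit: each block needs regs_per_thread * threads_per_block.
        limits.append(REGISTERS_PER_SM // (regs_per_thread * threads_per_block))
    if smem_per_block > 0:
        # Shared-memory limit: only applies when the block uses shared memory.
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    return min(limits)


if __name__ == "__main__":
    for smem in (0, 1024, 4096):
        print(256, smem, active_blocks_per_sm(256, 8, smem))
    print(64, 0, active_blocks_per_sm(64, 8, 0))
```

If the calculator was set to 256 threads per block, the thread limit alone gives 768 // 256 = 3 blocks. In that case, shrinking shared memory per block cannot push the result above 3. With smaller blocks (64 threads here), the 8-block hardware cap becomes reachable instead.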