Nvidia’s GTX 1080 whitepaper [url]https://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf[/url] says there is 96 KB of shared memory per SM, but Nvidia’s CUDA occupancy calculator (released as an Excel file) [url]https://developer.download.nvidia.com/compute/cuda/CUDA_Occupancy_calculator.xls[/url] reports 65536 bytes when you select sm_61 as the compute capability. So which is it?
Is it because devices with compute capability sm_61 have varying amounts of shared memory, and the occupancy calculator is giving me a conservative number? Or is it because, even though 96 KB of shared memory is physically present, about 30 KB of it is reserved for some kind of overhead?
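For context on why the discrepancy matters, here is a rough sketch of how the per-SM figure feeds into an occupancy estimate. It assumes shared memory is the only limiter and ignores allocation granularity, registers, warp limits and the max-blocks-per-SM cap, so it is a simplification, not what the calculator actually does. The function name is just made up for illustration.

```python
def blocks_limited_by_smem(smem_per_sm: int, smem_per_block: int) -> int:
    """Resident blocks per SM if shared memory were the only limit.

    Simplification (assumption): ignores allocation granularity and all
    other occupancy limiters (registers, warps, max blocks per SM).
    """
    if smem_per_block <= 0:
        raise ValueError("smem_per_block must be positive")
    return smem_per_sm // smem_per_block


# The two per-SM figures from the question.
WHITEPAPER_BYTES = 96 * 1024   # 98304 bytes, from the GTX 1080 whitepaper
CALCULATOR_BYTES = 65536       # occupancy calculator value for sm_61

# Compare both figures for a few example per-block shared memory sizes.
for per_block in (8 * 1024, 16 * 1024, 48 * 1024):
    print(per_block,
          blocks_limited_by_smem(WHITEPAPER_BYTES, per_block),
          blocks_limited_by_smem(CALCULATOR_BYTES, per_block))
```

With a kernel using 16 KB of shared memory per block, the two figures give 6 versus 4 resident blocks per SM, so which number is right changes the occupancy estimate noticeably.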