maximum threads per block

Hi all,

Is there a function that returns the maximum number of threads per block for a specific CUDA kernel? I know there is maxThreadsPerBlock, but that seems to be a limit for the device, not for a specific kernel.

If I understand your question correctly, the new occupancy calculator API introduced with CUDA 6.5 should be helpful:
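For illustration, here is a minimal sketch of how such a query could look. The kernel `scaleKernel` and the two helper functions are hypothetical names made up for this example. `cudaOccupancyMaxPotentialBlockSize` is part of the occupancy API mentioned above. `cudaFuncGetAttributes` is a separate, older runtime call whose `maxThreadsPerBlock` field reports the limit for the compiled kernel rather than for the device. Treat this as a sketch under those assumptions, not a definitive implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only so there is something to query.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Largest block size this compiled kernel can be launched with on the
// current device, or -1 if the query fails.
int kernelMaxThreadsPerBlock()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, scaleKernel) != cudaSuccess) return -1;
    return attr.maxThreadsPerBlock;
}

// Block size suggested by the occupancy API for this kernel (no dynamic
// shared memory, no upper limit on block size), or -1 if the query fails.
int suggestedBlockSize()
{
    int minGridSize = 0;
    int blockSize = 0;
    if (cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                           scaleKernel, 0, 0) != cudaSuccess)
        return -1;
    return blockSize;
}

int main()
{
    printf("per-kernel max threads per block: %d\n", kernelMaxThreadsPerBlock());
    printf("occupancy-suggested block size:   %d\n", suggestedBlockSize());
    return 0;
}
```

The two numbers answer slightly different questions: the first is the hard upper bound for launching this kernel, while the second is the block size the occupancy heuristic suggests for keeping the SMs busy.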

This is nice. So technically, if I call this API to get the maximum number of active blocks per SM, and set the grid size equal to the number of blocks that all SMs together can host, then every block is guaranteed to be scheduled onto an SM at the start instead of waiting for another block to finish. Is that correct?
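For reference, the sizing described in this question can be sketched roughly as follows. The kernel `scaleKernel`, the helper `residentGridSize`, and the block size of 256 are assumptions chosen for this example. The sketch only computes the grid size and launches with it. The computed number assumes the kernel has the GPU to itself, so it should not be read as a hard guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel with a grid-stride loop, so that a grid sized to the
// device (rather than to n) still processes all n elements.
__global__ void scaleKernel(float *data, float factor, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        data[i] *= factor;
}

// Number of blocks of blockSize threads that the occupancy API reports as
// simultaneously resident across all SMs of the current device for this
// kernel, or -1 if any query fails.
int residentGridSize(int blockSize)
{
    int device = 0;
    cudaDeviceProp prop;
    int blocksPerSM = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return -1;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, scaleKernel, blockSize, 0) != cudaSuccess)
        return -1;
    return blocksPerSM * prop.multiProcessorCount;
}

int main()
{
    const int blockSize = 256;  // assumed block size for this example
    const int n = 1 << 20;      // assumed problem size

    int gridSize = residentGridSize(blockSize);
    if (gridSize <= 0) {
        fprintf(stderr, "occupancy query failed\n");
        return 1;
    }

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));

    // Launch exactly as many blocks as the device is reported to hold at once.
    scaleKernel<<<gridSize, blockSize>>>(d, 2.0f, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("grid = %d blocks of %d threads: %s\n",
           gridSize, blockSize, cudaGetErrorString(err));

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

If other kernels are running on the device at the same time, fewer blocks may fit than this calculation suggests. Later CUDA releases (9.0 and up) add cooperative launches via `cudaLaunchCooperativeKernel`, which refuse to launch unless the whole grid can be co-resident. That may be the safer route when the code actually depends on all blocks running at once.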