Yes, the spec says exactly what you think it says:
The maximum sizes of the x-, y-, and z-dimension of a thread block are 512, 512, and 64, “respectively”
In fact, some conditions in the compute capability spec must be considered together.
Here is another example,
from the spec for compute capability 1.0:
- The maximum number of active blocks per multiprocessor is 8
- The maximum number of active warps per multiprocessor is 24
- The maximum number of active threads per multiprocessor is 768
If you ask “what is the maximum number of active blocks in a multiprocessor for a given block size?”, then
you must combine these three conditions. However, condition 2 is redundant, because from
condition 3 and 32 threads/warp, you get 768/32 = 24 warps.
Hence we only need to consider condition 1 and condition 3.
Example 1: suppose we choose a thread block size of 16 x 16 = 256 threads. Then
under condition 3, at most 768/256 = 3 blocks fit in a multiprocessor;
under condition 1, 3 < 8, hence we have 3 active blocks in a multiprocessor.
Example 2: suppose we choose a thread block size of 4 x 4 = 16 threads. Then
under condition 3, at most 768/16 = 48 blocks fit in a multiprocessor;
under condition 1, 48 > 8, hence we only have 8 active blocks in a multiprocessor.
This means that we only have 8 * 16 = 128 active threads in a multiprocessor.
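The two examples can be checked with a few lines of Python. This is only a rough sketch of the arithmetic above: the function names are made up for illustration, the limits are the compute capability 1.0 numbers quoted in this post, and it assumes nothing else (registers, shared memory) limits the number of active blocks.

```python
# Limits quoted above for compute capability 1.0.
MAX_BLOCKS_PER_SM = 8     # condition 1
MAX_THREADS_PER_SM = 768  # condition 3 (condition 2 is treated as redundant)

def active_blocks(threads_per_block):
    """Blocks that can be active on one multiprocessor (hypothetical helper)."""
    # Condition 3: how many whole blocks fit into 768 threads.
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    # Condition 1 caps the result at 8 blocks.
    return min(MAX_BLOCKS_PER_SM, by_threads)

def active_threads(threads_per_block):
    """Active threads on one multiprocessor for that block size."""
    return active_blocks(threads_per_block) * threads_per_block

for x, y in [(16, 16), (4, 4)]:
    n = x * y
    print(f"{x} x {y} = {n} threads/block: "
          f"{active_blocks(n)} blocks, {active_threads(n)} active threads")
```

For 16 x 16 the thread limit decides (3 blocks, 768 threads); for 4 x 4 the block limit decides (8 blocks, 128 threads).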