Number of thread blocks
I queried the device properties. Are these the maximum numbers of blocks per grid in the x, y and z dimensions?
Max grid size, dim(0): 2147483647
Max grid size, dim(1): 65535
Max grid size, dim(2): 65535
Does this mean dim(0) could have up to 2147483647 blocks, while dim(1) and dim(2) each have 100, with 1024 threads per block?
Does this mean I have 2147483647×100×100×1024 threads? 🤨
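Yes, you can launch an enormous grid; that product is roughly 2.2×10^16 threads. They do not all run at once: blocks are scheduled onto the SMs as resources free up. If you launch a grid with the maximum number of blocks in the X dimension, each block having 1024 threads, even an empty kernel will likely take minutes. If you also apply the maximum Y and Z, the GPU will physically stop working before the grid finishes.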
MAX_THREADS_PER_BLOCK has been 1024 since CC 3.0.
MAX_THREADS_PER_MULTIPROCESSOR varies by architecture. It is 2048 for CC 3.0 through CC 9.0, except:
CC 7.5: 1024 (Turing TU1xx)
CC 8.6/8.7: 1536 (Ampere GA10x, not GA100)
CC 8.9: 1536 (Ada AD10x)
In an application it is best to query the device attributes. The CUDA C++ Programming Guide Compute Capability section has a table showing the values for all supported GPUs (Maximum number of resident threads per SM).
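As a rough sketch of querying those attributes at run time, the host program below uses cudaDeviceGetAttribute from the CUDA runtime API. The helper name getAttr and the choice of device 0 are assumptions made for illustration; they are not from the posts above. It needs a CUDA-capable GPU and can be built with nvcc.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: read one integer device attribute, exit on error.
static int getAttr(cudaDeviceAttr attr, int device)
{
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(&value, attr, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                     cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
    return value;
}

int main()
{
    const int device = 0;  // assumption: inspect the first GPU in the system

    const int ccMajor = getAttr(cudaDevAttrComputeCapabilityMajor, device);
    const int ccMinor = getAttr(cudaDevAttrComputeCapabilityMinor, device);
    const int gridX   = getAttr(cudaDevAttrMaxGridDimX, device);
    const int gridY   = getAttr(cudaDevAttrMaxGridDimY, device);
    const int gridZ   = getAttr(cudaDevAttrMaxGridDimZ, device);
    const int perBlk  = getAttr(cudaDevAttrMaxThreadsPerBlock, device);
    const int perSM   = getAttr(cudaDevAttrMaxThreadsPerMultiProcessor, device);
    const int numSMs  = getAttr(cudaDevAttrMultiProcessorCount, device);

    std::printf("Compute capability:    %d.%d\n", ccMajor, ccMinor);
    std::printf("Max grid size (x,y,z): %d, %d, %d\n", gridX, gridY, gridZ);
    std::printf("Max threads per block: %d\n", perBlk);
    std::printf("Max threads per SM:    %d\n", perSM);
    std::printf("Number of SMs:         %d\n", numSMs);

    // Upper bound on threads resident on the whole GPU at one time.
    // A grid larger than this is processed over time, not all at once.
    const long long resident = static_cast<long long>(perSM) * numSMs;
    std::printf("Max resident threads:  %lld\n", resident);
    return 0;
}
```

The last value is the one to compare against the grid size: the grid limits say how many blocks a launch may contain, while threads per SM times the SM count bounds how many threads can be resident at any moment.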