Maximum blocks per grid

No of thread blocks
I queried the device properties. Are these the maximum numbers of blocks per grid in x, y and z?
Max grid size, dim(0): 2147483647
Max grid size, dim(1): 65535
Max grid size, dim(2): 65535

Does this mean dim[0] could have a maximum of 2147483647 blocks, and dim[1] and dim[2] could have, say, 100 blocks each, with 1024 threads per block?

Does this mean I have 2147483647×100×100×1024 threads? 🤨

Yes, but those are logical threads. Only up to 2048 * number_of_sm threads will run simultaneously.
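To put numbers on the difference between logical and resident threads, here is a minimal host-side sketch in plain C++. The function names `logical_threads` and `max_resident_threads` are made up for illustration, and the SM count and per-SM thread limit are inputs you would normally read from the device properties rather than hard-code.

```cpp
#include <cassert>
#include <cstdint>

// Logical threads in one launch: grid.x * grid.y * grid.z * threads per block.
// 64-bit arithmetic is required; the product overflows 32 bits almost immediately.
std::uint64_t logical_threads(std::uint64_t grid_x, std::uint64_t grid_y,
                              std::uint64_t grid_z, std::uint64_t threads_per_block) {
    // 1024 is the per-block thread limit discussed in this thread.
    assert(threads_per_block >= 1 && threads_per_block <= 1024);
    return grid_x * grid_y * grid_z * threads_per_block;
}

// Upper bound on the threads that can be resident on the GPU at the same time.
// Both arguments are device properties; real code should query them.
std::uint64_t max_resident_threads(std::uint64_t sm_count,
                                   std::uint64_t max_threads_per_sm) {
    return sm_count * max_threads_per_sm;
}
```

For the grid in the question, `logical_threads(2147483647, 100, 100, 1024)` is 21990232545280000 (about 2.2e16). On a hypothetical device with 40 SMs and 2048 threads per SM, `max_resident_threads(40, 2048)` is only 81920.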

OK 👍, but still pretty cool 😁

I guess it depends on your device properties and on the registers your algorithm occupies.
3070 Ti, laptop version

Yes, you can launch an enormous grid. If you launch a grid with the maximum number of blocks in the X direction, with each block having 1024 threads, even a kernel that does nothing will likely take minutes. If you also apply the maximum Y and Z, the GPU will physically stop working before the grid finishes.
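A rough way to see why such a grid takes so long is to count how many rounds ("waves") of blocks the GPU has to work through. The sketch below is plain C++ with hypothetical helper names (`resident_blocks_per_sm`, `waves`). It assumes that only the per-SM thread limit restricts residency, which is optimistic, and it treats scheduling as discrete waves even though the hardware actually starts a new block whenever one finishes.

```cpp
#include <cassert>
#include <cstdint>

// Blocks that can be resident on one SM, considering only the thread limit.
// Assumption: register, shared-memory, and blocks-per-SM limits are ignored,
// so the real number can be lower.
std::uint64_t resident_blocks_per_sm(std::uint64_t max_threads_per_sm,
                                     std::uint64_t threads_per_block) {
    assert(threads_per_block > 0);
    return max_threads_per_sm / threads_per_block;
}

// Approximate number of waves needed to run total_blocks:
// ceil(total_blocks / blocks that fit on the whole GPU at once).
std::uint64_t waves(std::uint64_t total_blocks, std::uint64_t sm_count,
                    std::uint64_t max_threads_per_sm,
                    std::uint64_t threads_per_block) {
    const std::uint64_t concurrent =
        sm_count * resident_blocks_per_sm(max_threads_per_sm, threads_per_block);
    assert(concurrent > 0);  // the block must fit on an SM at all
    return (total_blocks + concurrent - 1) / concurrent;
}
```

With the maximum X dimension and 1024-thread blocks on a hypothetical 40-SM device, `waves(2147483647, 40, 2048, 1024)` gives 26843546. If the per-SM limit is 1536 threads, only one 1024-thread block fits per SM and `waves(2147483647, 40, 1536, 1024)` gives 53687092. The sketch deliberately makes no timing claim; how long each wave takes depends on the device and the kernel.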

MAX_THREADS_PER_BLOCK has been 1024 since CC 3.0.
MAX_THREADS_PER_MULTIPROCESSOR varies with compute capability (CC):

  • CC 3.0 to CC 9.0: 2048, except
  • CC 7.5: 1024 (Turing TU1xx)
  • CC 8.6/8.7: 1536 (Ampere GA10x, not GA100)
  • CC 8.9: 1536 (Ada AD10x)

In an application it is best to query the device attributes. The CUDA C++ Programming Guide Compute Capability section has a table showing the values for all supported GPUs (Maximum number of resident threads per SM).