What is the maximum number of blocks I can use?

So, I have been reading for a while, and I have not quite understood this yet.

My understanding as of now is this:

A single block can hold at most 1024 threads. The block can be structured in 1, 2 or 3 dimensions, but the product of the dimension sizes must not exceed 1024. So an 8x8x8 block configuration would work, because it holds 512 threads, but a 16x16x16 configuration would not, because its 4096 threads exceed the maximum of 1024. In addition, the individual dimensions must not exceed 1024, 1024 and 64, respectively. Is this correct?
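The two rules described above (per-dimension limits plus a limit on the product) can be sketched as a small host-side check. This is only an illustration: the helper name `is_valid_block` and the constants are hypothetical, and the limits are hard-coded from the numbers quoted in this post. On a real device they would be read from the `maxThreadsPerBlock` and `maxThreadsDim` fields that `cudaGetDeviceProperties` fills in.

```cpp
#include <cstdint>

// Limits as quoted in the post (assumed here, not queried from a device).
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;
constexpr std::uint64_t kMaxBlockDim[3] = {1024, 1024, 64};

// Hypothetical helper: returns true if an x*y*z block respects both the
// per-dimension limits and the limit on the total number of threads.
bool is_valid_block(std::uint64_t x, std::uint64_t y, std::uint64_t z) {
    // A dimension of size 0 is never a valid launch configuration.
    if (x == 0 || y == 0 || z == 0) return false;
    // Each dimension has its own upper bound.
    if (x > kMaxBlockDim[0] || y > kMaxBlockDim[1] || z > kMaxBlockDim[2]) {
        return false;
    }
    // The product of the three sizes is the thread count of the block.
    return x * y * z <= kMaxThreadsPerBlock;
}
```

With these assumed limits, `is_valid_block(8, 8, 8)` passes (512 threads) and `is_valid_block(16, 16, 16)` fails (4096 threads), matching the examples in the question.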

As a block can be structured multi-dimensionally, so can the grid containing the blocks. But I am not sure what the maximum number of blocks I can use is.

Here is the result of the deviceQuery example on my computer:

It says that each dimension of the grid must not exceed 2^31-1, 2^16-1 and 2^16-1, respectively. But what is the limit for the product of those dimensions’ sizes? For the threads inside a block, it is 1024. What is the limit for the blocks inside the grid? And where can I see it in the output of deviceQuery?


There is no published limit on the product (and therefore nothing about it in the deviceQuery output). As a test, you can try launching a kernel of maximal dimensions. As long as you don’t run into another resource limit, it should work. I’ve only done this with an empty kernel (to convince myself); if you tried this with a kernel that did anything “meaningful”, that kernel would possibly take “forever” to run.

Regarding “forever”: The product of those 3 numbers (2^31-1, 2^16-1, 2^16-1) is over 9 quintillion blocks (about 9.2 x 10^18, just under 2^63). If the GPU required 1 nanosecond to process each block, on average, the kernel would require almost 300 years to run. So exploring the envelope is not really practical, IMO.
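The arithmetic behind that estimate can be reproduced with a short host-side sketch. The function names are hypothetical, the grid limits are the ones quoted above, and the 1 ns per block figure is the same assumption made in the estimate, not a measured value.

```cpp
#include <cstdint>

// Per-dimension grid limits quoted above (assumed, not queried from a device).
constexpr std::uint64_t kMaxGridX = 2147483647ULL;  // 2^31 - 1
constexpr std::uint64_t kMaxGridY = 65535ULL;       // 2^16 - 1
constexpr std::uint64_t kMaxGridZ = 65535ULL;       // 2^16 - 1

// Total number of blocks in a grid of maximal dimensions.
// The product is just under 2^63, so it fits in 64 bits without overflow.
constexpr std::uint64_t max_grid_blocks() {
    return kMaxGridX * kMaxGridY * kMaxGridZ;
}

// Run time in years if every block takes ns_per_block nanoseconds
// and blocks are processed one after another at that average rate.
double years_at(double ns_per_block) {
    const double seconds =
        static_cast<double>(max_grid_blocks()) * ns_per_block * 1e-9;
    return seconds / (365.25 * 24.0 * 3600.0);
}
```

Here `max_grid_blocks()` evaluates to 9223090559730712575, and `years_at(1.0)` comes out at roughly 292 years.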


Thanks for the quick response
