What is the maximum number of blocks I can use?

So, I have been reading for a while, and I have not quite understood this yet.

My understanding as of now is this:

A single block can have at most 1024 threads. The block can be structured in 1, 2 or 3 dimensions, but the product of the dimensions' sizes must not exceed 1024. So an 8x8x8 block configuration would work, because it hosts 512 threads, but a 16x16x16 configuration would not, because its 4096 threads exceed the maximum of 1024. In addition, the individual dimensions must not exceed 1024, 1024 and 64, respectively. Is this correct?

As a block can be structured multi-dimensionally, so can the grid containing the blocks. But I am not sure what the maximum number of blocks I can use is.

Here is the result of the deviceQuery example on my computer:

It says that the grid dimensions must not exceed 2^31-1, 2^16-1 and 2^16-1, respectively. But what is the limit for the product of those dimensions' sizes? For the threads inside a block, it is 1024. What is the limit for the blocks inside the grid? And where can I see it in the output of deviceQuery?

Yes, your understanding of the block limits is correct.
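
For illustration, the block rule from the question can be written as a small check. This is only a sketch in Python rather than CUDA: the limits are hard-coded from the numbers quoted in the question (on a real system they would come from the device properties), and the function name `block_config_ok` is made up for this example.

```python
# Limits as quoted in the question; assumed here rather than queried from a device.
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)  # per-dimension limits for x, y, z


def block_config_ok(x, y=1, z=1):
    """Return True if an x*y*z block respects both kinds of limit."""
    dims = (x, y, z)
    # Each dimension must be at least 1 and within its own limit.
    within_each = all(1 <= d <= m for d, m in zip(dims, MAX_BLOCK_DIM))
    # The total thread count must also fit in one block.
    return within_each and x * y * z <= MAX_THREADS_PER_BLOCK


print(block_config_ok(8, 8, 8))     # True: 512 threads
print(block_config_ok(16, 16, 16))  # False: 4096 threads
print(block_config_ok(1, 1, 128))   # False: z exceeds 64
```

Both conditions have to hold at once, which is why 1x1x128 fails even though it is only 128 threads.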

There is no published limit on the total number of blocks (and therefore nothing is reported in deviceQuery). As a test, you can try launching a kernel with the maximum grid dimensions. As long as you don’t run into another resource limit, it should work. I’ve only done this with an empty kernel (to convince myself); if you tried this with a kernel that did anything “meaningful”, that kernel would possibly take “forever” to run.

Regarding “forever”: The product of those 3 numbers (2^31-1, 2^16-1, 2^16-1) is over 9 quintillion blocks (about 9.2 x 10^18, just under 2^63). If the GPU required 1 nanosecond to process each block, on average, the kernel would require almost 300 years to run. So exploring the envelope is not really practical, IMO.
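
The arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the 1 nanosecond per block figure is the assumption made above, not a measurement, and the variable names are just for this example.

```python
# Maximum grid dimensions (x, y, z) as discussed above.
MAX_GRID_DIM = (2**31 - 1, 2**16 - 1, 2**16 - 1)

# Total number of blocks in a grid of maximal dimensions.
total_blocks = MAX_GRID_DIM[0] * MAX_GRID_DIM[1] * MAX_GRID_DIM[2]

NS_PER_BLOCK = 1  # assumed average processing time per block, in nanoseconds
SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds = total_blocks * NS_PER_BLOCK / 1e9
years = seconds / SECONDS_PER_YEAR

print(f"{total_blocks:.3e} blocks")  # about 9.223e+18
print(f"{years:.0f} years")          # about 292
```

The block count lands just below 2^63, and at one block per nanosecond the run time comes out a little under three centuries.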


Thanks for the quick response
