So, I have been reading for a while, and I have not quite understood this yet.
My understanding as of now is this:
A single block can hold at most 1024 threads. A block can be structured in 1, 2 or 3 dimensions, but the product of the dimension sizes must not exceed 1024. So an 8x8x8 block would work, because it holds 512 threads, but a 16x16x16 block would not, because its 4096 threads exceed the maximum of 1024. In addition, the x, y and z dimensions must not exceed 1024, 1024 and 64, respectively. Is this correct?
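To make my understanding concrete, here is a small host-only C++ sketch of the rule as I read it. `Dim3`, `block_fits` and the two constants are hypothetical names of my own, not part of the CUDA API, and the limit values are simply the numbers quoted above, so they are assumptions about my device rather than anything queried from it.

```cpp
#include <cstdint>

// Hypothetical stand-in for a block shape; not the CUDA dim3 type.
struct Dim3 {
    std::uint32_t x, y, z;
};

// Assumed limits, copied from the numbers in this post (not queried from a device).
constexpr Dim3 kMaxBlockDim = {1024, 1024, 64};
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;

// Returns true if the block shape satisfies both rules:
// each dimension is within its own limit, and the product is at most 1024.
bool block_fits(Dim3 b) {
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    if (b.x > kMaxBlockDim.x || b.y > kMaxBlockDim.y || b.z > kMaxBlockDim.z) return false;
    // Multiply in 64 bits so the product cannot overflow before the comparison.
    std::uint64_t threads = static_cast<std::uint64_t>(b.x) * b.y * b.z;
    return threads <= kMaxThreadsPerBlock;
}
```

If my reading is right, `block_fits(Dim3{8, 8, 8})` is true (512 threads), `block_fits(Dim3{16, 16, 16})` is false (4096 threads), and `block_fits(Dim3{1, 1, 128})` is also false even though it has only 128 threads, because z exceeds 64.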
As a block can be structured multi-dimensionally, so can the grid containing the blocks. But I am not sure what the maximum number of blocks I can use is.
Here is the result of the deviceQuery example on my computer:
It says that the grid's x, y and z dimensions must not exceed 2^31 - 1, 2^16 - 1 and 2^16 - 1, respectively. But what is the limit for the product of those dimensions' sizes? For the threads inside a block, it is 1024. What is the limit for the blocks inside the grid? And where can I see it in the output of deviceQuery?