Optimal 2D Grid Dimension?


Here is a simple question.

What is the optimal layout of the grid?

For example, is a 10000x1 grid equivalent to a 100x100 grid in terms of performance?

What if the required number of grid blocks is an arbitrary number like 9261?

My first idea is

dimGrid( (int)sqrt(9261)+1, (int)sqrt(9261)+1 )

and check the block index inside the kernel.

Do you think this is a good approach?
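For reference, the idea above could be sketched roughly like this. It is only an illustration: `gridSide`, `markBlocks`, and `runDemo` are made-up names, the block size of 64 is arbitrary, and error checking is left out. Unlike the `(int)sqrt(n)+1` formula, the helper does not add an extra row and column when the block count is already a perfect square.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: smallest side s such that s * s >= numBlocks.
inline unsigned gridSide(unsigned numBlocks) {
    unsigned s = static_cast<unsigned>(std::sqrt(static_cast<double>(numBlocks)));
    while (s * s < numBlocks) ++s;  // bump up if the square root was truncated
    return s;
}

// Blocks that map to a real work item set their flag; padding blocks exit early.
__global__ void markBlocks(int *flags, unsigned numBlocks) {
    unsigned id = blockIdx.y * gridDim.x + blockIdx.x;  // linear block index
    if (id >= numBlocks) return;                        // surplus block, nothing to do
    if (threadIdx.x == 0) flags[id] = 1;
}

// Launches a square grid and returns how many blocks actually did work.
// Error checking omitted for brevity.
inline unsigned runDemo(unsigned numBlocks) {
    unsigned side = gridSide(numBlocks);
    int *dFlags = nullptr;
    cudaMalloc(&dFlags, numBlocks * sizeof(int));
    cudaMemset(dFlags, 0, numBlocks * sizeof(int));

    dim3 dimGrid(side, side);
    markBlocks<<<dimGrid, 64>>>(dFlags, numBlocks);

    std::vector<int> hFlags(numBlocks);
    cudaMemcpy(hFlags.data(), dFlags, numBlocks * sizeof(int),
               cudaMemcpyDeviceToHost);  // blocking copy, also waits for the kernel
    cudaFree(dFlags);

    unsigned count = 0;
    for (int f : hFlags) count += f;
    return count;
}
```

For 9261 blocks this gives a 97x97 grid (9409 blocks), so 148 blocks hit the early return.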


Grid dimensions are not important for performance; only the resulting number of blocks is. The grid is just an abstraction to make your life easier, so that you need fewer divmod operations in your kernel to determine where in your data set you should operate.
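To make the divmod point concrete, here is a small sketch comparing the two layouts. The kernel names and the `sameCoords` helper are hypothetical, and error checking is omitted. Both kernels produce the same tile coordinate per block; the 1D version just has to compute it with a division and a modulo.

```cuda
#include <vector>
#include <cuda_runtime.h>

// 1D grid: recover the 2D tile coordinate with one modulo and one division.
__global__ void tileCoords1D(int2 *out, unsigned tilesX) {
    unsigned id = blockIdx.x;
    if (threadIdx.x == 0)
        out[id] = make_int2(static_cast<int>(id % tilesX),
                            static_cast<int>(id / tilesX));
}

// 2D grid: blockIdx already holds the same coordinate.
__global__ void tileCoords2D(int2 *out) {
    unsigned id = blockIdx.y * gridDim.x + blockIdx.x;
    if (threadIdx.x == 0)
        out[id] = make_int2(static_cast<int>(blockIdx.x),
                            static_cast<int>(blockIdx.y));
}

// Runs both kernels and checks that they wrote identical coordinates.
inline bool sameCoords(unsigned tilesX, unsigned tilesY) {
    unsigned n = tilesX * tilesY;
    int2 *dA = nullptr, *dB = nullptr;
    cudaMalloc(&dA, n * sizeof(int2));
    cudaMalloc(&dB, n * sizeof(int2));

    tileCoords1D<<<n, 32>>>(dA, tilesX);              // e.g. 10000x1
    tileCoords2D<<<dim3(tilesX, tilesY), 32>>>(dB);   // e.g. 100x100

    std::vector<int2> a(n), b(n);
    cudaMemcpy(a.data(), dA, n * sizeof(int2), cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), dB, n * sizeof(int2), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);

    for (unsigned i = 0; i < n; ++i)
        if (a[i].x != b[i].x || a[i].y != b[i].y) return false;
    return true;
}
```

Calling `sameCoords(100, 100)` compares a 10000x1 launch against a 100x100 one.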

I would like to say that is true, but in my experience there are cases where using a block that is tall and narrow is faster than using one that is short and wide. I don't know exactly what causes this, but I'm sure it is related to the banking of device memory.

As always, experiment, because YMMV.
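One way to run that experiment is to time the same kernel with different block shapes using CUDA events. The sketch below assumes a simple row-major copy kernel; `copy2D` and `timeCopy` are hypothetical names, and error checking is omitted. It makes no claim about which shape wins, since that depends on the hardware and the access pattern.

```cuda
#include <cuda_runtime.h>

// Copies a row-major width x height matrix; threadIdx.x walks along a row.
__global__ void copy2D(const float *in, float *out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = in[y * width + x];
}

// Times a single launch with the given block shape; returns milliseconds.
inline float timeCopy(const float *dIn, float *dOut,
                      int width, int height, dim3 block) {
    // Round up so the grid covers the whole matrix.
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    copy2D<<<grid, block>>>(dIn, dOut, width, height);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

For example, compare `timeCopy(dIn, dOut, w, h, dim3(256, 1))` against `timeCopy(dIn, dOut, w, h, dim3(1, 256))`. The first launch in a process usually carries one-time startup cost, so it is worth running each shape a few times and discarding the first measurement.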


Indeed, that can certainly hold for block sizes, especially if the block size changes which memory locations are accessed. But the OP was asking about grid sizes, in which case it all depends on what the program does with blockIdx.