grid dimensionality kernels


I just read the PTX manual, which seems to indicate that grids can truly be three-dimensional.
However, the CUDA manual indicates otherwise. Has anybody worked with 3D grids
under CUDA 1.1? Thanks.


The functions to invoke a kernel only allow 2D grids. The register in PTX has three components, so you can read the third value (which is always 1 at the moment), but there is currently no way to launch such a grid. I think they might allow this in the future.

The CUDA programming guide indicates that you can execute grids with a maximum dimension of 65535x65535x1. How does this indicate a 3D grid?

Check Appendix A.

In the ptx_isa there is mention of the fact that you can get the x, y, and z coordinates of your block within the grid, and also the x, y, and z dimensions of the grid. I think that is where the confusion comes from.

Yes, that is correct. The ptx_isa does state that you can get x,y,z coordinates of the block within the grid. It should really state that the grid is 2D.

Well, not really, because what is written there is true: PTX does allow it, there is just no host function (yet?) to launch a 3D grid.

Host functions allow you to pass a value for the .z component of the grid dimension. Executions just fail with “too many resources requested for launch”. cudaGetDeviceProperties returns 65535x65535x1 in maxGridSize, which to me implies that this is a hardware limitation.

I thought that the concept of 3D blocks and 2D grids was software-based? Isn't that true? If so, how can the 2D limit on grids be hardware-related?


IMHO, the 2D or 3D grid is related to the thread scheduler (or rather the block scheduler) in the GPU hardware. My guess is that the scheduler on G80/G92 (or even GT200) does not support 3D grids.

I don't understand what the problem is. You can work even with a 1D grid by just converting your 3D (or any-dimensional) space into 1D.

So, for example, instead of a 3x3x3 grid you can run 27 blocks in a 1D grid, where

z = blockIdx.x / 9
y = (blockIdx.x % 9) / 3
x = blockIdx.x % 3

or something like that
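The mapping above can be checked on the host. Here is a sketch in plain C that mirrors the arithmetic a kernel would do on blockIdx.x; the function name unflatten and the parameters dx, dy are my own, not part of the CUDA API:

```c
/* Recover 3D block coordinates from a flat 1D block index.
   For a dx x dy x dz grid launched as dx*dy*dz blocks in a 1D grid:
   x varies fastest, z slowest.                                      */
static void unflatten(int id, int dx, int dy, int *x, int *y, int *z)
{
    *x = id % dx;          /* fastest-varying coordinate */
    *y = (id / dx) % dy;   /* middle coordinate          */
    *z = id / (dx * dy);   /* slowest-varying coordinate */
}
```

With dx = dy = 3 this reduces to exactly the 3x3x3 formulas above: z = id / 9, y = (id % 9) / 3, x = id % 3.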


Of course, there is no problem. 1D grids work. But if you wish to use that kind
of reasoning, why bother with 2D grids or 3D thread blocks? Users could have done that themselves. It is a matter of convenience. Also, I have found myself writing functions to handle the grid's 3rd dimension more easily, but that wastes registers when I call such a function in a loop. If CUDA handled this, I believe fewer registers would be used. I try to avoid loops when possible.


I agree; in terms of convenience, I have also thought about that. It would be easy for NVIDIA to implement multi-dimensional grids, not just 2D or 3D.

I think that with some patience one could write macros that wrap 2D grids, making multi-dimensional grid calls transparent, and that also compute the multi-dimensional indices inside the kernels.

It would be cool if, at some point, 2D or 3D grids gained some hardware benefit, such as local connections between neighbouring compute nodes, as in 2D and 3D processor grids with local interconnects, but as I understand it this is not even on the distant horizon.