How do I launch a 3-dimensional grid of thread blocks?
According to the CUDA documentation, thread blocks can be 1D, 2D or 3D, and the grid of thread blocks can also be 1D, 2D or 3D.
I just do not understand how to launch a 3D grid, as the cuLaunchGrid function only takes 2 size parameters (width and height).
As I understand it, when I call cuFuncSetBlockShape(kernel, x, y, z), these are the x, y and z values that I can read via the PTX registers %ntid.x, %ntid.y and %ntid.z.
So what is the equivalent for cuLaunchGrid and the %nctaid register (which also has 3 parts)?
Fermi hardware does support 3D grids (which is why they are in the PTX spec) but they’re not exposed in the current CUDA API. This should be in the next release.
I for one will be glad - I hate writing all that indexing code!
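For anyone stuck on the 2D-only launch in the meantime, here is a minimal sketch of the kind of indexing code meant here, assuming the z dimension is folded into the grid height. The names BlockIdx3, folded_grid_height and unfold_block_index are made up for illustration and are not part of the CUDA API. It is written as plain C so the arithmetic can be checked on the host; in a kernel the decode function would be a __device__ function fed with the block index values.

```c
/* Logical 3D block coordinates recovered from a 2D launch (hypothetical type). */
typedef struct {
    unsigned x, y, z;
} BlockIdx3;

/* Host side: the height to pass as the second grid dimension when a
 * logical grid of (grid_x, grid_y, grid_z) blocks is launched as a
 * 2D grid of (grid_x, grid_y * grid_z) blocks. */
static unsigned folded_grid_height(unsigned grid_y, unsigned grid_z)
{
    return grid_y * grid_z;
}

/* Kernel side: split the physical y block index back into logical y and z.
 * ctaid_x and ctaid_y stand for the physical block index (%ctaid.x, %ctaid.y);
 * grid_y is the logical grid height and must be passed to the kernel. */
static BlockIdx3 unfold_block_index(unsigned ctaid_x, unsigned ctaid_y,
                                    unsigned grid_y)
{
    BlockIdx3 b;
    b.x = ctaid_x;
    b.y = ctaid_y % grid_y;  /* position within one z slice */
    b.z = ctaid_y / grid_y;  /* which z slice this block belongs to */
    return b;
}
```

So a logical 8 x 4 x 3 grid would be launched with width 8 and height folded_grid_height(4, 3), and each block calls unfold_block_index with grid_y = 4. The folded height still has to fit within the hardware limit for a single grid dimension.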