How do I launch a 3-dimensional grid of thread blocks?
According to the CUDA documentation, thread blocks can be 1D, 2D or 3D, and the grid of thread blocks can also be 1D, 2D or 3D.
I just do not understand how to launch a 3D grid, as the cuLaunchGrid function only takes 2 size parameters (width and height).
As I understand it, when I call cuFuncSetBlockShape(kernel, x, y, z), these are the x, y and z values that I can read via the PTX registers %ntid.x, %ntid.y and %ntid.z.
So what is the equivalent for cuLaunchGrid and the %nctaid register (which also has 3 parts)?
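To make the gap concrete, here is a small host-only Python model of what I think happens. It makes no CUDA calls. The names Dim3, model_ntid and model_nctaid are hypothetical names of my own, and the value 1 for the third grid component is my assumption, not something I have confirmed on a device:

```python
from typing import NamedTuple


class Dim3(NamedTuple):
    """Three-component size, mirroring the .x/.y/.z parts of a PTX register."""
    x: int
    y: int
    z: int


def model_ntid(block_x: int, block_y: int, block_z: int) -> Dim3:
    # Models cuFuncSetBlockShape(kernel, x, y, z): all three block sizes
    # are supplied, so all three parts of %ntid are accounted for.
    return Dim3(block_x, block_y, block_z)


def model_nctaid(grid_width: int, grid_height: int) -> Dim3:
    # Models cuLaunchGrid(kernel, width, height): only two sizes are
    # supplied. ASSUMPTION: the third part of %nctaid is then 1.
    return Dim3(grid_width, grid_height, 1)


ntid = model_ntid(8, 8, 4)
nctaid = model_nctaid(16, 16)
print("%ntid   =", ntid)
print("%nctaid =", nctaid)
```

In this model there is simply no argument through which a grid depth could be passed, which is exactly the part I cannot find in the API.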
Am I completely misunderstanding something?