PTX ISA gives a different number of dimensions for the grid size

The CUDA framework lets us define the grid as a two-dimensional arrangement of blocks, where each block contains a 3D group of threads.

But the PTX documentation says this:

Multiple CTAs may execute concurrently and in parallel, or sequentially, depending on the platform. Each CTA has a unique CTA identifier (ctaid) within a grid of CTAs. Each grid of CTAs has a 1D, 2D, or 3D shape specified by the parameter nctaid. Each grid also has a unique temporal grid identifier (gridid). Threads may read and use these values through predefined, read-only special registers %tid, %ntid, %ctaid, %nctaid, and %gridid.

It says that the grid of CTAs has a 1D, 2D, or 3D shape. I checked, and I can indeed access the third dimension, nctaid.z, in addition to nctaid.x and nctaid.y; the value held in nctaid.z is 1. Is there any way to specify the third dimension programmatically, using the CUDA driver API or similar? As far as I know, the CUDA API lets us specify only two dimensions for the grid size. Or is the third grid dimension currently unused and reserved for future extensibility?

In the past it was said that current-generation hardware does not allow a 3D grid, but that future hardware might. I have not heard that this changed with Fermi, so I guess it is still 2D.
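If you would rather check than guess, one option is to ask the runtime what the device reports and to look at the grid shape from inside a kernel. The following is only a rough sketch, assuming device 0 and the CUDA runtime API; the kernel name report_grid and the 2 x 3 launch shape are made up for illustration. The maxGridSize field of cudaDeviceProp holds the per-dimension grid limits, and gridDim.z is the value that PTX exposes as %nctaid.z.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread records the grid shape it observes.
__global__ void report_grid(unsigned int *out)
{
    if (blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0 &&
        threadIdx.x == 0) {
        out[0] = gridDim.x;  // %nctaid.x in PTX
        out[1] = gridDim.y;  // %nctaid.y in PTX
        out[2] = gridDim.z;  // %nctaid.z in PTX
    }
}

int main()
{
    // Query the grid limits the device reports (device 0 is an assumption).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("max grid size: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    unsigned int *d_out = NULL;
    err = cudaMalloc((void **)&d_out, 3 * sizeof(unsigned int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Launch a 2D grid; the third dimension of dim3 defaults to 1.
    dim3 grid(2, 3);
    dim3 block(32);
    report_grid<<<grid, block>>>(d_out);
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    // cudaMemcpy waits for the kernel to finish before copying.
    unsigned int h_out[3] = {0, 0, 0};
    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }
    printf("grid seen by kernel: %u x %u x %u\n", h_out[0], h_out[1], h_out[2]);

    cudaFree(d_out);
    return 0;
}
```

If the third entry of maxGridSize comes back as 1, the device does not accept a grid with more than one block in z, which would match the nctaid.z value of 1 you observed.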

Right. Thanks, Riedjik.