The CUDA framework lets us define the grid as a two-dimensional arrangement of blocks, where each block contains a 3D group of threads.
But the PTX documentation says this:
Multiple CTAs may execute concurrently and in parallel, or sequentially, depending on the
platform. Each CTA has a unique CTA identifier (ctaid) within a grid of CTAs. Each grid
of CTAs has a 1D, 2D, or 3D shape specified by the parameter nctaid. Each grid also has a
unique temporal grid identifier (gridid). Threads may read and use these values through
predefined, read-only special registers %tid, %ntid, %ctaid, %nctaid, and %gridid.
So the grid of CTAs can have a 1D, 2D, or 3D shape. I checked, and I can indeed read the third dimension, nctaid.z, in addition to nctaid.x and nctaid.y; its value is 1. Is there any way to specify the third grid dimension programmatically, for example through the CUDA driver API, given that as far as I know the CUDA API only lets us specify two dimensions for the grid size? Or is the third grid dimension currently unused and reserved for future extensibility?
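For reference, here is a minimal sketch of the kind of check described above, written against the runtime API instead of raw PTX. It relies on the built-in gridDim variable, which the compiler lowers to reads of %nctaid. The names report_grid_shape and read_grid_shape are my own, chosen only for illustration, and the launch sizes are arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Records the grid shape as seen from device code.
// gridDim.{x,y,z} corresponds to the PTX special register %nctaid.{x,y,z}.
__global__ void report_grid_shape(unsigned int *out)
{
    // One thread in the whole grid is enough to record the values.
    if (blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0 &&
        threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        out[0] = gridDim.x;
        out[1] = gridDim.y;
        out[2] = gridDim.z;
    }
}

// Hypothetical helper: launches the kernel with the given configuration
// and copies the reported grid shape back into shape[0..2].
cudaError_t read_grid_shape(dim3 grid, dim3 block, unsigned int shape[3])
{
    unsigned int *d_out = nullptr;
    cudaError_t err = cudaMalloc((void **)&d_out, 3 * sizeof(unsigned int));
    if (err != cudaSuccess) return err;

    report_grid_shape<<<grid, block>>>(d_out);
    err = cudaGetLastError();  // catches invalid launch configurations
    if (err == cudaSuccess) {
        // cudaMemcpy blocks until the kernel has finished.
        err = cudaMemcpy(shape, d_out, 3 * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err;
}

int main()
{
    dim3 grid(4, 2);      // only x and y given; dim3 defaults z to 1
    dim3 block(8, 8, 4);  // 3D block of 256 threads

    unsigned int shape[3] = {0, 0, 0};
    cudaError_t err = read_grid_shape(grid, block, shape);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("nctaid = (%u, %u, %u)\n", shape[0], shape[1], shape[2]);
    return 0;
}
```

With a grid given only in x and y, the z component read back is 1, which matches what I see in nctaid.z.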