I think there may be a documentation typo in the CUDA Programming Guide v13.3.
Page:
Section: 2.3.2. Thread Hierarchy
In the Python list of CUDA built-in variables, the descriptions of cuda.threadIdx.[xyz] and cuda.gridDim.[xyz] appear to be swapped.
The current text says:
cuda.threadIdx.[xyz]: Size of the grid in the x, y and z dimension respectively.
cuda.gridDim.[xyz]: Index of the thread in the x, y and z dimension respectively.
I believe this should be:
cuda.threadIdx.[xyz]: Index of the thread within its thread block in the x, y, and z dimensions respectively.
cuda.gridDim.[xyz]: Size of the grid in the x, y, and z dimensions respectively.
This interpretation is consistent with the preceding paragraph in the same section, which says that the grid size can be queried with gridDim, the block size with blockDim, the block index with blockIdx, and the thread index with threadIdx.
It is also consistent with the C++ list immediately above the Python list:
gridDim.[x|y|z]: Size of the grid
threadIdx.[x|y|z]: Index of the thread
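To make the distinction concrete, here is a small plain-Python sketch of a 1D launch. It does not call CUDA or any GPU library; the function name and parameters are hypothetical and only emulate the roles I understand the built-ins to have. The grid size (gridDim.x) is fixed for the whole launch, while the thread index (threadIdx.x) varies per thread within its block.

```python
def global_thread_ids_1d(grid_dim, block_dim):
    """Emulate a 1D launch and return each thread's global index.

    grid_dim  plays the role of gridDim.x  (number of blocks, a size)
    block_dim plays the role of blockDim.x (threads per block, a size)
    """
    ids = []
    for block_idx in range(grid_dim):        # blockIdx.x is in [0, gridDim.x)
        for thread_idx in range(block_dim):  # threadIdx.x is in [0, blockDim.x)
            # Usual 1D global index: block offset plus index within the block.
            ids.append(block_idx * block_dim + thread_idx)
    return ids


ids = global_thread_ids_1d(grid_dim=2, block_dim=4)
print(ids)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

If threadIdx really were a size and gridDim an index, as the Python list currently reads, the loop bounds above would not make sense.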
Could someone from NVIDIA confirm whether this is a typo in the Python list?