Yes, that is OK.

In general you can have:

- a one-dimensional grid of blocks where each block has a one-dimensional array of threads:

UniqueBlockIndex = blockIdx.x;

UniqueThreadIndex = blockIdx.x * blockDim.x + threadIdx.x;

- a one-dimensional grid of blocks where each block has a two-dimensional array of threads:

UniqueBlockIndex = blockIdx.x;

UniqueThreadIndex = blockIdx.x * blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x;

- a one-dimensional grid of blocks where each block has a three-dimensional array of threads:

UniqueBlockIndex = blockIdx.x;

UniqueThreadIndex = blockIdx.x * blockDim.x * blockDim.y * blockDim.z + threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;

- a two-dimensional grid of blocks where each block has a one-dimensional array of threads:

UniqueBlockIndex = blockIdx.y * gridDim.x + blockIdx.x;

UniqueThreadIndex = UniqueBlockIndex * blockDim.x + threadIdx.x;

- a two-dimensional grid of blocks where each block has a two-dimensional array of threads:

UniqueBlockIndex = blockIdx.y * gridDim.x + blockIdx.x;

UniqueThreadIndex = UniqueBlockIndex * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;

- a two-dimensional grid of blocks where each block has a three-dimensional array of threads:

UniqueBlockIndex = blockIdx.y * gridDim.x + blockIdx.x;

UniqueThreadIndex = UniqueBlockIndex * blockDim.z * blockDim.y * blockDim.x + threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;

A three-dimensional grid of blocks isn't supported in CUDA (yet), so you have only these 6 combinations.

UniqueThreadIndex means unique per grid.
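
All six cases can be seen as one formula, because an unused dimension has size 1 and index 0, so its terms drop out. Here is a minimal host-side sketch in plain C++ (no GPU needed) that checks this. Idx3 and the function names are made up for illustration; they only stand in for CUDA's built-in blockIdx, threadIdx, gridDim and blockDim.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for CUDA's built-in index/dimension types.
struct Idx3 { unsigned x, y, z; };

// Flattened block index for a grid of up to two dimensions.
unsigned uniqueBlockIndex(Idx3 bIdx, Idx3 gDim) {
    return bIdx.y * gDim.x + bIdx.x;
}

// Grid-wide thread index: the 2D-grid / 3D-block formula from the list above.
// With unused dimensions set to size 1 (index 0) it reduces to the other cases.
unsigned uniqueThreadIndex(Idx3 bIdx, Idx3 tIdx, Idx3 gDim, Idx3 bDim) {
    unsigned threadsPerBlock = bDim.z * bDim.y * bDim.x;
    return uniqueBlockIndex(bIdx, gDim) * threadsPerBlock
         + tIdx.z * bDim.y * bDim.x
         + tIdx.y * bDim.x
         + tIdx.x;
}

// Enumerates every (block, thread) pair and returns true if every index
// in [0, total) is produced exactly once.
bool indicesAreUnique(Idx3 gDim, Idx3 bDim) {
    assert(gDim.z == 1);  // 3D grids are outside the six cases above
    unsigned total = gDim.x * gDim.y * bDim.x * bDim.y * bDim.z;
    std::vector<unsigned> hits(total, 0);
    for (unsigned by = 0; by < gDim.y; ++by)
    for (unsigned bx = 0; bx < gDim.x; ++bx)
    for (unsigned tz = 0; tz < bDim.z; ++tz)
    for (unsigned ty = 0; ty < bDim.y; ++ty)
    for (unsigned tx = 0; tx < bDim.x; ++tx) {
        unsigned i = uniqueThreadIndex({bx, by, 0}, {tx, ty, tz}, gDim, bDim);
        if (i >= total) return false;
        ++hits[i];
    }
    for (unsigned h : hits)
        if (h != 1) return false;
    return true;
}
```

For example, indicesAreUnique({3, 2, 1}, {4, 3, 2}) walks a 3x2 grid of 4x3x2 blocks and confirms that the 144 threads get the indexes 0..143 with no gaps or repeats. In a real kernel you would use the built-in variables directly instead of passing them in.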

Of course, you can also calculate a LocalThreadIndex (unique per block), e.g. for a three-dimensional array of threads:

LocalThreadIndex = threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;
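
As a quick worked example of the local index, here is a small host-side sketch in plain C++. Idx3, localThreadIndex and threadCoords are hypothetical names used only for illustration; the inverse mapping is not part of the formulas above, it is just included to show that the flattening loses no information.

```cpp
// Hypothetical host-side stand-in for CUDA's built-in threadIdx / blockDim.
struct Idx3 { unsigned x, y, z; };

// Per-block thread index: x varies fastest, then y, then z.
unsigned localThreadIndex(Idx3 tIdx, Idx3 bDim) {
    return tIdx.z * bDim.y * bDim.x + tIdx.y * bDim.x + tIdx.x;
}

// Inverse mapping: recover the (x, y, z) coordinates from the flattened index.
Idx3 threadCoords(unsigned local, Idx3 bDim) {
    Idx3 t;
    t.x = local % bDim.x;
    t.y = (local / bDim.x) % bDim.y;
    t.z = local / (bDim.x * bDim.y);
    return t;
}
```

With a 4x3x2 block, thread (1, 2, 1) gets 1 * 3 * 4 + 2 * 4 + 1 = 21, and threadCoords(21, {4, 3, 2}) gives back (1, 2, 1).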