why blockIdx.z is always 0?

Complied with sm_61, my kernel function was called by: en_kernel<<<numBlocks, 1>>>(x, y, …)
where numBlocks.x=91;
but inside the kernel function, I found that blockIdx.z is always 0. Why?

If you used printf to print the blockIdx.z, then maybe the buffer alotted to printf output ran full and all print statements where blockIdx.z indicated >0 values might have been truncated.

here is deviceQuery output for a Pascal device (1080Ti) and it indicates the maximum grid dimensions on x y and z axes.

Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)

So yes, this feature is expected to work on any modern hardware and CUDA revisions.

When I print blockIdx.x inside of my code, it works great and is not always zero.

Sounds like your code is broken.

You are right. The reason was that I had a very large number of blocks and printf was slow to execute so the screen print buffer was jammed and truncated.