Does blockDim component order matter?

Levs · January 11, 2017, 6:42am

Hi,
I’m running a realistic kernel and a trivial kernel on a GeForce GTX 960 with CUDA 7.5 on Linux. For the realistic kernel, block dimensions of dim3(1, 1, 512) fail with cudaErrorInvalidConfiguration, but dim3(512, 1, 1) works, as does anything up to dim3(1, 1, 64). The trivial kernel runs with all of them, but when I save blockDim.z from inside the kernel at runtime, the value is capped at 64.

Clearly, this is not an issue with my occupancy, since the total number of threads remains the same.

Why are the blockDim components treated asymmetrically?

Thanks!
Leo

Robert_Crovella · January 11, 2017, 2:49pm

The maximum block z-dimension is limited by the hardware to 64. This is documented:

Programming Guide :: CUDA Toolkit Documentation
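To make the rule concrete, here is a minimal host-side sketch of the check a launch configuration has to pass: each block dimension has its own limit, and the product of the three has a separate limit. The names `BlockDims`, `BlockLimits`, `kAssumedLimits` and `isValidBlock` are hypothetical and not part of the CUDA API. The values 1024, 1024, 64 and 1024 are the limits I would assume for a compute capability 5.2 device such as the GTX 960. On a real system they should be read from the `maxThreadsDim` and `maxThreadsPerBlock` fields of `cudaDeviceProp` rather than hard-coded.

```cpp
// Hypothetical stand-in for CUDA's dim3, so the sketch compiles without the toolkit.
struct BlockDims { unsigned x, y, z; };

// Per-block limits. On a real device these would come from cudaDeviceProp
// (maxThreadsDim[0..2] and maxThreadsPerBlock).
struct BlockLimits {
    unsigned maxDim[3];          // per-axis limits for x, y, z
    unsigned maxThreadsPerBlock; // limit on the product x * y * z
};

// Assumed values for compute capability 5.2 (e.g. GTX 960); not queried from hardware.
constexpr BlockLimits kAssumedLimits = {{1024, 1024, 64}, 1024};

// Returns true only if every axis is within its own limit AND the total
// thread count is within the per-block limit. Both conditions must hold.
bool isValidBlock(const BlockDims& b, const BlockLimits& lim) {
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    if (b.x > lim.maxDim[0] || b.y > lim.maxDim[1] || b.z > lim.maxDim[2]) return false;
    unsigned long long total = 1ULL * b.x * b.y * b.z;
    return total <= lim.maxThreadsPerBlock;
}
```

Under these assumed limits, `isValidBlock({512, 1, 1}, kAssumedLimits)` and `isValidBlock({1, 1, 64}, kAssumedLimits)` pass, while `isValidBlock({1, 1, 512}, kAssumedLimits)` fails on the z-axis limit even though the total thread count is the same 512.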

Levs · January 11, 2017, 7:18pm

Silly me! Thanks!