I have a NVIDIA GTX 570 compute capability 2.0 running cuda-4.0.
The deviceQuery executable in the CUDA SDK gives me information on my CUDA device and its various properties. Two of the lines in the output are
[b]Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
[/b]
Why is the 3rd dimension of the block restricted to be upto 64 threads only wheras the X and the Y dimension can vary upto 1024 threads?