Maximum grid size = 2147483647?

I’m running the sample ‘deviceQuery’ on a Jetson TK1 (Tegra K1, compute capability 3.2) and I’m getting the following output:

Maximum number of threads per multiprocessor:  2048
Maximum number of threads per block:           1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes

which is clearly wrong! I compiled using the default makefile. In particular, the commands to compile and link are:

/usr/local/cuda-6.0/bin/nvcc -ccbin g++ -I../../common/inc  -m32  \
 -Xcompiler -mfloat-abi=hard  -gencode arch=compute_32,code=sm_32 \
 -o deviceQuery.o -c deviceQuery.cpp

/usr/local/cuda-6.0/bin/nvcc -ccbin g++   -m32  -Xcompiler \
 -mfloat-abi=hard  -Xlinker --dynamic-linker=/lib/ld-linux-armhf.so.3 \
 -gencode arch=compute_32,code=sm_32 -o deviceQuery deviceQuery.o

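Looks correct to me. For compute capability 3.0 and higher, the maximum x-dimension of a grid is 2^31 - 1 = 2147483647, while the y- and z-dimensions stay at 65535. Compare appendix G of the CUDA Programming Guide: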

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
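As a quick sanity check on that number, here is a minimal sketch in plain C++ (no CUDA runtime needed). The helper name max_grid_dim_x is made up for illustration and is not part of the CUDA toolkit. It only encodes the x-dimension limits as I understand them from the compute capabilities table: 65535 below compute capability 3.0, and 2^31 - 1 from 3.0 onward.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a CUDA API): maximum x-dimension of a grid,
// keyed by the major compute capability version. The values are an
// assumption taken from the Programming Guide's compute capabilities table.
constexpr std::int64_t max_grid_dim_x(int cc_major) {
    return cc_major >= 3 ? 2147483647LL : 65535LL;
}

// 2147483647 is exactly 2^31 - 1, the largest signed 32-bit integer,
// so the value printed by deviceQuery is a real limit, not garbage.
static_assert(max_grid_dim_x(3) == (1LL << 31) - 1,
              "x limit for cc >= 3.0 should be 2^31 - 1");
static_assert(max_grid_dim_x(3) == std::numeric_limits<std::int32_t>::max(),
              "x limit for cc >= 3.0 should equal INT32_MAX");
static_assert(max_grid_dim_x(2) == 65535,
              "x limit for cc 2.x should be 65535");
```

For the Tegra K1 (compute capability 3.2), max_grid_dim_x(3) gives 2147483647, which matches the first entry of the "Max dimension size of a grid size" line in the deviceQuery output above.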