I’m running the sample ‘deviceQuery’ on a Jetson TK1 (Tegra K1, compute capability 3.2) and I’m getting the following output:
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
which is clearly wrong! I compiled using the default makefile. In particular, the commands to compile and link are:
/usr/local/cuda-6.0/bin/nvcc -ccbin g++ -I../../common/inc -m32 \
-Xcompiler -mfloat-abi=hard -gencode arch=compute_32,code=sm_32 \
-o deviceQuery.o -c deviceQuery.cpp
/usr/local/cuda-6.0/bin/nvcc -ccbin g++ -m32 -Xcompiler \
-mfloat-abi=hard -Xlinker --dynamic-linker=/lib/ld-linux-armhf.so.3 \
-gencode arch=compute_32,code=sm_32 -o deviceQuery deviceQuery.o