Hi,
nvcc without a register limit:
nvcc -I/opt/cuda/include/ --ptxas-options=-v -c tf_72bit.cu -o tf_72bit.o
ptxas info : Compiling entry function '_Z8mfakt_71j5int72Pji6int144S0_' for 'sm_10'
ptxas info : Used 16 registers, 64+16 bytes smem, 44 bytes cmem[1]
nvcc -I/opt/cuda/include/ --ptxas-options=-v -c tf_96bit.cu -o tf_96bit.o
ptxas info : Compiling entry function '_Z8mfakt_95j5int96Pji6int192S0_' for 'sm_10'
ptxas info : Used 17 registers, 64+16 bytes smem, 28 bytes cmem[1]
nvcc -I/opt/cuda/include/ --ptxas-options=-v -c tf_96bit.cu -o tf_96_75bit.o -DSHORTCUT_75BIT
ptxas info : Compiling entry function '_Z11mfakt_95_75j5int96Pji6int192S0_' for 'sm_10'
ptxas info : Used 16 registers, 64+16 bytes smem, 28 bytes cmem[1]
nvcc with a register limit (--maxrregcount=16):
nvcc -I/opt/cuda/include/ --ptxas-options=-v --maxrregcount=16 -c tf_72bit.cu -o tf_72bit.o
ptxas info : Compiling entry function '_Z8mfakt_71j5int72Pji6int144S0_' for 'sm_10'
ptxas info : Used 16 registers, 64+16 bytes smem, 44 bytes cmem[1]
nvcc -I/opt/cuda/include/ --ptxas-options=-v --maxrregcount=16 -c tf_96bit.cu -o tf_96bit.o
ptxas info : Compiling entry function '_Z8mfakt_95j5int96Pji6int192S0_' for 'sm_10'
ptxas info : Used 15 registers, 8+0 bytes lmem, 64+16 bytes smem, 28 bytes cmem[1]
nvcc -I/opt/cuda/include/ --ptxas-options=-v --maxrregcount=16 -c tf_96bit.cu -o tf_96_75bit.o -DSHORTCUT_75BIT
ptxas info : Compiling entry function '_Z11mfakt_95_75j5int96Pji6int192S0_' for 'sm_10'
ptxas info : Used 15 registers, 4+0 bytes lmem, 64+16 bytes smem, 28 bytes cmem[1]
In the second case (limit of 16), why does register usage drop to 15 per thread for the 2nd and 3rd kernels, with 8 and 4 bytes of lmem appearing, instead of stopping at 16? Is this usual behavior?
Second question:
AFAIK I can compile a kernel for multiple architectures at once (e.g. sm_11 and sm_20). Is it possible to set a different register limit for each of these code paths? For my code, a limit of 16 registers is beneficial on GPUs with compute capability 1.1, whereas on “Fermi” (compute capability 2.0) I get better performance with a higher limit (e.g. 24) or with no limit on register usage at all.
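One direction that might fit here, sketched rather than tested against the kernels above: instead of the file-wide --maxrregcount, the per-kernel `__launch_bounds__` qualifier can be combined with the `__CUDA_ARCH__` macro, so that each architecture's compilation pass sees its own bounds. The kernel `demo_kernel`, the block size of 256, and the minimum-blocks values below are hypothetical placeholders, not taken from tf_72bit.cu or tf_96bit.cu. The compiler derives a register budget from the bounds, so the actual counts would still need to be checked with --ptxas-options=-v.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical block size; not taken from the original code.
#define THREADS_PER_BLOCK 256

// __CUDA_ARCH__ is defined only during device compilation, once per target
// architecture, so each device pass picks its own value. The host pass takes
// the #else branch, which is harmless because the value is only used in the
// kernel qualifier below.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
#define MIN_BLOCKS_PER_SM 4   // illustrative value for compute capability 2.x
#else
#define MIN_BLOCKS_PER_SM 2   // illustrative value for compute capability 1.x
#endif

// The bounds tell the compiler how many threads per block and how many
// resident blocks per multiprocessor to plan for; it limits register use
// per thread accordingly.
__global__ void
__launch_bounds__(THREADS_PER_BLOCK, MIN_BLOCKS_PER_SM)
demo_kernel(unsigned int *out, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i * 3u;   // placeholder work
}

int main()
{
    const unsigned int n = 1024;
    unsigned int *d_out = NULL;
    unsigned int h_out[4];

    if (cudaMalloc((void **)&d_out, n * sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // The launch must not exceed THREADS_PER_BLOCK threads per block.
    demo_kernel<<<n / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_out, n);

    // cudaMemcpy waits for the kernel and reports any earlier launch error.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("out[1] = %u\n", h_out[1]);
    return 0;
}
```

Building with something like `nvcc -gencode arch=compute_11,code=sm_11 -gencode arch=compute_20,code=sm_20 --ptxas-options=-v` should print a separate "Used N registers" line for each architecture, which would show whether the bounds had the intended effect.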
Oliver