nvcc -g -G -Xptxas=-v --maxrregcount=64 -arch=sm_20 common_libs.o timing.o dataIO.o gpu_testing.cu -o gpu_testing
ptxas info : Compiling entry function ‘_Z14UT_k_rmsd_calcjjjP6float3Pf’ for ‘sm_20’
ptxas warning : Too big maxrregcount value specified 64, will be ignored
ptxas info : Function properties for _Z14UT_k_rmsd_calcjjjP6float3Pf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _ZSt4fabsf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for Z9calc_rmsdjP6float3S0
40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads
ptxas info : Function properties for _ZSt4sqrtf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 8048+0 bytes smem, 64 bytes cmem[0]
ptxas info : Compiling entry function ‘_Z22UT_k_center_conformersjjP6float3’ for ‘sm_20’
ptxas warning : Too big maxrregcount value specified 64, will be ignored
ptxas info : Function properties for _Z22UT_k_center_conformersjjP6float3
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for ZmIR6float3RKS
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z17center_conformersjjP6float3
40 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads
ptxas info : Function properties for _ZdVR6float3RKf
16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
ptxas info : Function properties for ZpLR6float3RKS
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 20 registers, 48 bytes cmem[0]
ptxas info : Compiling entry function ‘Z20k_update_point_rmsdsjjjPjP6float3PfS_S2_S_S2_S_jS2’ for ‘sm_20’
ptxas warning : Too big maxrregcount value specified 64, will be ignored
ptxas info : Function properties for Z20k_update_point_rmsdsjjjPjP6float3PfS_S2_S_S2_S_jS2
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z21parallel_ExcPrefixSumjPj
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _ZSt4fabsf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for Z12parallel_MaxILj512EfjEvjPT0_PT1
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for Z9calc_rmsdjP6float3S0
40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads
ptxas info : Function properties for _ZSt4sqrtf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z28parallel_binary_scatter_sortjPj
16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads
ptxas info : Used 44 registers, 20336+0 bytes smem, 128 bytes cmem[0]
ptxas info : Compiling entry function ‘Z19k_calc_c_to_c_distsjjPjP6float3PfS_S2_S_jS2’ for ‘sm_20’
ptxas warning : Too big maxrregcount value specified 64, will be ignored
ptxas info : Function properties for Z19k_calc_c_to_c_distsjjPjP6float3PfS_S2_S_jS2
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _ZSt4fabsf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for Z9calc_rmsdjP6float3S0
40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads
ptxas info : Function properties for _ZSt4sqrtf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 6000+0 bytes smem, 104 bytes cmem[0]
ptxas info : Compiling entry function ‘Z14k_parallel_maxjPfPjS_S0’ for ‘sm_20’
ptxas warning : Too big maxrregcount value specified 64, will be ignored
ptxas info : Function properties for Z14k_parallel_maxjPfPjS_S0
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for Z12parallel_MaxILj512EfjEvjPT0_PT1
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 4096+0 bytes smem, 72 bytes cmem[0]
ptxas info : Compiling entry function ‘_Z19k_center_conformersjjP6float3’ for ‘sm_20’
ptxas warning : Too big maxrregcount value specified 64, will be ignored
ptxas info : Function properties for _Z19k_center_conformersjjP6float3
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for ZmIR6float3RKS
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z17center_conformersjjP6float3
40 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads
ptxas info : Function properties for _ZdVR6float3RKf
16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
ptxas info : Function properties for ZpLR6float3RKS
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 20 registers, 48 bytes cmem[0]
[b]
Basically, I’ve told the compiler via --maxrregcount that up to 64 registers are available per thread (I’m using 512 threads per block, so 32768/512 = 64).
Yet the ptxas info above shows that no kernel uses more than 44 registers, and the rest spills into local memory even though I haven’t come close to saturating the 64 registers available. WTF?[/b]
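
For reference, the register arithmetic in the question can be written out as a small host-side sketch. The 32768 registers and 512 threads per block come from the post; the per-thread ceiling of 63 is my assumption for compute capability 2.x devices, which would be consistent with ptxas rejecting 64 as "too big". The names `kRegistersPerSM`, `kMaxRegistersPerThread` and `register_budget` are made up for illustration and are not part of any CUDA API.

```cpp
// Assumed limits for compute capability 2.x (sm_20); verify them against
// the CUDA Programming Guide for your own device.
constexpr int kRegistersPerSM = 32768;      // 32-bit registers per multiprocessor (figure from the post)
constexpr int kMaxRegistersPerThread = 63;  // assumed per-thread ceiling on sm_20

// Registers each thread can get when one block of `threads_per_block`
// threads has to fit on a single multiprocessor, capped at the assumed
// per-thread ceiling.
constexpr int register_budget(int threads_per_block) {
    return (kRegistersPerSM / threads_per_block < kMaxRegistersPerThread)
               ? kRegistersPerSM / threads_per_block
               : kMaxRegistersPerThread;
}

// The division from the post gives 64, but the cap brings it down to 63.
static_assert(kRegistersPerSM / 512 == 64, "raw per-thread share for 512 threads");
static_assert(register_budget(512) == 63, "capped share for 512 threads");
```

Under these assumptions, passing `--maxrregcount=63` rather than 64 would be the value to try. This only covers the budget itself; it does not explain why ptxas stops at 44 registers and still reports spills.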