Hi, I compiled my code with both CUDA 6.5 and CUDA 7.0 and got different ptxas output (below). With CUDA 6.5 there is some register spilling, while with CUDA 7.0 there is none. I cannot compare the performance because CUDA 7.0 cannot be installed on the K80 machine (for now).
Just curious, is CUDA 7.0 required for the K80?
CUDA 6.5
ptxas info : Compiling entry function '_Z28updateXByBlock2pRegDsmemTileiPfPKiS1_f' for 'sm_37'
ptxas info : Function properties for _Z28updateXByBlock2pRegDsmemTileiPfPKiS1_f
48 bytes stack frame, 80 bytes spill stores, 80 bytes spill loads
ptxas info : Used 128 registers, 12000 bytes smem, 360 bytes cmem[0], 8 bytes cmem[2], 1 textures
CUDA 7.0
ptxas info : Compiling entry function '_Z28updateXByBlock2pRegDsmemTileiPfPKiS1_f' for 'sm_37'
ptxas info : Function properties for _Z28updateXByBlock2pRegDsmemTileiPfPKiS1_f
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 128 registers, 12000 bytes smem, 360 bytes cmem[0], 8 bytes cmem[2], 1 textures
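In case it helps anyone compare the two logs side by side, here is a minimal Python sketch that pulls the stack frame, spill, and register numbers out of the ptxas text above. It assumes the logs were produced with something like `nvcc -arch=sm_37 -Xptxas -v` and that the lines keep the wording shown here. The names `summarize_ptxas`, `CUDA_65_LOG`, and `CUDA_70_LOG` are made up for illustration.

```python
import re

# Matches the detail line under "Function properties", e.g.
# "48 bytes stack frame, 80 bytes spill stores, 80 bytes spill loads"
SPILL_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)
# Matches the register count in the "Used N registers, ..." line.
REGS_RE = re.compile(r"Used (\d+) registers")


def summarize_ptxas(log):
    """Return stack, spill, and register counts from one kernel's ptxas log.

    Hypothetical helper: it only looks at the first match of each pattern,
    so it expects the log of a single kernel.
    """
    spill = SPILL_RE.search(log)
    regs = REGS_RE.search(log)
    return {
        "stack_bytes": int(spill.group(1)) if spill else None,
        "spill_store_bytes": int(spill.group(2)) if spill else None,
        "spill_load_bytes": int(spill.group(3)) if spill else None,
        "registers": int(regs.group(1)) if regs else None,
    }


# The two logs, copied from the output quoted in this post.
CUDA_65_LOG = """\
ptxas info : Function properties for _Z28updateXByBlock2pRegDsmemTileiPfPKiS1_f
48 bytes stack frame, 80 bytes spill stores, 80 bytes spill loads
ptxas info : Used 128 registers, 12000 bytes smem, 360 bytes cmem[0], 8 bytes cmem[2], 1 textures
"""

CUDA_70_LOG = """\
ptxas info : Function properties for _Z28updateXByBlock2pRegDsmemTileiPfPKiS1_f
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 128 registers, 12000 bytes smem, 360 bytes cmem[0], 8 bytes cmem[2], 1 textures
"""

if __name__ == "__main__":
    old = summarize_ptxas(CUDA_65_LOG)
    new = summarize_ptxas(CUDA_70_LOG)
    # Print each metric for both toolkits so differences are easy to spot.
    for key in old:
        print(f"{key}: CUDA 6.5 = {old[key]}, CUDA 7.0 = {new[key]}")
```

Both builds use the same 128 registers, so the only difference this picks up is the stack frame and the spill traffic.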
Thanks,
Wei