Hi,
I ran into a strange problem when using the same kernel function in two different programs. On the same machine and with the same makefile, the compilation results differ. For one program:
ptxas info : Compiling entry function '_Z11forwardProjPfiiff' for 'sm_20'
ptxas info : Function properties for _Z11forwardProjPfiiff
48 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9make_int4iiii
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z3maxff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _ZSt4sqrtf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14forwardProjRayffiPiPffffffffiiff
112 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads
ptxas info : Function properties for _ZSt3absf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z7ind3to1iiiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z3minff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 56 bytes cmem[0], 36 bytes cmem[2]
And for the other one:
ptxas info : Compiling entry function '_Z11forwardProjPfiiff' for 'sm_20'
ptxas info : Function properties for _Z11forwardProjPfiiff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 23 registers, 56 bytes cmem[0], 1500 bytes cmem[2], 8 bytes cmem[16]
Here forwardProjRay is a device function called by the kernel (it doesn't even appear in the second output). As expected, the second kernel achieves higher occupancy and runs much faster than the first, yet both produce the same result. Does anyone have a hint about how this can happen? Thanks