Why does the same kernel function get different compilation results on the same machine?

Hi,

I ran into a weird problem when using the same kernel function in two different programs. On the same machine and with the same makefile, the compilation results are different. For one program:

ptxas info    : Compiling entry function '_Z11forwardProjPfiiff' for 'sm_20'
ptxas info    : Function properties for _Z11forwardProjPfiiff
    48 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _Z9make_int4iiii
    16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _Z3maxff
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZSt4sqrtf
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _Z14forwardProjRayffiPiPffffffffiiff
    112 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads
ptxas info    : Function properties for _ZSt3absf
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _Z7ind3to1iiiiii
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _Z3minff
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 34 registers, 56 bytes cmem[0], 36 bytes cmem[2]

and for the other one:

ptxas info    : Compiling entry function '_Z11forwardProjPfiiff' for 'sm_20'
ptxas info    : Function properties for _Z11forwardProjPfiiff
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 23 registers, 56 bytes cmem[0], 1500 bytes cmem[2], 8 bytes cmem[16]

Here forwardProjRay is a device function called from the kernel (it does not even appear in the second output). As expected, the kernel in the second case has higher occupancy and runs much faster than the first one, but both give the same result. Is there any hint as to how this can happen? Thanks.

It appears quite a few functions got inlined in one case but not in the other. Check that nvcc is not only called from the same makefile, but also with the same arguments. If it is, it is odd (but not totally unreasonable) that the inlining heuristics would give different results depending on the context of the kernel. Try to force inlining in both cases using the __forceinline__ keyword.
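To illustrate the suggestion, here is a minimal sketch of a device helper marked __forceinline__. The names clampedScale, scaleKernel and runScale are made up for this example and merely stand in for something like forwardProjRay and its kernel; the original source was not posted, so this is not the actual code.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper standing in for a device function such as forwardProjRay.
// __forceinline__ asks the compiler to inline it at each call site instead of
// leaving the decision to its heuristics.
__device__ __forceinline__ float clampedScale(float x, float scale, float limit)
{
    float y = x * scale;
    return y > limit ? limit : y;
}

// Hypothetical kernel: scales each element and clamps it to 'limit'.
__global__ void scaleKernel(float *data, int n, float scale, float limit)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = clampedScale(data[i], scale, limit);
    }
}

// Host wrapper (also hypothetical): copies the data to the device, runs the
// kernel, copies the result back, and returns the first CUDA error seen.
cudaError_t runScale(float *host, int n, float scale, float limit)
{
    float *dev = NULL;
    size_t bytes = n * sizeof(float);

    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(dev, n, scale, limit);
        err = cudaGetLastError();  // catches launch errors
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err;
}
```

If the helper is inlined, it should no longer show up as a separate "Function properties" entry when compiling with -Xptxas -v. The opposite qualifier, __noinline__, can be used to check how the kernel behaves when the call is kept.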

Thanks for your reply. That's a useful trick, and it eliminates the call stack effectively.

And I figured out the problem in the end. It was still due to a difference between the makefiles. The first one had the -G option for cuda-gdb, so the compiler must insert some debugging instructions and make the function much bigger. After removing it, the two results are exactly the same. Thanks anyway.
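For anyone who wants to catch this kind of mismatch from inside the program rather than from the build log, a rough sketch follows. It uses cudaFuncGetAttributes to read the register count and per-thread local memory of a kernel at run time. demoKernel, reportKernelResources and the file name demo.cu are placeholders invented for this example. As far as the nvcc documentation goes, -G generates device debug information and turns off optimizations, which fits the larger stack frames and spills in the first listing.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; any __global__ function can be queried the same way.
__global__ void demoKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2.0f * (float)i;
    }
}

// Queries the compiled kernel and reports its registers per thread and
// local memory (stack frame plus spills) per thread.
// Build the same file twice and compare, for example (demo.cu is a made-up name):
//   nvcc -Xptxas -v demo.cu
//   nvcc -G -Xptxas -v demo.cu
cudaError_t reportKernelResources(int *numRegs, size_t *localBytes)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, demoKernel);
    if (err != cudaSuccess) return err;

    *numRegs = attr.numRegs;
    *localBytes = attr.localSizeBytes;
    printf("registers per thread: %d, local memory per thread: %zu bytes\n",
           *numRegs, *localBytes);
    return cudaSuccess;
}
```

Calling reportKernelResources from both programs and comparing the numbers would show right away whether the two builds really produced the same kernel.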