Increased program execution time


I am seeing some weird behavior that I cannot understand. It's related to the performance, not the correctness, of the program. I added a new global function to the program, and the execution time increased from 3.8 seconds to about 5 seconds. So I experimented by removing the calls to this function and replacing them with the original code that executed on the CPU. With the function still defined in the program but never used anywhere (no calls to it), the execution time was still around 5 seconds. It's not until I remove the definition from the kernel that the performance improves.

I can understand that memory operations between the CPU and GPU could increase the execution time, but I don't understand how the performance suffers from just having the definition in the program without any use of the function. I even moved the function into its own file to see whether that would make any difference.

Could someone explain why the performance suffers from just defining a global function in the kernel without it being used?

Thanks in advance…

Just a wild theory, but perhaps the compiler isn’t removing the unused code, and that’s increasing register use, and that’s preventing the GPU from running more blocks simultaneously.
The register count reported in the compiler output might give a clue.

But you’re right, of course, you’d expect that an unused function would never affect runtime performance.
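
One way to test the register theory is to compare the per-thread register count of the kernel that is actually running, once with and once without the unused function compiled in. Compiling with nvcc -Xptxas -v prints the register usage of each kernel at build time; the same number can also be read at run time. The sketch below is only an illustration: existingKernel is a hypothetical stand-in for the real kernel, and the block size of 256 is an assumed value, not one taken from the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel whose run time changed.
__global__ void existingKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2.0f;
    }
}

// Returns the number of registers each thread of the kernel uses,
// or -1 if the query fails (for example, when no CUDA device is present).
template <typename Kernel>
int registersPerThread(Kernel kernel)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, kernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return attr.numRegs;
}

// Returns how many blocks of the given size can be resident on one
// multiprocessor at the same time, or -1 if the query fails.
template <typename Kernel>
int residentBlocksPerSM(Kernel kernel, int blockSize)
{
    int numBlocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks, kernel, blockSize, 0 /* no dynamic shared memory */);
    if (err != cudaSuccess) {
        fprintf(stderr, "occupancy query: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return numBlocks;
}

int main()
{
    const int blockSize = 256;  // assumed launch configuration

    int regs = registersPerThread(existingKernel);
    int blocks = residentBlocksPerSM(existingKernel, blockSize);
    if (regs < 0 || blocks < 0) {
        return 1;
    }

    printf("registers per thread: %d\n", regs);
    printf("resident blocks per SM at block size %d: %d\n", blockSize, blocks);
    return 0;
}
```

If both numbers are identical in the two builds, register pressure is probably not the cause, and the extra time is more likely spent somewhere other than in the kernel itself.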