Hello,
I am seeing some strange behavior that I cannot understand. It's related to the performance, not the correctness, of the program. I added a new global function to the program, and the execution time increased from 3.8 seconds to about 5 seconds. So I experimented by removing the calls to this function and replacing them with the original code that was executing on the CPU. With the function still defined in the program but never used anywhere (no calls to it), the execution time was still around 5 seconds. It's not until I remove the definition from the kernel that the performance improves.
I can understand that memory operations between the CPU and GPU could increase the execution time, but I don't understand how performance suffers from just having the definition, without any use of the function in the program. I even moved the function into its own file to see if there would be any difference.
Could someone explain why the performance suffered from just defining a global function in the kernel without it being used?
Thanks in advance…