I’m currently working on some image processing kernels that will run on a Jetson TK1 board. I use the Eclipse edition of NVIDIA Nsight on Ubuntu 14.04 for coding and cross-compilation, and a Jetson TK1 development board for executing and profiling the code.
After finishing the first version of these kernels (each with its own stand-alone prototype application), I took some performance measurements using the debug build configuration. Then, not expecting large differences, I switched to the release build configuration and compared the results.
Now I’m really confused by the results of this comparison. One kernel sped up by about 25%, but two others slowed down by about 50% when using the release configuration!
I searched for the cause of this performance loss, and so far the only thing I could identify is the “Generate device debug information (-G)” flag of the nvcc compiler. If I enable this flag in the release configuration, the differences (good and bad) are gone.
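For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness that could be built once with -G and once without. It is not my actual code: `invertKernel`, the image size, the launch configuration, and the run count are all placeholders I made up for illustration, and I'm assuming CUDA events are an acceptable way to time the kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (not one of the real kernels):
// inverts an 8-bit grayscale image.
__global__ void invertKernel(const unsigned char* in, unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 255 - in[i];
    }
}

int main()
{
    const int n = 1920 * 1080;            // assumed image size
    unsigned char* dIn = NULL;
    unsigned char* dOut = NULL;
    cudaMalloc(&dIn, n);
    cudaMalloc(&dOut, n);
    cudaMemset(dIn, 0, n);

    const int threads = 256;              // assumed launch configuration
    const int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time startup costs are not measured.
    invertKernel<<<blocks, threads>>>(dIn, dOut, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time many launches and average to reduce measurement noise.
    const int runs = 100;
    cudaEventRecord(start, 0);
    for (int r = 0; r < runs; ++r) {
        invertKernel<<<blocks, threads>>>(dIn, dOut, n);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("average kernel time: %f ms\n", ms / runs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Building the same file twice, with only the -G flag changed between the two builds, should isolate the effect of the flag from any other differences between the debug and release configurations.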
I really cannot imagine how this -G debug flag can influence the execution time of kernels in such a strange way.
Hopefully someone can give me a hint about where to look for the cause of this problem or, even better, a way to avoid this behavior.