I have run into a very strange problem, and unfortunately a rather worrying one if our CUDA code is ever to make it into production.
I have two almost identical machines. The only difference is that one has a single GPU (a GeForce Titan) and the other has two GPUs, one of them a Titan Black.
On the single-GPU machine I can launch our CUDA-based app and it is accelerated very well compared to our fallback host code. On the other machine, unfortunately, things do not go as well:
When I launch the application by running the .exe directly (from Explorer, the command prompt or the debugger), it runs about 8 times slower than expected. However, if I launch the application from a profiler (Nsight in Visual Studio, or nvvp), I get good performance, i.e. it runs approximately 8 times faster than when I am not profiling. This applies both to our own kernels and to library calls such as cuFFT.
This behaviour is not related to two GPUs being present. I am using the Titan, and the same thing happens if I take out the second card.
What could the profilers be doing that I am not? If anybody has any idea what could cause this, please let me know. Any help would be appreciated.
I am using CUDA 6.0 with the runtime API on 64-bit Windows 8.1. Today I reinstalled everything Nvidia-related on the machine, without resolving the problem.
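To compare the two launch modes without relying on the profiler's own numbers, a host-side wall-clock timer along the following lines could be used. This is only a minimal sketch, not our actual code: `mean_ms` and `dummy_work` are hypothetical names, and the stand-in workload would have to be replaced by a kernel launch or cuFFT call followed by `cudaDeviceSynchronize()`, so that the asynchronous GPU work is included in the measurement.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: calls `work` once as a warm-up, then `iterations`
// more times, and returns the mean wall-clock time per timed call in ms.
double mean_ms(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    work();  // warm-up call, excluded from timing (e.g. one-time initialisation)

    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Stand-in CPU workload (an assumption for illustration only). In the real
// application this would be the GPU call plus a device synchronisation.
void dummy_work() {
    volatile double acc = 0.0;
    for (int i = 0; i < 100000; ++i) {
        acc = acc + i * 0.5;
    }
}
```

Printing `mean_ms(dummy_work, 100)` from `main`, once when the .exe is started directly and once when it is started from the profiler, would give two numbers that can be compared side by side.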