CUDA/Nsight unstable and inconsistent performance.


I have been using CUDA for a year now, but I have a couple of odd problems using Nsight and CUDA in Visual Studio:

When I make an error in my code and the kernel launch crashes, the whole CUDA/GPU/Nsight stack sometimes becomes very slow, even after I have fixed the bug. A call to cudaGraphicsGLRegisterImage can then take a minute or two, and the Nsight debugger takes 4 minutes to reach the first kernel, whereas before the odd behaviour it only took a couple of seconds.
I have tried restarting the PC and calling various functions such as cudaDeviceReset() and cudaProfilerStop(), but none of them seem to help.
The problem goes away at random, as far as I can tell.

I’m running CUDA 7.0 (because Nsight is even more unstable with CUDA 7.5 on my GeForce 840M) with Visual Studio 2013.

Any help with this bug would be greatly appreciated!

EDIT: Using the Nsight debugger, it now takes 8 minutes to get past the first cudaGraphicsGLRegisterImage call.
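
To put numbers on the slowdown (for example with and without the debugger attached), a small host-side timer can help. This is only a rough sketch in plain C++; timeCallMs is a hypothetical helper name, not part of CUDA or Nsight, and it measures wall-clock time around whatever call is passed in.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical helper (not a CUDA or Nsight API): runs `call` once,
// prints the elapsed wall-clock time to stderr, and returns it in milliseconds.
double timeCallMs(const std::string& label, const std::function<void()>& call) {
    const auto start = std::chrono::steady_clock::now();
    call();  // the operation being measured
    const auto stop = std::chrono::steady_clock::now();
    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    std::cerr << label << ": " << ms << " ms\n";
    return ms;
}
```

In the application, the lambda passed to timeCallMs would contain the cudaGraphicsGLRegisterImage call. Kernel launches are asynchronous, so timing a kernel this way would also need a cudaDeviceSynchronize() inside the lambda.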

Is your project set to compile for the actual compute architecture of your 840M?

The 840M has Compute Capability 5.0. I have tried compiling with compute_20, compute_30, and compute_50. They all give the same slow result.
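
For reference, the nvcc option that targets a given compute capability follows a fixed pattern, which can be sketched as a small string helper. gencodeFlag is a hypothetical name used only for illustration; the -gencode arch=...,code=... syntax itself is the standard nvcc form.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the nvcc -gencode option for a compute
// capability given as (major, minor), e.g. (5, 0) for the GeForce 840M.
std::string gencodeFlag(int major, int minor) {
    // Compute capability is written as major.minor with a single-digit minor.
    assert(major >= 1 && minor >= 0 && minor <= 9);
    const std::string cc = std::to_string(major) + std::to_string(minor);
    // arch selects the virtual (PTX) architecture, code the real (SASS) one.
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

For the 840M, gencodeFlag(5, 0) gives "-gencode arch=compute_50,code=sm_50". In Visual Studio the equivalent value, compute_50,sm_50, goes in the project's CUDA C/C++ > Device > Code Generation field. When no matching sm_ code is embedded, the driver has to JIT-compile the PTX at load time, which can add startup delay, though that may not be the cause here since all three settings were equally slow.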

I am not sure, but the GeForce 840M is not included in the list of GPUs that support Nsight VSE according to