Nsight 3.2.1 very slow during Shader-Debugging

I find Nsight 3.2.1 in Visual Studio 2012 nearly unusable for debugging longer shaders, as it slows my computer down so badly that it is almost impossible to work. Everything is fine until I set a breakpoint in a shader. From that point on, my quad-core machine runs at 100%: around 75% from my app and around 25% from devenv.exe. It slows down so badly that the mouse cursor freezes regularly and switching to other applications takes seconds - this is REALLY serious…

OK, after I've set the breakpoint, it hits after several seconds (!), and then I can step through the program. When I press F11 for the next step, I have to wait at least 10 seconds for the cursor to move to the next instruction; this is unworkable. The same behaviour occurs when scrolling through the shader and when inspecting variable values by hovering the mouse cursor over them.

I've seen the YouTube video that introduced OpenGL debugging in Nsight 3.2 with Visual Studio 2012, and I was really excited about the possibility of debugging shaders, because I had long been looking for such a capability. But the debugging performance on my machine is far from what is shown in that video. I've got a fairly powerful machine - Core i5, 4 x 3.4 GHz, 16 GB RAM, GeForce GTX 770 with 4 GB RAM, and the newest drivers - so the hardware should not be the problem.

I think Nsight could be the most useful tool for developing a serious OpenGL application, but with this shader-debugging performance it is nearly unusable, and I have no idea why.

Has anyone experienced the same problem? Any help is really appreciated!


Sorry you are running into such a problem. Can you tell me the OpenGL version you are using and the specific driver version?
Is there a way for you to share your application so we can see what’s going on?


I installed version 3.1, but the issues are the same. Now it even takes around 20 seconds after I switch with Ctrl-Z and Space. I tried changing a few options, such as disabling the secure connection and some shader-debug options, but it didn't change anything.

My shaders declare version 400 core and my code requests OpenGL 4.0 accordingly - the OpenGL 4.0 feature I'm using is subroutines.
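In case it helps, my subroutine usage follows the standard GLSL 4.00 pattern, roughly like this (a minimal sketch with made-up names, not my actual shader code):

```glsl
#version 400 core

// Declare a subroutine type and two implementations of it
// (illustrative names only).
subroutine vec3 LightingFunc(vec3 normal);

subroutine(LightingFunc)
vec3 pointLight(vec3 normal) { return normal * 0.5; }

subroutine(LightingFunc)
vec3 directionalLight(vec3 normal) { return normal; }

// The active implementation is selected from the application side
// via glGetSubroutineIndex / glUniformSubroutinesuiv.
subroutine uniform LightingFunc lighting;

in vec3 vNormal;
out vec4 fragColor;

void main() {
    fragColor = vec4(lighting(normalize(vNormal)), 1.0);
}
```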

My driver is the GeForce R331 Game Ready Driver: Version 331.82 from Nov. 19th 2013.

Well, it is on GoogleCode, but I doubt that is a convenient way to show you what's going on. It is a deferred lighting renderer, so nothing tricky is going on for now. Currently I'm just trying to get the lighting and shadowing working again.

It seems that the fewer GL events there are during one frame, the faster Nsight works.
Currently I have around 530 events, because I'm using the release build (only one GL error check per frame) with reduced lighting and shadowing, and performance is nearly usable. But if I use my debug build and set my scene to full lighting and shadowing, it generates over 2000 events, and then it's really unusable.

That makes sense, but can the effect really be this drastic? I keep thinking about real-world industrial use cases - what are they doing? I'd guess a game in development reaches well above 2000 events per frame, or am I wrong?


What OS, OS bitness and application bitness?
When doing local GPU shader debugging, a lot of work needs to be done to keep Windows responsive while the GPU is being debugged.
What is the path to the GoogleCode so we can try this in-house?

One option is to change the debugging preference (Nsight -> Options… -> Graphics -> Shader Debugging -> Shader debugging preference) to “Limited Debugging Experience”. This should reduce the overhead to some degree. The “Minimal Debugging Experience” option is currently implemented only for DX shaders and will be extended to OpenGL in the future.

If you are seeing slowdowns from frame API debugging alone (no shader debugging), then that's more concerning. Do you see the same slowdown (and how bad is it with Nsight 3.2.1 and 331.82?) if you right-click the EXE and select to run through the Nsight HUD? Can you repro with the GoogleCode project you mentioned? If so, can you provide the path to it?


The OS is Windows 7 64-bit. The application build is 32-bit.
Yes, I can understand that a lot of work is needed to keep Windows responsive - heavy utilization of the driver, probably? It's a bit of a mystery to me how shader debugging can be achieved on such highly parallel hardware. May I ask how you do it? Are there any papers around describing it?

OK, I have now installed version 3.2.1 again, and now it crashes the whole machine as soon as I set a breakpoint in my shader.
I've changed a few things in my engine since my first post; maybe it has something to do with that. The most prominent change is the switch to VAOs - I cannot revert this. The other is that I now request a version 4.0 core profile using GLFW - I did revert that (ignoring the hints), but it doesn't change anything. To be honest, I feel a bit lost right now without the shader-debugging functionality, because that's what I care about MOST; it is so extremely powerful that I REALLY hope to get it working on my machine…
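For completeness, the core-profile request I reverted was the usual GLFW 3 hint sequence, roughly like this (a sketch only; window size and title are made up, and my engine's actual code differs):

```c
/* Request an OpenGL 4.0 core profile before creating the window (GLFW 3). */
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 0);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
GLFWwindow* window = glfwCreateWindow(1280, 720, "demo", NULL, NULL);
```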
Frame API debugging seems to work fine and performs well as long as I stay below 500 events - in that case I can switch quickly through the events and get a nice visualisation of each step. Above 500, at around 1000, it gets slower, but that's OK - I accept that the more events there are, the slower everything has to be, given the extreme overhead involved (magic). The HUD version and running from Visual Studio show no performance difference (at least none that is obvious).

I can give you the GoogleCode repos, but the catch is that the engine is designed and implemented as a set of DLLs, with an API provided through header files. Those DLLs and headers are then used by an application to build a game (the DLLs are loaded at runtime).
So there are in fact two GoogleCode repos: the engine and the demo app. Setting up the engine and the demo app should not be a big problem, but to build both you need a Boost build for Visual Studio 2012 installed somewhere…
Here is the engine: https://code.google.com/p/zazengine/
Here is the demo-app: https://code.google.com/p/zazensquares/
If you really plan to run it on your machines, I will create a separate scene description for you, as I didn't upload all the assets: I'm using assets from Doom 3, which are the property of id Software.

The crash was caused by setting “Preferred remote shader debugging mode” to “Replay Based Debugging” instead of “Prefer Full HW Debugging”.