Program performs far better when run under Nsight

Hello, I have a compute-shader-based raytracer that I made in OpenGL. On my computer it averages around 30 fps, which is far lower than I’d expect for my GPU. However, if I run my program through the Nsight frame profiler, I consistently see over 200 fps, along with a DECREASE in GPU utilization. Does anyone know why this is happening, or how I can achieve this frame rate without Nsight? My guess is that Nsight enables some sort of optimization, but I’m not sure.

Hi Daniel,

Hard to tell without actually seeing your app, but one question is whether you have vsync enabled. That would cap you at 30-60 fps even if your app could run faster.
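To see why vsync pins the rate to values like 60 or 30 fps, here is a small model. It assumes a fixed 60 Hz display and classic double-buffered vsync (no adaptive sync and no driver-side triple buffering); the function name `vsync_fps` is purely illustrative. A frame that takes even slightly longer than one refresh interval (about 16.7 ms at 60 Hz) has to wait for the following vertical blank.

```python
import math


def vsync_fps(render_ms, refresh_hz=60.0):
    """Effective frame rate under classic double-buffered vsync.

    A finished frame is held until the next vertical blank, so every frame
    occupies a whole number of refresh intervals. Assumes a fixed refresh
    rate, no adaptive sync, and no driver-side triple buffering.
    """
    period_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_ms / period_ms))
    return refresh_hz / intervals


if __name__ == "__main__":
    for render_ms in (5.0, 16.0, 20.0, 40.0):
        print(f"{render_ms:5.1f} ms of rendering -> {vsync_fps(render_ms):.1f} fps")
```

Under this model a 5 ms frame is still shown at 60 fps, while a 20 ms frame drops straight to 30 fps, which is why vsync is worth ruling out first.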

When you say “a DECREASE in GPU utilization”, how are you measuring that?

Jeff@NVIDIA

Hello Jeff,

I have ensured that vsync is not enabled, so that is not the cause of the issue.

I measured GPU utilization simply using the GeForce Experience performance overlay. I’m not sure how accurate this value is, but I thought it might be helpful to include it. When running without Nsight, GPU utilization is a constant 95-100%, whereas with Nsight it is around 85%. Additionally, the GPU clock is significantly lower when running with Nsight, yet despite all of this the program runs much faster.

Daniel

Hi Daniel,

I was hoping we would have an easy solution to this one with vsync :)

I presume you are measuring the frame rate and GPU utilization while the application is running in the tool, not after you have already captured a frame, correct? I could understand GPU utilization going down because of the additional CPU overhead the tool introduces, which slows GPU submissions somewhat and therefore reduces the GPU workload. However, the frame rate skyrocketing while the GPU clocks go down is really puzzling.

Any chance we can get a repro of your app doing this?

Jeff@NVIDIA

Hello Jeff,

My apologies; I should have mentioned that the performance increase from Nsight occurs only after I capture a frame. Nsight reports that the first captured frame took only around 3-5 milliseconds, and the FPS increases only after I resume the application.
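Converting between frame rate and frame time puts the numbers in this thread on one scale (plain arithmetic on the figures quoted above, nothing assumed):

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{\text{FPS}}, \qquad
\frac{1000}{30} \approx 33.3\ \text{ms}, \qquad
\frac{1000}{200} = 5\ \text{ms}, \qquad
t_{\text{frame}} = 3\ \text{to}\ 5\ \text{ms} \;\Rightarrow\; \text{FPS} \approx 200\ \text{to}\ 333
```

So the 3-5 ms capture is consistent with the 200+ fps seen after resuming, and it is roughly 7 to 11 times shorter than a 30 fps frame.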

I have recorded a quick video demonstrating the issue here. If you would like, I can try to simplify my code and upload it to GitHub so you can look at it yourself. I’m not sure, however, whether this issue is unique to my computer: the only other computer I have tested it on has an AMD GPU, and its performance did not seem hindered in any way.

Thank you,
Daniel

Just an update to the issue:

I have found the cause of the slowdown in my code. I write to a shader storage buffer on the GPU and then read it back on the CPU with glMapBuffer every frame. Removing most of these write operations from my shader caused a massive performance increase, and the program now has the same frame rate both with and without Nsight.

The question remains, however, why Nsight improves the performance of these write operations so much. I was able to optimize my shader sufficiently, so the frame rate is no longer a major issue, but if any information comes up I would love to hear it.

Hi Daniel,

Yes, reading that every frame will certainly cause a sync point, allowing at most one frame’s worth of processing on the GPU at a time. If you still need that information on the CPU (at a later time), you could consider double or triple buffering it.
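The double/triple buffering idea can be sketched as a ring of N readback slots: the GPU writes slot `frame % N` while the CPU reads the oldest slot, whose results are N - 1 frames old. The Python below is a toy scheduling model, not OpenGL code. The function names, the slot count, and the `gpu_lag` parameter (how many frames after submission the GPU finishes a dispatch) are hypothetical; in a real renderer each slot would be its own buffer (or region of one), with a glFenceSync after the dispatch that writes it and a glClientWaitSync on that fence before mapping it with glMapBufferRange.

```python
def plan_readbacks(num_frames, num_slots):
    """Return one (frame, write_slot, read_slot, data_frame) tuple per frame.

    data_frame is the frame whose results the CPU reads this frame, or None
    while the ring is still filling during the first num_slots - 1 frames.
    """
    contents = [None] * num_slots  # which frame last wrote each slot
    plan = []
    for frame in range(num_frames):
        write_slot = frame % num_slots
        contents[write_slot] = frame         # GPU dispatch for this frame
        read_slot = (frame + 1) % num_slots  # oldest slot in the ring
        plan.append((frame, write_slot, read_slot, contents[read_slot]))
    return plan


def count_stalls(plan, gpu_lag):
    """Count frames where the CPU would block waiting on the GPU.

    Hypothetical assumption: the GPU finishes a frame's dispatch gpu_lag
    frames after the CPU submits it, so reading anything newer must wait.
    """
    stalls = 0
    for frame, _, _, data_frame in plan:
        if data_frame is not None and data_frame > frame - gpu_lag:
            stalls += 1
    return stalls


if __name__ == "__main__":
    for slots in (1, 2, 3):
        plan = plan_readbacks(num_frames=10, num_slots=slots)
        print(f"{slots} slot(s): {count_stalls(plan, gpu_lag=1)} stalls in 10 frames, "
              f"last frame read back: {plan[-1][3]}")
```

With a single slot the CPU reads the very buffer the GPU is still writing, so every frame stalls; with two or three slots the read targets older data and, in this model, never waits. The trade-off is that the CPU sees results one or two frames late.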

Honestly, without a repro of the original problem to analyze, it is hard to say. I know for sure we won’t double buffer things under the covers, and you stated that the improvement showed up in your “run mode” performance, not in replay. If you can provide us an example, we would be happy to investigate.

Jeff@NVIDIA