Program performs far better with Nsight

d_elwell · April 20, 2022, 3:55pm

Hello, I have a compute-shader-based raytracer that I made in OpenGL. On my computer, it averages around 30fps, which is far worse performance than I’d expect for my GPU. However, if I run my program through the Nsight frame profiler, I see over 200fps consistently, along with a DECREASE in GPU utilization. Does anyone know why this is happening or how I can achieve this framerate without Nsight? My guess is that Nsight enables some sort of optimization, but I’m not sure.

jkiel_NV · April 20, 2022, 7:30pm

Hi Daniel,

Hard to tell without actually seeing your app, but one question would be whether you have vsync enabled. That would force you to 30-60fps even if you were running faster.
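To illustrate why vsync tends to land on 30 or 60fps: under strict double-buffered vsync, a finished frame waits for the next vertical blank, so the frame time rounds up to a whole number of refresh intervals. The sketch below models that, assuming a 60 Hz display and no adaptive sync or triple buffering. `vsyncedFps` is a hypothetical helper for illustration, not part of any API.

```cpp
#include <algorithm>
#include <cmath>

// Effective frame rate under strict vsync (assumption: double buffering,
// no adaptive sync). A frame that takes renderMs to produce is presented at
// the next vertical blank, so its cost rounds up to whole refresh intervals.
double vsyncedFps(double renderMs, double refreshHz) {
    const double intervalMs = 1000.0 / refreshHz;
    // At least one interval elapses per presented frame.
    const double intervals = std::max(1.0, std::ceil(renderMs / intervalMs));
    return 1000.0 / (intervals * intervalMs);
}
```

Under these assumptions, a 5 ms frame on a 60 Hz display is presented at about 60fps, while a 20 ms frame misses one vertical blank and drops to about 30fps.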

When you say “a DECREASE in GPU utilization”, how are you measuring that?

Jeff@NVIDIA

d_elwell · April 20, 2022, 9:14pm

Hello Jeff,

I have ensured that vsync is not enabled, so that is not the cause of the issue.

I measured GPU utilization simply using the GeForce Experience performance overlay. I’m not sure how accurate this value is but I thought it might be helpful to include it. When running without Nsight, the GPU utilization is at a constant 95-100% whereas with Nsight, it is around 85%. Additionally, the GPU clock is significantly reduced when running with Nsight, yet despite all of this the program runs much faster.

Daniel

jkiel_NV · April 21, 2022, 1:08pm

Hi Daniel,

I was hoping we would have an easy solution to this one with vsync :)

I presume you are measuring the frame rate and GPU utilization when the application is running in the tool, not after you have already captured a frame, correct? I could understand the GPU utilization going down based on additional CPU overhead the tool introduces, slowing GPU submissions some and therefore reducing the GPU workload. However, the frame rate skyrocketing and the GPU clocks going down make it really puzzling.

Any chance we can get a repro of your app doing this?

Jeff@NVIDIA

d_elwell · April 21, 2022, 4:37pm

Hello Jeff,

My apologies, I should have mentioned that the performance increase from Nsight occurs only after I capture a frame. Nsight reports that the first captured frame took only around 3-5 milliseconds, and only after resuming the application does the FPS increase.

I have recorded a quick video demonstrating the issue here. If you would like, I can try to simplify my code and upload it to GitHub so you can view it yourself. I’m not sure, however, whether this issue is unique to my computer, as the only other computer I have tested it on has an AMD GPU, and its performance did not seem hindered in any way.

Thank you,
Daniel

d_elwell · April 22, 2022, 2:55pm

Just an update to the issue:

I have found the cause of the slowdown in my code. I have a shader storage buffer which I write to on the GPU and then read from using glMapBuffer every frame on the CPU. Removing most of these write operations from my shader caused a massive performance increase, and now the program has the same framerate both with and without Nsight.

The question still remains, however, as to why Nsight improves the performance of these write operations so much. I was able to optimize my shader sufficiently, so the framerate is no longer a major issue, but if any information comes up I would love to hear it.

jkiel_NV · April 27, 2022, 10:16pm

Hi Daniel,

Yes, reading that every frame will certainly cause a sync point, only allowing a maximum of 1 frame’s worth of processing at a time on the GPU. You could consider double/triple buffering that if you still need to make the info available (at a later time) on the CPU.
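A minimal sketch of the double/triple buffering idea, showing only the slot bookkeeping so it can run without a GL context. `ReadbackRing` is a hypothetical helper, and the OpenGL calls named in the comments (`glFenceSync`, `glClientWaitSync`, `glMapBufferRange`) mark where the real work would go in an application. The assumption is one shader storage buffer and one fence per slot.

```cpp
#include <cstddef>

// Bookkeeping for an N-deep readback ring (hypothetical helper). Each slot
// stands for one shader storage buffer plus the fence that guards it.
template <std::size_t N>
class ReadbackRing {
    static_assert(N >= 2, "need at least two slots to avoid a per-frame stall");

public:
    // Slot the GPU writes this frame. In a real app: bind this slot's buffer
    // for the compute dispatch, then create a fence with glFenceSync.
    std::size_t writeSlot() const { return frame_ % N; }

    // True once a slot written N-1 frames ago exists.
    bool hasReadSlot() const { return frame_ + 1 >= N; }

    // Oldest slot, written N-1 frames ago. In a real app: check its fence
    // with glClientWaitSync, then map it with glMapBufferRange for reading.
    // Only meaningful when hasReadSlot() is true.
    std::size_t readSlot() const { return (frame_ + 1) % N; }

    // Call once per frame, after the dispatch and fence are submitted.
    void advance() { ++frame_; }

private:
    std::size_t frame_ = 0;
};
```

With `ReadbackRing<3>`, the CPU reads results that are two frames old, so the fence it waits on has usually already signaled and the GPU can keep working on newer frames. The trade-off is that the data on the CPU lags behind by N-1 frames.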

Without having a repro of the original problem and analyzing it, it would be hard to say, honestly. I know for sure we won’t double buffer things under the covers, and you stated that your “run mode” performance, and not replay, was what showed the improvement. If you can provide us an example, we would be happy to investigate.

Jeff@NVIDIA

cameronreikes · July 11, 2023, 5:41am

I am unfortunately going to have to necro this, as my OpenGL video game currently has the same thing happening, but much more drastic! I’m getting 40-50ms per frame outside of Nsight, then when I run it from Nsight I’m getting a smooth 8ms/frame! This is on a 5-10 year old laptop GTX 1050 Ti.

Here is the build if anybody is curious: main.zip (19.8 MB)

cameronreikes · July 11, 2023, 6:51am

It turns out this had nothing to do with Nsight. I was just running with integrated graphics by default 😭. Nsight was forcing it to use the discrete GPU on my laptop.
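For anyone hitting the same thing, one common way to request the discrete GPU is to export driver hint symbols from the executable. This is a sketch that assumes a Windows laptop with hybrid graphics. The thread does not say which OS was used, and `GPU_HINT_EXPORT` is a hypothetical macro name. The two variable names are the ones the NVIDIA and AMD drivers look for.

```cpp
// Hints that ask hybrid-graphics drivers on Windows to prefer the discrete
// GPU for this process. They need to be exported from the .exe itself.
#if defined(_WIN32)
#define GPU_HINT_EXPORT __declspec(dllexport)
#else
// No effect outside Windows; defined empty so the file still compiles.
#define GPU_HINT_EXPORT
#endif

extern "C" {
// NVIDIA Optimus: a non-zero value requests the high-performance GPU.
GPU_HINT_EXPORT unsigned long NvOptimusEnablement = 0x00000001;
// AMD switchable graphics equivalent.
GPU_HINT_EXPORT int AmdPowerXpressRequestHighPerformance = 1;
}
```

To confirm which GPU the context actually landed on, printing `glGetString(GL_RENDERER)` at startup is a quick check. The preferred GPU can also be set per application in the driver control panel or the OS graphics settings.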

dwoods · July 11, 2023, 9:28pm

Hello,
Thanks for using Nsight Graphics and thanks for sharing the update. Sounds like you have a path forward now.
Regards,