Long stuttering in minimal OpenGL (win32 native) application

Writing an interactive application, we had very disturbing (long) stutterings.

Trying to analyze the problem, we wrote a very minimal OpenGL application, using win32 API to setup context and vsync, and then looping on glClear() with a rotating color.
Build instructions are provided in the repository.

All the code is in this single file.

We ran this minimal example in Nsight Systems on several systems:

  • A laptop with an Intel video card, and there is no stuttering at all. This is the result we expected.
  • A desktop equipped with a 1080Ti, there is long stuttering, as shown by the very long glClear() call.
  • Another desktop equipped with a 2060, and glClear() exhibit the same blocking patterns.

Also interestingly, these blocking calls always seem to occur around 2 seconds and 28 seconds, on both Nvidia systems.

What could cause such stuttering?
How can we address the issue when running on Geforce GPUs?

We will be happy to provide any further information which could be of help.

Here is a screenshot of Nsight timeline:

Hello Adnn.

Any particular reason why you opened a new topic instead of trying to ping for again on your original post?

Regarding NVIDIA feedback I have opened an internal bug for this, but I did not receive new information on it. I am not an OpenGL expert, so I cannot say if this is connected to your code or to WGL or something else.

This does not seem to be a general problem since we did not receive any other complains about this. Which I would expect if it would be a widely observed issue.

So unless there is some community insight, I would ask for a bit more patience.

Thank you!

Thank you @MarkusHoHo for this feedback!

I finally decided to open a new ticket because, at the time of the original post, I thought I was facing a “gltf-specific” issue (as reflected by the title). As it turned out during the discussion that it was actually another problem, I thought it would be less confusing to have a new separate issue.

And thank you for opening a bug. Is there a way for the public to track the advancement of this ticket?

Our ticketing system is only visible to corporate customers or development studios I am afraid. But if there is some insight I will post it here.

Thanks

  • Single GPU?
  • Fullscreen window with Focus?
  • On the PRIMARY monitor?
  • Are you calling glFinish() after SwapBuffers() + glClear() on the window?
  • NVIDIA Control Panel → Manage 3D Settings:
    ** Low Latency Mode = ON?
    ** Monitor Technology = Fixed Refresh?
    ** Power Management Mode = Prefer Maximum Performance?
    ** Threaded Optimization = OFF?
    ** Triple Buffering = OFF?
    ** Vertical Sync = Use the 3D application setting?
  • PowerMizer disabled (or whatever they’re calling it nowadays)?

Must have YES for 1st 3 to get Flip presentation mode (bypassing as much DWM overhead/latency as possible, without needing hacks to achieve this). Also if multiple monitors, make sure the refresh rate is set the same on both.

1 Like

Also, post the source code for your “Minimal OpenGL (win32 native) Application” here on your GitHub repo:

Related to your issue here and what’s described there for your glClear() observations (see link below):

DWM fits. Related to this, what I can tell you (you can determine this with Nsight Systems) is that after queuing SwapBuffers(), the first write to the new window swap chain image (e.g. glClear(), glClearBufferfv(), etc.) is what forces the back-end driver to wait until it has obtained a swap chain image to render the next frame into. This is where DWM and the display system can introduce some back-pressure on the render pipeline (since it provides the swap chain image), slowing it up to the display rate when VSync is enabled. The glFinish() added after SwapBuffer() + glClear() just flows that backpressure to your CPU draw submission thread, for debugging/profiling purposes, so you can observe it easily. It also keeps the driver from queuing ahead (again for debugging/profiling purposes), so that the driver doesn’t queue up multiple frames of work and then block you for many frames while it and the GPU “catches up” to all that work you just queued.