glXSwapBuffers GPU time accurate?


A bit of background: I’m working on a project where we’re using UE4 and modifying it to render as fast as possible. At the moment, I’ve got a simple scene:

  • A single white cube
  • Double-buffered
  • VSync disabled/Immediate Swap Mode
  • Running on CentOS 7
  • Using nVidia drivers, version 390.48
  • Quadro P2000 card
  • Using OpenGL 4

This is rendering at ~0.9ms per frame. For comparison’s sake, I’m rendering a similar scene in OpenSceneGraph’s OSGViewer (which, I believe, is using OpenGL 3 or perhaps older, and might not be double-buffered) at ~0.4ms. Obviously UE4 is doing a lot of extra stuff in the background, but that’s the point: I’m trying to identify and strip out UE4 features we don’t currently need in our project. Looking at the stats, I’m GPU-bound (Game Thread ~= 0.5ms, Render Thread ~= 0.3ms, GPU ~= 0.9ms). I used UE4’s GPU stats to track the biggest slowdown down to somewhere at the end of the frame. Having gotten as far as I could with UE4’s stats, I discovered and started using the Linux Graphics Debugger.

I should also say that, while I’ve done game programming for years, I’m not super well versed in lower-level rendering code. I’ve only been mucking around with it off and on for probably 6-9 months.

Okay, so! Analyzing the scene in the debugger, I’ve only got a handful of things taking time on the GPU:

  • Clearing the depth buffer: ~76us
  • Drawing a depth “prepass”: ~4.8us
  • Clearing the main frame buffer: ~128us
  • Drawing the base pass: ~96us
  • Drawing a post-process pass: ~211us
  • glXSwapBuffers: ~685us

As you can see, the SwapBuffers call is by far the slowest GPU-side.

I’ve read that OpenGL operates in such a way that render commands are queued up and processed “later on”, so you can’t use the time it takes to make a GL call on the CPU as a measurement of how long things are taking GPU-side. However, this tool appears to be telling me explicitly how long each operation takes CPU-side and GPU-side.
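For what it’s worth, you can cross-check any tool’s GPU timings yourself with OpenGL timer queries (`GL_TIME_ELAPSED`, core since GL 3.3), which measure actual GPU execution time for a span of commands rather than the CPU time of the calls. A minimal sketch, assuming a current GL context (error handling and async readback omitted for brevity):

```c
/* Sketch: timing a span of GL commands on the GPU with a timer query.
 * Requires a current OpenGL 3.3+ context. */
GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
/* ... the commands you want to time, e.g. the blit or a draw call ... */
glEndQuery(GL_TIME_ELAPSED);

/* Reading the result immediately stalls until the GPU finishes; in real
 * code you'd poll GL_QUERY_RESULT_AVAILABLE or read it a frame later. */
GLuint64 elapsed_ns = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsed_ns);
printf("GPU time: %.1f us\n", elapsed_ns / 1000.0);

glDeleteQueries(1, &query);
```

Wrapping the suspect calls (the blit, or the swap) in one of these would tell you independently of any profiler where the GPU time is actually going.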

I guess my question is: is the GPU time I’m seeing in the Debugger accurate? Is the shown time wholly and entirely due to the glXSwapBuffers call, and not any previous rendering commands?

My followup question would then be (and this may be a question for another sub forum): is there anything I can do to speed up that call? Or does that time seem perfectly reasonable, and I’m just running up against the overhead of swapping buffers?

Thank you for your time and help!

Interestingly, I just tried out RenderDoc, and it’s claiming the following numbers:

  • Clear depth buffer: nan us
  • Draw depth prepass: (Split into two calls: ~21.76 us and ~18.24us)
  • Clear the main framebuffer: ~120us
  • Draw the base pass: ~72us
  • Draw a post-process pass: ~220us
  • glBlitFramebuffer: ~156us
  • SwapBuffers: ~1.2us

So… maybe the blitting is what’s actually taking the time? The Graphics Debugger didn’t show any GPU time for the blit, which, in retrospect, seems suspicious.

To follow up, adding an extra glBlitFramebuffer call for testing increases the time attributed to glXSwapBuffers by ~60-80%, so I believe it definitely is the cause of the slowdown.
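For context, that end-of-frame blit is typically a full-screen copy from the engine’s offscreen render target into the window’s default framebuffer, something along these lines (illustrative only; the FBO name and dimensions are hypothetical, and UE4’s actual RHI code differs):

```c
/* Illustrative resolve blit from an offscreen FBO to the window's
 * default framebuffer; scene_fbo, width, and height are hypothetical. */
glBindFramebuffer(GL_READ_FRAMEBUFFER, scene_fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);  /* 0 = default framebuffer */
glBlitFramebuffer(0, 0, width, height,      /* source rectangle */
                  0, 0, width, height,      /* destination rectangle */
                  GL_COLOR_BUFFER_BIT, GL_NEAREST);
```

Since it copies every pixel of the backbuffer, its cost scales with resolution, which would be consistent with a second blit adding a roughly proportional chunk of time.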

There doesn’t appear to be any way around making that call, or any way to speed it up.

May have to switch to Vulkan.

Hi thegsusfreek,

Thanks for the feedback about LGD!
It’s a known issue that glBlitFramebuffer is not timed in the profiler. It will be addressed in a future release.

Excellent! Thanks for the info.