glXSwapBuffers GPU time accurate?


A bit of background: I’m working on a project where we’re using UE4 and modifying it to render as fast as possible. At the moment, I’ve got a simple scene:

  • A single white cube
  • Double-buffered
  • VSync disabled/Immediate Swap Mode
  • Running on CentOS 7
  • Using nVidia drivers, version 390.48
  • Quadro P2000 card
  • Using OpenGL 4

This is rendering at ~0.9ms per frame. For comparison’s sake, I’m rendering a similar scene in OpenSceneGraph’s OSGViewer (which, I believe, is using OpenGL 3 or perhaps older, and might not be double-buffered) at ~0.4ms. Obviously UE4 is doing a lot of extra stuff in the background, but that’s the point: I’m trying to identify and strip out UE4 features we don’t currently need in our project. Looking at the stats, I’m GPU-bound (Game Thread ~= 0.5ms, Render Thread ~= 0.3ms, GPU ~= 0.9ms). I used UE4’s GPU stats to track the biggest slowdown down to somewhere at the end of the frame. Having gotten as far as I could with UE4’s stats, I discovered and started using the Linux Graphics Debugger.

I should also say that, while I’ve done game programming for years, I’m not super well versed in lower-level rendering code. I’ve only been mucking around with it off and on for probably 6-9 months.

Okay, so! Analyzing the scene in the debugger, I’ve only got a handful of things taking time on the GPU:

  • Clearing the depth buffer: ~76us
  • Drawing a depth “prepass”: ~4.8us
  • Clearing the main frame buffer: ~128us
  • Drawing the base pass: ~96us
  • Drawing a post-process pass: ~211us
  • glXSwapBuffers: ~685us

As you can see, the SwapBuffers call is by far the slowest GPU-side.

I’ve read that OpenGL operates in such a way that render commands are queued up and processed “later on”, so you can’t use the time it takes to make a GL call on the CPU as a measurement of how long things are taking GPU-side. However, this tool appears to be telling me explicitly how long each operation takes CPU-side and GPU-side.
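For what it’s worth, you can cross-check any tool’s GPU timings yourself with OpenGL timer queries (`GL_TIME_ELAPSED`, core since GL 3.3), which measure actual GPU execution time for a span of commands rather than the CPU time of the calls. A minimal sketch, assuming a current GL context (error handling and async readback omitted for brevity):

```c
/* Sketch: timing a span of GL commands on the GPU with a timer query.
 * Requires a current OpenGL 3.3+ context. */
GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
/* ... the commands you want to time, e.g. the blit or a draw call ... */
glEndQuery(GL_TIME_ELAPSED);

/* Reading the result immediately stalls until the GPU finishes; in real
 * code you'd poll GL_QUERY_RESULT_AVAILABLE or read it a frame later. */
GLuint64 elapsed_ns = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsed_ns);
printf("GPU time: %.1f us\n", elapsed_ns / 1000.0);

glDeleteQueries(1, &query);
```

Wrapping the suspect calls (the blit, or the swap) in one of these would tell you independently of any profiler where the GPU time is actually going.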

I guess my question is: is the GPU time I’m seeing in the Debugger accurate? Is the shown time wholly and entirely due to the glXSwapBuffers call, and not any previous rendering commands?

My followup question would then be (and this may be a question for another sub forum): is there anything I can do to speed up that call? Or does that time seem perfectly reasonable, and I’m just running up against the overhead of swapping buffers?

Thank you for your time and help!

Interestingly, I just tried out RenderDoc, and it’s claiming the following numbers:

  • Clear depth buffer: nan us
  • Draw depth prepass: (Split into two calls: ~21.76 us and ~18.24us)
  • Clear the main framebuffer: ~120us
  • Draw the base pass: ~72us
  • Draw a post-process pass: ~220us
  • glBlitFramebuffer: ~156us
  • SwapBuffers: ~1.2us

So… maybe the blitting is what’s actually taking the time? The Graphics Debugger didn’t show any GPU time for the blit, which, in retrospect, seems suspicious.

To follow up, adding an extra glBlitFramebuffer call for testing increases the time attributed to glXSwapBuffers by ~60-80%, so I believe it definitely is the cause of the slowdown.
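For context, that end-of-frame blit is typically a full-screen copy from the engine’s offscreen render target into the window’s default framebuffer, something along these lines (illustrative only; the FBO name and dimensions are hypothetical, and UE4’s actual RHI code differs):

```c
/* Illustrative resolve blit from an offscreen FBO to the window's
 * default framebuffer; scene_fbo, width, and height are hypothetical. */
glBindFramebuffer(GL_READ_FRAMEBUFFER, scene_fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);  /* 0 = default framebuffer */
glBlitFramebuffer(0, 0, width, height,      /* source rectangle */
                  0, 0, width, height,      /* destination rectangle */
                  GL_COLOR_BUFFER_BIT, GL_NEAREST);
```

Since it copies every pixel of the backbuffer, its cost scales with resolution, which would be consistent with a second blit adding a roughly proportional chunk of time.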

There doesn’t appear to be any way around making that call, or any way to speed it up.

May have to switch to Vulkan.

Hi thegsusfreek,

Thanks for the feedback about LGD!
It’s a known issue that glBlitFramebuffer is not timed in the profiler. It will be addressed in a future release.

Excellent! Thanks for the info.