Without the full trace I can't be certain, but it does look like you are spending the vast majority of your time presenting. ~4 ms for a buffer swap is more expensive than I would expect. Based on the collapsed CPU row above, the CPU doesn't appear to be active during those periods; by the looks of it, activity only spikes when it submits draws for the next frame.
I would imagine your GPU queue is fairly empty as well? This looks somewhat like a synchronization issue to me, but again, I'm guessing at this point. I would probably start by looking at your spin lock and investigating that clear.
Also, I'm going to move this discussion over to the systems thread. :)
Is that the line labeled “OpenGL GPU work IDC”?
I ask because in the Timeline View I only see two top-level nodes, CPU and Threads, and both have 4 entries.
The GPU Work line shows blocks very similar to those on the API line, and they are often also labeled glXSwapBuffers.
So I removed my glReadPixels calls and off-screen renders to make the OpenGL usage as simple as possible, to see what a normal run, unaffected by synchronization, looks like.
Should I expect glXSwapBuffers to show up prominently? I assume the flushing and syncing normally happen during the swap, so it would take the bulk of the CPU time.
What I am trying to achieve is to keep glMapBufferRange() from blocking the CPU.
I use a PBO as the target of glReadPixels(). Thanks to the PBO, glReadPixels() does not block, but when I later want to get the actual values, glMapBufferRange() blocks.