Unexpected dramatic performance drop when doubling drawcalls


We’re writing a Vulkan backend for our engine (very early in development). Everything was going quite well, but recently we noticed very weird performance behaviour on both Windows and Linux with a GTX 780 (haven’t tested with much else).

Basically we’ve got about 2500 drawcalls recorded into one primary command buffer (rebuilt each frame), running at about 3.2ms, using a single thread for rendering and two frames in flight (with separate CBs and UBOs for each).
Normally we’re still CPU-bound, meaning I don’t have to wait on the fence from the previous frame.

If I just double each vkCmdDrawIndexed & vkCmdDraw call (for regular objects), resulting in about 5000 drawcalls, the time jumps drastically to about 12ms, with almost all of it going into the fence wait on the previous frame.
And about once per second I’m getting horrible peaks of about 22ms in vkQueuePresentKHR.

Has anyone seen similar behaviour and/or got any good ideas of what could be the problem?

I can of course provide more details about what we’re doing. I’ve already tried a bunch of things that came to mind, and I’m running out of ideas.

Never mind, just a momentary lapse of reason :|

We’ve got a move-thread and a render-thread, which work in parallel and synchronize at the end.

The move-thread tends to be slower, so the render-thread has to wait for it.
This was of course effectively giving the GPU a lot of extra time to finish the frame before the render-thread got around to waiting on the fence.
The result was that as soon as the GPU render time got a bit bigger than the move-thread time (about 10ms), the render-thread suddenly had to wait for the GPU.
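
To make the effect concrete, here is a rough steady-state model of the situation described above. It is only a sketch: the names `FrameTimes` and `modelFrame` are made up for illustration, and it assumes the two CPU threads sync once per frame and the GPU works in parallel with them for that whole time.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model, all times in milliseconds.
struct FrameTimes {
    double fenceWaitMs;  // time the render-thread blocks on the fence
    double frameMs;      // resulting total frame time
};

// moveMs:      time the move-thread needs per frame
// renderCpuMs: time the render-thread spends recording/submitting
// gpuMs:       time the GPU needs to execute one frame
FrameTimes modelFrame(double moveMs, double renderCpuMs, double gpuMs) {
    assert(moveMs >= 0.0 && renderCpuMs >= 0.0 && gpuMs >= 0.0);

    // Both threads sync at the end of the frame, so the slower one
    // sets the CPU-side frame time.
    const double cpuFrameMs = std::max(moveMs, renderCpuMs);

    // The GPU had cpuFrameMs of wall-clock time before anyone looked at
    // the fence. Only the part of the GPU work that exceeds this window
    // shows up as a visible fence wait.
    const double fenceWaitMs = std::max(0.0, gpuMs - cpuFrameMs);

    return {fenceWaitMs, cpuFrameMs + fenceWaitMs};
}
```

With a 10ms move-thread and a 3ms render-thread (illustrative numbers), `modelFrame(10.0, 3.0, 8.0)` gives a fence wait of 0ms, while `modelFrame(10.0, 3.0, 12.0)` gives a 2ms wait and a 12ms frame. The GPU time was already large in the first case; it just never became visible.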

And all of that previously hidden GPU time suddenly showed up in the fence wait (facepalm).
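
One way to avoid being fooled like this is to time each blocking point of the render-thread separately, so the wait on the move-thread and the wait on the fence are reported as two numbers. A minimal sketch, assuming a helper called `timeMs` (a made-up name, not part of Vulkan or our engine):

```cpp
#include <chrono>
#include <utility>

// Runs a blocking call and returns how long it took, in milliseconds.
// steady_clock is used because it is monotonic and unaffected by
// system clock adjustments.
template <typename F>
double timeMs(F&& blockingCall) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(blockingCall)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Possible usage in the render-thread (device, frameFence and
// waitForMoveThread are placeholders for whatever the engine has):
//
//   double syncMs  = timeMs([&] { waitForMoveThread(); });
//   double fenceMs = timeMs([&] {
//       vkWaitForFences(device, 1, &frameFence, VK_TRUE, UINT64_MAX);
//   });
```

If `syncMs` is large while `fenceMs` is near zero, the GPU may still be busy for most of the frame; the CPU-side wait is simply hiding it.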