Failing to achieve multi-GPU onscreen rendering

Hello,
I’m trying to achieve multi-GPU onscreen parallel rendering using two Quadro K6000 graphics cards under Linux. My test application is a simple OpenGL 4.4 deferred renderer, and I can successfully run two instances of it, each on its own X Server display, by setting the DISPLAY environment variable. In this case both processes run at full speed and both GPUs show 99% utilization.
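For reference, the working two-process setup can be driven from a small launcher. The sketch below is a minimal C example, assuming POSIX fork()/execvp(). The function name launch_on_display and display names such as ":0.0" and ":0.1" are hypothetical, because the post does not give the actual display names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts argv[0] (searched in PATH) as a child
 * process whose DISPLAY is set to display_name.
 * Returns the child's pid, or -1 if fork() failed. */
static pid_t launch_on_display(const char *display_name, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only this process sees the new DISPLAY value. */
        if (setenv("DISPLAY", display_name, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* reached only if exec failed */
    }
    return pid;
}
```

A caller would start the renderer once per display and then waitpid() on both children. Setting DISPLAY only in the child leaves the launcher's own environment unchanged, which is equivalent to typing `DISPLAY=... ./app` twice in a shell.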

The trouble starts when I try to do the same from within one process. In this case the application spawns two render threads instead of one. Each thread opens its own window on a different X Server display and creates its own render context. There is no synchronization or data sharing between the rendering threads. The application runs, but GPU utilization is low (around 25%), and so is the frame rate.

Each of the two rendering threads looks like this:

XOpenDisplay
create window
create render context and make it current
load shaders, upload textures/buffers
while (true)
{
	clear gbuffer fbo
	render to gbuffer fbo
	display result 
	glXSwapBuffers
}
release current context, destroy all
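The thread layout above can be sketched in C with POSIX threads. This is a minimal, hypothetical skeleton: render_thread_state, render_frame and run_render_threads are illustrative names, the Xlib/GLX calls are left as comments, and a bounded frame count stands in for while (true). Each thread owns its own state, mirroring the "no synchronization or data sharing" setup.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_RENDER_THREADS 8

/* Hypothetical per-thread state: every field is owned by exactly one
 * render thread, so nothing is shared or locked between threads. */
typedef struct {
    const char *display_name; /* assumption: one X screen per GPU */
    int frames_to_render;     /* bounded loop instead of while (true) */
    int frames_rendered;
} render_thread_state;

/* Placeholder for the real per-frame work (clear gbuffer fbo, render to
 * gbuffer fbo, display result, glXSwapBuffers). Here it only counts. */
static void render_frame(render_thread_state *s)
{
    s->frames_rendered++;
}

static void *render_thread_main(void *arg)
{
    render_thread_state *s = arg;
    /* Real code: XOpenDisplay(s->display_name), create window,
     * create context, glXMakeCurrent, load shaders and buffers. */
    for (int i = 0; i < s->frames_to_render; ++i)
        render_frame(s);
    /* Real code: glXMakeCurrent(dpy, None, NULL), then destroy all. */
    return NULL;
}

/* Starts one render thread per state entry and joins all of them.
 * In the real application, XInitThreads() must be called in main()
 * before any other Xlib call, since several threads use Xlib.
 * Returns 0 on success, -1 if not every thread could be started. */
static int run_render_threads(render_thread_state *states, int count)
{
    pthread_t threads[MAX_RENDER_THREADS];
    if (count > MAX_RENDER_THREADS)
        return -1;
    int started = 0;
    for (; started < count; ++started)
        if (pthread_create(&threads[started], NULL, render_thread_main,
                           &states[started]) != 0)
            break;
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);
    return started == count ? 0 : -1;
}
```

The XInitThreads() requirement comes from the Xlib documentation, which asks for it whenever multiple threads might use Xlib concurrently. Whether it affects the scaling problem described here is not established by this sketch.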

For the purpose of testing there is no data upload during the render loop, only texture binds, vertex buffer binds and glDrawArraysInstancedBaseInstance calls. Per-draw data is sourced from a shader storage buffer using gl_BaseInstanceID.

It seems that the rendering is not parallelized. Am I missing something? What do I have to do to get good performance scaling from within a single process?

System details:
Fedora 20 64-bit (kernel 3.14.8-200.fc20.x86_64), Xorg 11.0, NVIDIA 331.79 drivers, Xinerama disabled

PS:
I’m getting similar results on Windows 8.1. The Nsight profiler shows that each render context is assigned to the correct GPU (so WGL_NV_gpu_affinity is not needed?), but rendering into two windows takes exactly twice as long as rendering a single viewport into a single window.