Nvidia driver 304.108: multi-threaded process fails, GL pipeline contention

So the question is: does the Nvidia driver simply not function in a multi-threaded process space?
Even though the GLX contexts are on separate GPUs, on separate cards, at separate PCI addresses?
To me it seems like it should work like separate X windows. Multiple instances of glxgears, even on the same screen, do not cause issues…

Somewhere I read that the Nvidia driver copies the GL commands to each GPU because of a shared GL object space. That would cause a lot of unnecessary context switching. If that is true, is there some way to disable this behavior? If the threads are not using any shared objects (i.e. contexts, PBOs, textures, etc.), can we make the pipelines ignore each other?

The application has 3 pthreads, each with its own GLX context created at thread scope.
main calls XInitThreads before launching the threads.
Thread 1 renders a lot and displays to card 2 PCI 02 second head (localdisplay:0.0)
Thread 2 renders very little and displays to card 1 PCI 01 second head (localdisplay:0.1)
Thread 3 renders a lot and does not display, but attaches to card 1 PCI 01 primary head (localdisplay:0.2)

The executable only sets up the threads, data sockets, GL contexts, viewports, and ortho commands; all the rendering is done in black-box libs.

I am inferring that the GL pipeline is not separate, based on timers on each function.
I have observed random lock-ups: glXSwapBuffers, glReadPixels, or something in the render goes off for 100 to 800 milliseconds.
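For anyone wanting to reproduce the measurement, per-call timers of this kind can be sketched roughly as below. This is not the code from my application; the names elapsed_ms and timed_call and the stall threshold parameter are made up for illustration. It only measures how long the calling thread is blocked inside the call, since GL calls may return before the GPU has finished the work.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Milliseconds between two CLOCK_MONOTONIC samples. */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/*
 * Run fn(arg) once and return how long the calling thread spent in it.
 * If the call took at least stall_ms milliseconds, log it to stderr.
 * label, fn, arg and stall_ms are illustrative parameters, not a driver API.
 */
double timed_call(const char *label, void (*fn)(void *), void *arg,
                  double stall_ms)
{
    struct timespec t0, t1;
    double ms;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ms = elapsed_ms(t0, t1);
    if (ms >= stall_ms)
        fprintf(stderr, "%s stalled for %.1f ms\n", label, ms);
    return ms;
}
```

Wrapping the swap, the readback, and the black-box render call each in their own small function and passing them to timed_call with a threshold of 100.0 would flag the stalls described above, one label per call.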

Without thread 3 running, thread 1 hits 28-30 FPS (throttled on the 60 Hz data packet) and the render takes 8-16 milliseconds.

With threads 1 and 3 running at the same time, FPS drops below 20 and the timers on the primary functions bounce between 10 and 30 milliseconds. The render on thread 1 takes between 12 and 30 milliseconds.

I have tried running each thread on separate cores,
and a gazillion xorg.conf options (see other post).

I just read about __GL_THREADED_OPTIMIZATIONS in the 310 driver line, so I am going to swap drivers and enable that environment variable tomorrow.
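One way to make sure the variable is in place before the driver's GL library initializes is a tiny launcher that sets it and then execs the real application. This is only a sketch under that assumption; the function names are hypothetical, and exporting the variable in the shell before starting the application should have the same effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Put the variable into this process's environment; returns 0 on success. */
int enable_threaded_optimizations(void)
{
    return setenv("__GL_THREADED_OPTIMIZATIONS", "1", 1);
}

/*
 * Replace the current process with the real application, which inherits
 * the environment. argv[0] is the program to run and argv must be
 * NULL-terminated. Returns -1 only if something failed.
 */
int launch_with_threaded_optimizations(char *const argv[])
{
    if (enable_threaded_optimizations() != 0) {
        perror("setenv");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp"); /* reached only when execvp fails */
    return -1;
}
```

A launcher main would simply pass argv + 1 to launch_with_threaded_optimizations, so that the first command-line argument is the application to start. Whether the setting actually helps with the contention described here is something I still have to test.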

I am also experimenting with multiple X servers, running thread 1 on :0.0, thread 2 on :1.1, and thread 3 on :1.0, to see if that improves performance.