Poor performance of FBOs under Xinerama

Hello,

We’re running a setup with five large screens in the control room of the CMS experiment. The machine has two GTX 970 cards, and we use Xinerama to merge the five screens into a common desktop (KDE on Fedora 21). One card drives three 2560x1440 monitors; the second drives one 2560x1440 and one 3840x2160 monitor.

With Xinerama enabled, we see really slow performance either when rendering into FBOs or in glReadPixels when we extract images from FBOs to save them as PNGs. Compared to the same thing running on my desktop with TwinView on a single GTX 970, it runs about 8 times slower (0.25 Hz instead of 2 Hz). I’m being vague as the real setup is 9 time zones away and my test machine here at the university got stolen, sigh.

The refresh-screen part of the event-loop:

  1. loop over views, update scene and redraw, do not swap
  2. swap buffers on all views
  3. for each view
  • render into FBO (typically smaller than screen size, with dimensions rounded to a power of 2; there was a bug in ATI drivers that made FBO rendering super slow otherwise)
  • copy out image data
  • run image creation in a separate thread
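
The steps above can be sketched roughly as follows. This is only an illustration of the loop structure, not the application's actual code: the view methods (`update_and_draw`, `swap_buffers`, `render_to_fbo_and_read`), the attributes `capture_width` and `capture_height`, and the `save_image` callback are hypothetical names, and rounding the FBO size up (rather than down) to a power of two is an assumption.

```python
import queue
import threading


def round_up_pow2(n: int) -> int:
    """Return the smallest power of two that is >= n (assumes rounding up)."""
    if n < 1:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()


def start_image_writer(save_image):
    """Start a worker thread that turns copied-out pixel data into image files.

    save_image is a hypothetical callback: save_image(name, width, height, pixels).
    Put None on the returned queue to stop the worker.
    """
    jobs = queue.Queue()

    def worker():
        while True:
            job = jobs.get()
            if job is None:
                break
            save_image(*job)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread


def refresh(views, jobs):
    """One pass of the refresh-screen part of the event loop."""
    # 1. Update the scene and redraw every view, without swapping.
    for view in views:
        view.update_and_draw()
    # 2. Swap buffers on all views.
    for view in views:
        view.swap_buffers()
    # 3. Per view: render into an FBO, copy the pixels out, and hand
    #    image creation to the worker thread.
    for view in views:
        width = round_up_pow2(view.capture_width)
        height = round_up_pow2(view.capture_height)
        # Hypothetical method standing in for the FBO render + glReadPixels.
        pixels = view.render_to_fbo_and_read(width, height)
        jobs.put((view.name, width, height, pixels))
```

In this sketch only the image encoding runs off the main thread. The FBO render and the pixel copy stay on the thread that owns the GL context, and those are the parts that are slow under Xinerama.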

All GL windows reside fully on individual monitors.

Currently we use driver 343.36, as there were additional issues with 346.47: the image captured on monitor0 of the first card gets consistently corrupted, as shown in attachment [2].

I have observed another peculiarity in this setup: moving the mouse over the GL window while the event display application is running can sometimes result in a partial refresh of the GL window, as if part of the window were clipped out.

I can prepare a tarball with the application, data, and instructions if you would be interested in giving it a spin.

Best,
Matevž

[1] NVIDIA bug report: http://uaf-2.t2.ucsd.edu/~matevz/tmp/nvidia-bug-report.log.gz
[2] Example of corrupted image: http://uaf-2.t2.ucsd.edu/~matevz/tmp/RhoPhi-corrupted.png

I prepared a tarball that should reproduce the problem:
http://uaf-2.t2.ucsd.edu/~matevz/tmp/fireworks-740pre8-fc21-nv.tar.gz

There is a README.txt inside with short instructions.

Best,
Matevz

Tracking this issue under 200096167

Thanks for reporting this issue. We’ve done a bit of investigation, and unfortunately slowdowns of this magnitude are somewhat expected with Xinerama, especially when using FBOs. When using Xinerama, rendering commands must be duplicated for each screen. This means if you have 5 screens, there will likely be a 5x slowdown. The roughly 8x slowdown you report is in the ballpark of that number. Some of the overhead is on the GPU side, and some is on the CPU side, so even though you’re splitting the work across 2 GPUs, the workload may still be several times slower. FBOs are particularly problematic because the entire FBO must be updated on all GPUs. When rendering to on-screen windows, the driver can at least clip the rendering to the parts of that window visible on the given screen, eliminating some of the GPU work needed to rasterize the images. If the window is present on only one screen, even more of the GPU overhead, and sometimes all of the CPU overhead in the driver, can be eliminated.

In our testing, we found we could vastly accelerate the rendering by using a combination of TwinView and Xinerama to minimize the number of times the rendering commands must be processed. We configured two X screens, one with 3 monitors and one with 2 monitors, and ran Xinerama across those two screens. Could you verify this setup provides better performance on your side as well?
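
As a rough back-of-the-envelope illustration of the reasoning above, the sketch below assumes a deliberately naive model in which every rendering command is processed once per X screen, so the achievable rate scales as one over the number of X screens. The function name and the model itself are assumptions made for illustration. The 2 Hz baseline is the figure reported earlier in this thread for a single GTX 970 on a different machine, so the numbers are indicative only.

```python
def estimated_rate_hz(single_screen_rate_hz: float, x_screens: int) -> float:
    """Naive estimate: rendering work is repeated once per X screen."""
    if x_screens < 1:
        raise ValueError("need at least one X screen")
    return single_screen_rate_hz / x_screens


baseline_hz = 2.0  # rate reported for the single-card TwinView desktop

configs = (
    ("five X screens under Xinerama", 5),
    ("two TwinView X screens under Xinerama", 2),
)
for label, count in configs:
    print(f"{label}: ~{estimated_rate_hz(baseline_hz, count):.2f} Hz")
```

Under this model the five-screen layout would manage about 0.40 Hz and the two-screen layout about 1.00 Hz. The reported 0.25 Hz is lower than the naive estimate, consistent with the extra cost of updating entire FBOs on all GPUs.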

At the same time, we’re investigating whether there’s anything we can do to improve the performance of FBO and Xinerama in general, but I don’t have high hopes for good results here given the nature of the problem.