When prime is enabled, there is currently no synchronization between the source device producing the pixels and the sink device reading them. I.e., in a typical NVIDIA + Intel configuration, the Intel chip just scans out the shared buffer constantly, without regard to when the pixels are copied into it.
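(For context, the PRIME output configuration being discussed is usually set up along these lines; the sink provider may be named "modesetting" or "Intel" depending on which X driver is in use, so treat the names as examples:

xrandr --setprovideroutputsource modesetting NVIDIA-0
xrandr --auto

The NVIDIA GPU renders into a shared buffer and the Intel GPU scans that buffer out to the panel.)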
I have an idea. If all rendering is done on the NVIDIA GPU and all display output is handled by the Intel GPU, would it be possible to make the Intel GPU hold the shared buffer long enough to synchronize the frames before displaying them on screen, using Intel's TearFree option in conjunction with nvidia-prime? Or is DMA-BUF cross-device synchronization already being worked on for proper NVIDIA Optimus GPU switching with the official NVIDIA drivers?
From the intel X driver man page:
Option “TearFree” “boolean”
Disable or enable TearFree updates. This option forces X to perform all rendering to a backbuffer prior to updating the actual display. It requires an extra memory allocation the same size as a framebuffer, the occasional extra copy, and requires Damage tracking. Thus enabling TearFree requires more memory and is slower (reduced throughput) and introduces a small amount of output latency, but it should not impact input latency. However, the update to the screen is then performed synchronously with the vertical refresh of the display so that the entire update is completed before the display starts its refresh. That is only one frame is ever visible, preventing an unsightly tear between two visible and differing frames. Note that this replicates what the compositing manager should be doing, however TearFree will redirect the compositor updates (and those of fullscreen games) directly on to the scanout thus incurring no additional overhead in the composited case. Also note that not all compositing managers prevent tearing, and if the outputs are rotated, there will still be tearing without TearFree enabled.
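For reference, enabling it amounts to a Device section for the Intel driver along these lines (the Identifier is just an example):

Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "TearFree" "true"
EndSection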
Since X11 has no concept of frames as far as I'm aware, the buffer gets mangled not because the GPU output isn't synced to the monitor, but because the Intel GPU displays the shared buffer while the NVIDIA GPU is still rendering into it, if I read aplattner's reply correctly.
The hope is that widespread Wayland support will eventually fix this by making X11 obsolete for the home user, since Wayland is frame-perfect.
fratti, while it’s true that X doesn’t traditionally have frames (though they were sort of added with the new Present extension), the issue here is a separate lack of synchronization between PRIME devices. Recent kernels added some locking / fencing support that is on my TODO list to look into. From a quick skim, though, it doesn’t look like the Intel kernel driver implements it either so it’ll probably require some more work.
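(To make the ordering concrete, here is a toy sketch of what that fencing is meant to enforce. This is not the actual kernel or driver API; a pthread condition variable stands in for a DMA-BUF fence, but the idea is the same: the scanout side must wait until the render side signals that the shared buffer is complete.

/* Toy model only -- not real driver code. A condition variable plays the
 * role of a fence: the "Intel" side waits for the "NVIDIA" side to finish
 * writing the shared buffer before displaying it. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  fence = PTHREAD_COND_INITIALIZER;
static int frame_ready = 0;              /* stands in for a signalled fence */

static void *nvidia_render(void *arg)
{
    (void)arg;
    usleep(16000);                       /* pretend to render one frame */
    pthread_mutex_lock(&lock);
    frame_ready = 1;                     /* "signal the fence" */
    pthread_cond_signal(&fence);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t renderer;
    pthread_create(&renderer, NULL, nvidia_render, NULL);

    pthread_mutex_lock(&lock);
    while (!frame_ready)                 /* "wait on the fence" before scanout */
        pthread_cond_wait(&fence, &lock);
    pthread_mutex_unlock(&lock);

    printf("scanout: buffer is complete, safe to display\n");
    pthread_join(renderer, NULL);
    return 0;
}

Without that wait, the sink simply displays whatever is in the buffer at refresh time, which is the tearing described above.)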
Thanks for putting this on your TODO list. It’s probably quite a long list considering the recently unveiled driver changes to make EGL/Wayland work, but I’m glad there’s hope for the future.
Since most laptops with dedicated GPUs also come with an Intel GPU these days, it’s hard to avoid Optimus.
Well, take a look at how nouveau/bumblebee handle it? They seem to handle it fine, and as far as I'm aware primusrun (bumblebee) works roughly the same way: it also passes frames on to the Intel GPU.
I'll be moving back to bumblebee; I personally can't live with screen tearing, but I'd be happy to hear once this gets fixed :)
Bumblebee runs two X11 servers and has quite a bit of overhead from compressing and copying buffers around. For example, my laptop cannot reach more than ~40 FPS in any application with bumblebee, while PRIME happily gives me 6500 FPS in glxgears.
As aplattner said, it's not really an issue of "we don't know a solution"; it's that it needs someone to work on it. I don't know how many people NVIDIA employs to work on Linux device drivers and UNIX desktop/laptop support in general, but judging from the presentation on Wayland and EGL they gave at XDC, they seem to be quite busy reworking big parts of the driver at the moment.
EDIT: Dug up some more stuff. Seems like this is somewhat related? Might be of interest for anyone working at NVIDIA. (Though last I checked, NVIDIA didn't have KMS support yet or something.)
If the outputs (= monitors) are directly connected to the nvidia GPU, this is not an issue, as the Intel GPU is then not involved at all from what I know.
However, on many laptop devices, the outputs are connected to the Intel GPU, so for those it’s relevant.
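If you're not sure which case your machine falls into, xrandr can tell you; the provider list shows which GPU drives the outputs (names and counts vary per machine):

xrandr --listproviders

On a laptop where the panel is wired to the Intel GPU, the Intel/modesetting provider is the one that reports the outputs.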
Weird, the performance difference for me is much smaller.
Are you using optirun or primusrun for glxgears? Also, you need to force vsync off, like this: vblank_mode=0 primusrun glxgears
For anyone interested in a workaround for the QtQuick animation bugs, do the following:
Create a script file somewhere with the following contents (a complete example is sketched after these steps):
export QSG_RENDER_LOOP=basic
Go to System Settings -> Startup and Shutdown -> Autostart and add the script, set to run at pre-KDE startup
Restart your session
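Putting the first two steps together, the script could look like this (the path is just an example; remember to make it executable with chmod +x):

#!/bin/sh
# Example: ~/.config/qsg-basic.sh
# Forces QtQuick's single-threaded "basic" render loop so it doesn't
# rely on vsync working, per the workaround described above.
export QSG_RENDER_LOOP=basic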
Until nvidia fixes vsync or until Qt adds automatic detection for optimus, this seems to be the only way you can avoid 100% CPU usage while copying files/connecting to networks/…
You’ll still get tearing, of course. This just tells QtQuick not to rely on VSync actually working.
Scratch that, don't use the workaround described above because it breaks Plasma 5.
So we’re back to waiting for nvidia and intel to implement PRIME buffer fencing.