Ubuntu 19+ and nvidia 440.64 - Quadro M1000M - monitor composition breaks every 24h

Hi,

I have been using Ubuntu with a Quadro M1000M, and getting my two external monitors and the notebook panel to work together has never been smooth. Before, I was on Ubuntu 18 with driver 390 and it was somewhat OK. After every restart I had to unplug and replug the third monitor 15 to 30 times until it eventually worked. After that I could leave the computer on for around 20 days, until the NVIDIA driver broke again, refused to accept the third monitor, and I had to restart.

One day I updated the driver, and it was the worst thing I could have done to myself. After many days of fighting with drivers and configs I managed to roll back to 390, but it was very laggy and the OS would often start with a black screen.

Each time a restart was required, and after another hour of fighting with the OS I could usually recover the old setup.

One day, everything broke and none of my previous tricks worked, so I reinstalled Ubuntu 18 from scratch. Next I tried to install the latest driver (after all, when things break you go for the latest and check whether they have finally been fixed). To my disappointment, Ubuntu 18 could not install 440; the only version that worked was 435.
With 435 it was impossible to configure the third monitor, no matter what I tried. I spent about a week trying every trick I could think of. A waste of time.

So I decided to upgrade to Ubuntu 19. I removed all NVIDIA drivers, upgraded, then installed 440, and it worked (facepalm).

Now I have Ubuntu 19 with KDE and 440. The third monitor works, but only for 24h (facepalm 2x). Every day, somewhere between 12:50 and 12:59 (WTF), the graphics card resets, the monitors get mixed up, and the three-monitor configuration is lost. It is impossible to recover without restarting. Saving the Xorg config does not help either: if you do that, the computer starts with a black screen, so you have to boot in safe mode and delete the saved Xorg config, otherwise it won't work.
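Because the reset happens in roughly the same window every day, it may be worth checking whether the NVIDIA kernel module logs anything at that time. A minimal sketch, assuming systemd's journalctl and a persistent journal; the function names are hypothetical. It matches on _TRANSPORT=kernel instead of using journalctl -k, because -k implies -b and would only search the current boot, while recovery here requires a reboot.

```shell
#!/usr/bin/env bash
# Sketch: pull NVIDIA-related kernel messages for a given time window.

# Keep only lines mentioning the NVIDIA driver. "NVRM:" is the prefix the
# proprietary kernel module uses, including for Xid error reports.
filter_nvidia_lines() {
  grep -iE 'NVRM|nvidia' || true   # no match is not an error here
}

# Usage: nvidia_kernel_log "2020-03-23 12:45" "2020-03-23 13:05"
# _TRANSPORT=kernel selects kernel messages from all stored boots
# (assumes the journal is persisted across reboots).
nvidia_kernel_log() {
  journalctl _TRANSPORT=kernel --no-pager --since "$1" --until "$2" \
    | filter_nvidia_lines
}
```

If the journal is not persistent on a given install, Ubuntu's rsyslog usually also writes kernel messages to /var/log/kern.log, which can be piped through filter_nvidia_lines the same way.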

Conclusions:

I noticed that, if I am lucky, I can sometimes recover the previous state with the following steps. First, reload the NVIDIA UVM module:

    sudo rmmod nvidia_uvm ; sudo modprobe nvidia_uvm

Then re-run SDDM's Xsetup script, which does the PRIME offload setup:

    sudo /usr/share/sddm/scripts/Xsetup

The script contains:

    if [ -e /sbin/prime-offload ]; then
        echo running NVIDIA Prime setup /sbin/prime-offload
        /sbin/prime-offload
    fi

    xrandr --setprovideroutputsource modesetting NVIDIA-0
    xrandr --auto

Finally, restart Plasma:

    killall plasmashell && plasmashell > /dev/null 2>&1 & disown

Within a few hours, though, it happens again and the monitor layout is lost.

This is my nvidia-smi output:

Mon Mar 23 15:50:30 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1000M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   42C    P5    N/A /  N/A |   1146MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1315      G   /usr/lib/xorg/Xorg                           700MiB |
|    0      1911      G   /usr/bin/kwin_x11                             52MiB |
|    0      1918      G   /usr/bin/krunner                               0MiB |
|    0      1947      G   nvidia-settings                               16MiB |
|    0      1986      G   /usr/bin/systemsettings5                      31MiB |
|    0      2010      G   /usr/bin/ksysguard                            36MiB |
|    0      4225      G   .../Downloads/GoLand-2019.3.3/jbr/bin/java   141MiB |
|    0     10794      G   plasmashell                                  101MiB |
|    0     17648      G   /usr/bin/ksysguard                            52MiB |
+-----------------------------------------------------------------------------+

This is the output of uname -a:
Linux user 5.3.0-42-generic #34-Ubuntu SMP Fri Feb 28 05:49:40 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

It looks like the crash is usually triggered by some GPU action, or by a process trying to schedule work on the GPU, or something along those lines; I have not pinned it down.

My questions are: how can I restart the GPU without restarting the computer? How can I move the processes that are using the GPU off it and restart the graphics card? Any ideas?

Thank you.

That’s not possible; Xorg doesn’t support this. Please check whether setting
__GL_MaxFramesAllowed=1
in the system environment works around the compositor crashing.

Thanks for the reply @generix. Could you please tell me where this setting should go, i.e. which file it should be placed in? I could not work that out from your answer.


Update:
Today I found that restarting kwin_x11 with the following command actually helps:

    setsid kwin_x11 --replace > /dev/null 2>&1 & disown

After restarting kwin_x11 I was able to get the monitor configuration back to normal without a reboot. This is of course in addition to reloading nvidia_uvm (rmmod/modprobe), running Xsetup and restarting plasmashell, as described in full above. Plasma can also be restarted with an alternative command: sudo pkill -ABRT plasmashell
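The recovery steps from this thread (reloading nvidia_uvm, re-running Xsetup, replacing kwin_x11 and restarting plasmashell) can be collected into one function. This is a minimal sketch using only the commands quoted in this thread; the run/run_bg helpers and the DRY_RUN switch are hypothetical additions so the steps can be previewed without executing them. Running kwin_x11 before plasmashell is an assumption, since the update does not say which came first, and nothing here guarantees the layout comes back.

```shell
#!/usr/bin/env bash
# Sketch: the display-recovery steps from this thread as one function.
# Source this file in the desktop session, then call recover_displays.
# DRY_RUN=1 only prints the commands (hypothetical switch, not from the thread).

# Run a command in the foreground, or print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start a command in the background, silenced and disowned,
# or print it when DRY_RUN=1.
run_bg() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s &\n' "$*"
  else
    "$@" > /dev/null 2>&1 &
    disown
  fi
}

recover_displays() {
  # 1. Reload the NVIDIA Unified Memory kernel module.
  run sudo rmmod nvidia_uvm
  run sudo modprobe nvidia_uvm
  # 2. Re-run SDDM's Xsetup (prime-offload + xrandr provider setup).
  run sudo /usr/share/sddm/scripts/Xsetup
  # 3. Replace the running KWin window manager/compositor.
  run_bg setsid kwin_x11 --replace
  # 4. Restart the Plasma shell.
  run killall plasmashell
  run_bg plasmashell
}
```

Usage: source the file (for example `. ./recover-displays.sh`, a hypothetical name), then run `DRY_RUN=1 recover_displays` to preview the six commands or `recover_displays` to execute them.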

Where system environment variables are set depends on your distro; it might be /etc/environment.
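For the Ubuntu case, a minimal sketch assuming /etc/environment is the right file. It is read by pam_env at login, so a log-out/log-in or reboot is needed before the variable takes effect. set_env_var is a hypothetical helper that keeps the edit idempotent, so running it twice does not add a duplicate line.

```shell
#!/usr/bin/env bash
# set_env_var FILE NAME VALUE  (hypothetical helper)
# Replace any existing NAME=... line in FILE, or append NAME=VALUE if none exists.
set_env_var() {
  local file=$1 name=$2 value=$3
  if grep -q "^${name}=" "$file" 2>/dev/null; then
    # Rewrite the existing assignment in place (GNU sed syntax for -i).
    sed -i "s|^${name}=.*|${name}=${value}|" "$file"
  else
    printf '%s=%s\n' "$name" "$value" >> "$file"
  fi
}
```

Usage, as root because /etc/environment is system-owned: `sudo bash -c '. ./set-env-var.sh && set_env_var /etc/environment __GL_MaxFramesAllowed 1'` (the file name is hypothetical). After logging back in, `echo "$__GL_MaxFramesAllowed"` should print 1.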