VirtualBox freezes my system when on nVidia but not when on Intel HD

Hello,
I’m trying to run VMs in VirtualBox on a Linux machine with an nVidia GM107GLM [Quadro M1200 Mobile] GPU. As soon as I start the VM and move the mouse pointer into its window, the entire host system freezes almost immediately. If I disable the nVidia GPU and run directly on Intel HD, everything seems to work properly.

When the freeze occurs, the only thing I can do is send a SysRq and reboot. After rebooting, I read the kernel logs and found these entries at the end of the previous boot:

Jun 21 23:42:14 luca-5520 kernel: [   11.265441] wlp2s0: authenticate with b0:39:56:55:9e:ac
Jun 21 23:42:14 luca-5520 kernel: [   11.269104] wlp2s0: send auth to b0:39:56:55:9e:ac (try 1/3)
Jun 21 23:42:14 luca-5520 kernel: [   11.274761] wlp2s0: authenticated
Jun 21 23:42:14 luca-5520 kernel: [   11.280943] wlp2s0: associate with b0:39:56:55:9e:ac (try 1/3)
Jun 21 23:42:14 luca-5520 kernel: [   11.293085] wlp2s0: RX AssocResp from b0:39:56:55:9e:ac (capab=0x11 status=0 aid=1)
Jun 21 23:42:14 luca-5520 kernel: [   11.295013] wlp2s0: associated
Jun 21 23:42:15 luca-5520 kernel: [   12.332996] kauditd_printk_skb: 30 callbacks suppressed
Jun 21 23:42:15 luca-5520 kernel: [   12.332997] audit: type=1400 audit(1561153335.472:42): apparmor="DENIED" operation="open" profile="/usr/sbin/mysqld-akonadi///usr/sbin/mysqld" name="/sys/devices/system/node/" pid=1844 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jun 21 23:42:15 luca-5520 kernel: [   12.334183] Bluetooth: RFCOMM TTY layer initialized
Jun 21 23:42:15 luca-5520 kernel: [   12.334189] Bluetooth: RFCOMM socket layer initialized
Jun 21 23:42:15 luca-5520 kernel: [   12.334192] Bluetooth: RFCOMM ver 1.11
Jun 21 23:42:15 luca-5520 kernel: [   12.348341] audit: type=1400 audit(1561153335.484:43): apparmor="DENIED" operation="open" profile="/usr/sbin/mysqld-akonadi///usr/sbin/mysqld" name="/etc/mysql/my.cnf.fallback" pid=1844 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jun 21 23:42:15 luca-5520 kernel: [   12.361635] audit: type=1400 audit(1561153335.500:44): apparmor="DENIED" operation="open" profile="/usr/sbin/mysqld-akonadi///usr/sbin/mysqld" name="/sys/devices/system/node/" pid=1864 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jun 21 23:42:15 luca-5520 kernel: [   12.507350] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
Jun 21 23:42:29 luca-5520 kernel: [   26.346023] SUPR0GipMap: fGetGipCpu=0xb
Jun 21 23:42:35 luca-5520 kernel: [   32.078370] vboxdrv: 0000000000000000 VMMR0.r0
Jun 21 23:42:35 luca-5520 kernel: [   32.185782] vboxdrv: 0000000000000000 VBoxDDR0.r0
Jun 21 23:43:17 luca-5520 kernel: [   74.266148] NVRM: GPU at PCI:0000:01:00: GPU-a048e6ea-3105-7b75-ca04-b034f8a1ea19
Jun 21 23:43:17 luca-5520 kernel: [   74.266155] NVRM: Xid (PCI:0000:01:00): 12, COCOD 00000008 80019700 0000b097 00001414 78282828
Jun 21 23:43:17 luca-5520 kernel: [   74.274066] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000009, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_17f00000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
Jun 21 23:43:17 luca-5520 kernel: [   74.308075] NVRM: Xid (PCI:0000:01:00): 41, CCMDs 0000000b 0000b0b5
Jun 21 23:43:17 luca-5520 kernel: [   74.342679] NVRM: Xid (PCI:0000:01:00): 31, Ch 0000000b, intr 10000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST_CPU faulted @ 0x2_80000000. Fault is of type FAULT_PDE ACCESS_TYPE_WRITE

The host system is Kubuntu 19.04. I tried a couple of different guest systems, one of which is Kubuntu 19.04. The installed nVidia driver version is 418.56. On the host, I pass nvidia-drm.modeset=1 to the kernel to prevent tearing. When the freeze occurs, I cannot see anything about what the system is doing, but the fans seem to spin up, as if the system were busy with something.
Anyone else with the same problem?
Regards.
nvidia-bug-report.log.gz (1.22 MB)
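
For anyone triaging a similar log, the NVRM lines are the interesting part: each one carries a numeric Xid code after the PCI address. Below is a minimal sketch that pulls those codes out of kernel log lines. The function name `extract_xids` and the regular expression are illustrative assumptions based only on the line format shown in the post above, not an official parser; the sample lines are copied from that log with the syslog prefix trimmed.

```python
import re

# Matches messages such as:
#   NVRM: Xid (PCI:0000:01:00): 31, Ch 00000009, ...
# The pattern is an assumption derived from the log lines in this thread.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def extract_xids(lines):
    """Return (pci_address, xid_code) tuples in the order they appear."""
    found = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


# Messages from the June log above (timestamp and host prefix trimmed).
sample = [
    "NVRM: GPU at PCI:0000:01:00: GPU-a048e6ea-3105-7b75-ca04-b034f8a1ea19",
    "NVRM: Xid (PCI:0000:01:00): 12, COCOD 00000008 80019700 0000b097 00001414 78282828",
    "NVRM: Xid (PCI:0000:01:00): 31, Ch 00000009, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_17f00000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE",
    "NVRM: Xid (PCI:0000:01:00): 41, CCMDs 0000000b 0000b0b5",
    "NVRM: Xid (PCI:0000:01:00): 31, Ch 0000000b, intr 10000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST_CPU faulted @ 0x2_80000000. Fault is of type FAULT_PDE ACCESS_TYPE_WRITE",
]

for address, code in extract_xids(sample):
    print(address, code)
```

The "GPU at" line is skipped because it does not contain an Xid code; the remaining four lines yield codes 12, 31, 41 and 31.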

Did you enable 3D/2D acceleration on the VM? If so, does disabling it avoid the crash?

Hello, it is not enabled.

Which VirtualBox version are you using?
Have you already tried disabling compositing in kwin?

I tried VirtualBox 6.0.6 from my distro repos and VirtualBox 6.0.8 from the official repo.
I just tried and it seems to work properly when compositing is not enabled.

Ok, which kwin version are you using?
Does setting the following environment variable system-wide

__GL_MaxFramesAllowed=1

work around the issue when compositing is enabled?
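
If you only want to try the variable for a single process rather than for the whole session, one option is to start that process with a modified copy of the environment. This is a minimal sketch, not a recommended setup: the helper names `env_with_max_frames` and `launch` are made up for illustration, and the `kwin_x11 --replace` command in the comment is an assumption about how the compositor would be restarted on a Plasma X11 session.

```python
import os
import subprocess


def env_with_max_frames(base_env=None, frames=1):
    """Return a copy of the environment with __GL_MaxFramesAllowed set."""
    env = dict(os.environ if base_env is None else base_env)
    env["__GL_MaxFramesAllowed"] = str(frames)
    return env


def launch(command, frames=1):
    """Start `command` with the variable set; the caller's environment is untouched."""
    return subprocess.Popen(command, env=env_with_max_frames(frames=frames))


# Hypothetical usage (assumes an X11 Plasma session where kwin_x11 is the compositor):
#   launch(["kwin_x11", "--replace"])
```

Because only the child process sees the variable, this makes it easy to compare behavior with and without it before making the setting permanent.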

I’m using kwin 5.16.1 with the patches provided here (to Qt and kwin) to prevent hangs: https://bugs.kde.org/show_bug.cgi?id=406180.

Yes, setting it fixes the problem, thanks! What’s the downside of __GL_MaxFramesAllowed?

It disables triple buffering, so applications fall back to double buffering. Very few applications would use triple buffering anyway (only games).
This points to a kwin bug that has been fixed (by setting that option for kwin), but the fix has seemingly not yet arrived in a release.

Thanks then! Everything seems to be working properly now.

Hello! It seems that the problem has appeared again, with the same behavior. This is what I see in dmesg:

Oct 12 18:55:27 luca-5520 kernel: [ 1565.725147] SUPR0GipMap: fGetGipCpu=0xb
Oct 12 18:55:33 luca-5520 kernel: [ 1571.731360] vboxdrv: 0000000000000000 VMMR0.r0
Oct 12 18:55:33 luca-5520 kernel: [ 1571.866735] vboxdrv: 0000000000000000 VBoxDDR0.r0
Oct 12 18:55:33 luca-5520 kernel: [ 1571.940062] vboxdrv: 0000000000000000 VBoxEhciR0.r0
Oct 12 18:56:27 luca-5520 kernel: [ 1625.613404] NVRM: GPU at PCI:0000:01:00: GPU-a048e6ea-3105-7b75-ca04-b034f8a1ea19
Oct 12 18:56:27 luca-5520 kernel: [ 1625.613407] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Oct 12 18:56:27 luca-5520 kernel: [ 1625.613411] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Oct 12 18:56:27 luca-5520 kernel: [ 1625.613413] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Oct 12 18:56:27 luca-5520 kernel: [ 1625.613560] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0008, Class 0000902d, Offset 000008dc, Data 00000000
Oct 12 18:56:27 luca-5520 kernel: [ 1625.613785] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000008 intr 02000000
Oct 12 18:56:27 luca-5520 kernel: [ 1625.674799] NVRM: Xid (PCI:0000:01:00): 31, Ch 0000000b, intr 10000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST_CPU faulted @ 0x2_80000000. Fault is of type FAULT_PDE ACCESS_TYPE_WRITE
Oct 12 18:56:27 luca-5520 kernel: [ 1625.874440] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Oct 12 18:56:27 luca-5520 kernel: [ 1625.874443] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Oct 12 18:56:27 luca-5520 kernel: [ 1625.874445] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Oct 12 18:56:27 luca-5520 kernel: [ 1625.874592] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 000b, Class 0000902d, Offset 0000085c, Data 00000000
Oct 12 18:56:27 luca-5520 kernel: [ 1625.874779] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 0000000b intr 02000000
Oct 12 18:56:27 luca-5520 kernel: [ 1625.889652] NVRM: Xid (PCI:0000:01:00): 41, CCMDs 0000000b 0000b0b5
Oct 12 18:56:27 luca-5520 kernel: [ 1625.902743] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 0000000b intr 00800000
Oct 12 18:56:27 luca-5520 kernel: [ 1625.902873] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 0000000b intr 00800000
Oct 12 18:56:38 luca-5520 kernel: [ 1636.223237] Asynchronous wait on fence NVIDIA:nvidia.prime:5fa2 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Oct 12 18:56:38 luca-5520 kernel: [ 1637.040649] sysrq: SysRq : Emergency Sync

__GL_MaxFramesAllowed=1 is not sufficient anymore. The nvidia driver has not changed; I still see version 418.56. Is there any other workaround? Thanks!

Since you’re also getting XID 32 now, please check whether your system memory is faulty: run memtest86 for a while.
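
To confirm which Xid codes are new in the October log compared with the June one, a set difference is enough. This is a minimal sketch under stated assumptions: `xid_codes` is an illustrative helper, the regular expression is derived only from the line format seen in this thread, and the two excerpts are messages copied from the logs above with the timestamps trimmed.

```python
import re

# Pattern assumed from the NVRM lines posted in this thread.
XID_RE = re.compile(r"NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): (\d+),")


def xid_codes(log_text):
    """Return the set of distinct Xid codes found in a block of log text."""
    return {int(code) for code in XID_RE.findall(log_text)}


june_log = """\
NVRM: Xid (PCI:0000:01:00): 12, COCOD 00000008 80019700 0000b097 00001414 78282828
NVRM: Xid (PCI:0000:01:00): 31, Ch 00000009, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_17f00000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
NVRM: Xid (PCI:0000:01:00): 41, CCMDs 0000000b 0000b0b5
"""

october_log = """\
NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000008 intr 02000000
NVRM: Xid (PCI:0000:01:00): 31, Ch 0000000b, intr 10000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST_CPU faulted @ 0x2_80000000. Fault is of type FAULT_PDE ACCESS_TYPE_WRITE
NVRM: Xid (PCI:0000:01:00): 41, CCMDs 0000000b 0000b0b5
"""

# Codes present in October but absent in June.
new_codes = sorted(xid_codes(october_log) - xid_codes(june_log))
print(new_codes)  # [13, 32]
```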

I’m having some trouble running memtest86 on this machine. I just finished running the Dell diagnostic utility which tested memory for about 15 minutes. No issue was found. I’ll try to find a different USB key to boot memtest86, but this machine seems in good shape.
I just noticed that the problem only seems to appear when a scaling factor other than 100% is applied to the virtual machine (through the VirtualBox menu). If 100% is set, everything seems to work. If I set something different, the system freezes as soon as the pointer enters the virtual machine. On Intel HD, again, I cannot reproduce the problem.
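
Since the freeze happens as soon as the pointer enters a scaled VM window, it can help to reset the scale factor without opening that window at all. The sketch below builds a `VBoxManage setextradata` command for that. It rests on two assumptions that should be checked against your VirtualBox version: that the GUI stores the scale factor under the extradata key `GUI/ScaleFactor`, and that a value of `1` corresponds to 100%. The helper names and the VM name in the comment are hypothetical.

```python
import subprocess


def scale_factor_command(vm_name, factor):
    """Build a VBoxManage command that stores a GUI scale factor for one VM.

    Assumption: the key "GUI/ScaleFactor" is what the VirtualBox GUI reads,
    and 1 means 100%.
    """
    if factor <= 0:
        raise ValueError("scale factor must be positive")
    # ":g" drops a trailing ".0", so 1.0 becomes "1" and 1.25 stays "1.25".
    return ["VBoxManage", "setextradata", vm_name, "GUI/ScaleFactor", f"{factor:g}"]


def reset_scaling(vm_name):
    """Set the VM back to unscaled output; raises if VBoxManage fails."""
    subprocess.run(scale_factor_command(vm_name, 1.0), check=True)


# Hypothetical usage, with the VM powered off:
#   reset_scaling("Kubuntu 19.04")
```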

memtest86 should be accessible from grub menu (hold down shift on boot).
Did this work previously (with the env variable set) and a scaling factor >100%?

I already tried from grub, but it is not there. (By the way, holding shift does not bring up grub, which is weird; I had to change the grub timeout.)
Unfortunately, it is difficult to say what the scaling value was the first time, and I don’t think I can find it out in any way now. What I know for certain is that on Intel, scaling != 100% works now, while on nVidia it freezes the system immediately. It would be interesting to know whether this is specific to KDE or whether Gnome shows the same behavior.