NVRM: Xid (PCI:0000:01:00): 41, CCMDs

I saw this in the systemd journal log:
Jun 03 12:37:56 hades kernel: NVRM: Xid (PCI:0000:01:00): 41, CCMDs 00000010 0000a0b5
This is a new-ish machine with no faulty RAM.
The error happened after resuming from hibernate. I hibernated and resumed a few more times but I didn’t see any further issues.
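For anyone who wants to pull such events out of a long journal, here is a minimal sketch. It assumes the line format shown in the log line above; `xid_fields` is a hypothetical helper name, not an NVIDIA tool.

```shell
# Hypothetical helper: for each "NVRM: Xid" line on stdin,
# print "<pci-address> <xid-number>". Other lines are ignored.
xid_fields() {
    sed -n 's/.*NVRM: Xid (PCI:\([^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p'
}

# Usage on a live system (reads kernel messages from the journal):
#   journalctl -k --no-pager | xid_fields
```

Feeding it the line from this report should give the PCI address and the Xid number, which makes it easier to see whether the same device and code keep recurring.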

The NVIDIA GPU temperature rarely exceeds 35 degrees Celsius and all CPU cores are usually under 28 degrees, so I doubt it is a thermal/overheating issue.
nvidia-bug-report.log.gz (177 KB)

The XID Errors page of the GPU Deployment and Management Documentation does not clearly identify what this Xid means.
Could this be the result of some application bug?

I made a change that stopped this error.
Arch Linux’s mutter is compiled with --enable-egl-device.
Removing that configure option stopped the Xid error.
I am running an Xorg session.
Would any NVIDIA developer know whether this is an Xorg driver issue?

Please provide detailed reproduction steps so that we can reproduce this issue and investigate it. What desktop environment are you running: KDE, GNOME, or something else? When exactly does this issue trigger? What version of Mutter are you using? How are you running mutter with the desktop? Please generate an nvidia bug report as soon as the issue hits.

Hi sandipt, I emailed the nvidia-bug-report file to linux-bugs@nvidia.com last Sunday.
The email subject is: xid 41 error.

If it didn’t arrive, I can email it again.

But basically the first time was on resume from hibernate. The other occasions were random or on alt-f2 → r in gnome-shell.

gnome-shell/mutter 3.25.2git. Gnome-shell uses libmutter (not running mutter executable directly).

Removing --enable-egl-device from mutter prevented the error. So it seems the xorg driver is behaving differently under Xorg when the window manager supports eglstreams.

Are you running gnome-shell along with mutter? Did you download the mutter source code from git? Please share the location you downloaded the mutter code from. Also, how are you launching mutter with gnome-shell? It would be good if you could attach a video recording of the reproduction steps.

Please send or attach the nvidia bug report again.

Hi. Mutter is not run separately. I am using gnome-shell (which uses /usr/lib/libmutter-0.so.0.0.0).
Mutter is a library that gnome-shell uses. I’m not running gnome-shell along with mutter.

To reproduce:
1) Install gjs from GNOME / gjs · GitLab (master branch).
2) Install mutter from GNOME / mutter · GitLab (master branch). Run ‘NOCONFIGURE=1 ./autogen.sh’ and then ‘./configure --prefix=/usr --enable-egl-device’.
--enable-egl-device enables the use of EGLDevice (NVIDIA’s Wayland mechanism).
3) Install gnome-shell from GNOME / gnome-shell · GitLab.

Make sure nvidia-drm is loaded with modeset=0:
cat /sys/module/nvidia_drm/parameters/modeset
N
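The modeset check above can be wrapped in a small helper so the state is easy to record alongside a bug report. This is only a sketch: `nvidia_drm_modeset` is a hypothetical name, and the optional path argument exists purely so the function can be exercised without the module loaded.

```shell
# Hypothetical helper: report whether nvidia-drm modesetting is enabled.
# An optional argument overrides the sysfs path (useful for testing).
nvidia_drm_modeset() {
    param_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        # The parameter file is absent when nvidia_drm is not loaded.
        echo "not-loaded"
        return 1
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "enabled" ;;
        N|0) echo "disabled" ;;
        *)   echo "unknown" ;;
    esac
}
```

For the reproduction described here, the expected result is "disabled", matching the N printed by the cat command.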

Next:

  1. Log onto an xorg session.
  2. ctrl+alt+f3 to tty and then ctrl+alt+f2 back to your gnome session.
  3. try hibernating and resuming.
  4. Press alt f2 on your keyboard and type ‘r’ in the ‘enter a command’ box. Then press the return key. This re-executes gnome-shell.

Type dmesg in gnome-terminal and notice the Xid error.

Next:

  1. Recompile mutter without --enable-egl-device.
  2. log off and log back into gnome.
  3. The issue is gone.

To sum up, when mutter is compiled with --enable-egl-device, this issue happens in an Xorg session.
When mutter is not compiled with --enable-egl-device, this issue does not happen in an Xorg session.

Can you attach the nvidia bug report to your existing post? I hope all the packages on your system are up to date. It looks like the issue only reproduces with the latest code from git. Please let me know if any other packages with specific versions are needed to replicate this issue.

Normally we just install a fresh Arch Linux and try to reproduce the issue. But if you made any further changes (configuration, updates, packages) that trigger this issue, please share those details as well so we can easily reproduce it. Does this issue reproduce on a freshly installed OS?

It is Arch Linux (with linux-lts kernel package) but gjs/mutter/gnome-shell from upstream master branch. Those three are the main changes you are looking for.
I’ll attach the nvidia-bug-report which I generated after seeing the xid error.

Is there any reason you are using nouveau.config=NvClkMode=15 on the kernel command line?

Please remove it and blacklist nouveau. You can add the nouveau driver to the /etc/modprobe.d/blacklist.conf file, or create a file such as /etc/modprobe.d/disable-nouveau.conf with the entries below:
blacklist nouveau
options nouveau modeset=0

And use these kernel parameters instead: vga=0 rdblacklist=nouveau nouveau.modeset=0
Then reboot.

Ok, I will. That was from last year when I was testing wayland which nouveau supported.

Nouveau is already blacklisted:

cat /usr/lib/modprobe.d/nvidia-lts.conf 
blacklist nouveau
blacklist nvidiafb

I’ll use “vga=0 rdblacklist=nouveau nouveau.modeset=0” kernel parameters. Thank you.
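To double-check where a blacklist entry comes from, something like the following sketch could be used. `blacklist_sources` is a hypothetical helper; the default directories are the usual modprobe.d locations, and it does not inspect what is embedded in the initramfs.

```shell
# Hypothetical helper: list modprobe.d files that blacklist a module.
# Usage: blacklist_sources MODULE [DIR...]
# With no DIR arguments, the usual modprobe.d locations are searched.
blacklist_sources() {
    module="$1"
    shift
    [ "$#" -gt 0 ] || set -- /etc/modprobe.d /run/modprobe.d /usr/lib/modprobe.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -l prints matching file names only; -s silences missing-file errors.
        grep -lsE "^[[:space:]]*blacklist[[:space:]]+$module([[:space:]]|\$)" "$dir"/*.conf || true
    done
    return 0
}

# Example: blacklist_sources nouveau
```

On a setup like the one in this thread, the output would be expected to include /usr/lib/modprobe.d/nvidia-lts.conf. Running `lsmod | grep nouveau` afterwards shows whether the module is actually loaded.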

Sandipt, I think I was not very clear. This issue only happens when the compositor supports egldevice for wayland. The xorg driver is behaving differently in that case.

Can you explain more about it? If there is any change in the reproduction steps, please update your comment #7.

When mutter is compiled with --enable-egl-device, this issue happens.
When mutter is not compiled with --enable-egl-device, this issue does not happen.

I edited comment #7

Did blacklisting with vga=0 rdblacklist=nouveau nouveau.modeset=0 have any effect?

>>Press alt f2 on your keyboard and type ‘r’ in the ‘enter a command’ box.
What commands are you executing with this method?

Are you logging in via gdm?

No, but that is likely because nouveau is already blacklisted: /usr/lib/modprobe.d/nvidia-lts.conf is embedded in the initramfs image. As a precaution, I will keep “rdblacklist=nouveau nouveau.modeset=0” in the kernel parameters.

Typing ‘r’ there re-executes the gnome-shell desktop compositor using the same PID. It helps because the NVIDIA driver triggers steady increases in gnome-shell memory usage (noted in glBufferData issue. - Linux - NVIDIA Developer Forums). Re-executing the gnome-shell compositor process frees memory. But that’s a different topic.

I have filed Bug 200318962 to track this issue. We will try to replicate it for investigation.

Thank you very much.

One last question. Does this kind of error require a reboot or is exiting Xorg and running
modprobe -r nvidia_drm && modprobe nvidia_drm enough?
Thank you.

I don’t think a reboot is required if all applications and the desktop are performing well.