[BUG] NVIDIA v495.29.05 driver spamming dbus-enabled applications with invalid messages

Greetings,

With NVIDIA v495.29.05 drivers, I am seeing a big problem with weird/invalid dbus messages being sent to a dbus-enabled OpenGL software (a Second Life viewer in this case).

After a few minutes of running the 3D software, the /var/log/messages log fills up at an alarming rate (over 100 messages per second, leading to hundreds of MB large log files in a matter of just a few minutes) with messages such as:

Oct 24 14:55:32 localhost dbus-daemon[1939]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.25" (uid=0 pid=12679 comm="./bin/cool_vl_viewer-bin")

Running dbus-monitor --system, I get, when the OpenGL app is running, messages such as:

method call time=1635081664.325234 sender=:1.19 -> destination=nvidia.powerd.server serial=21792 path=/nvidia/powerd/datapacket; interface=nvidia.powerd.datapacket; member=AutoflDatapacket
   struct {
      int32 4193
      int32 10538
      int64 1461543769
      double 1.98642e+08
      double 3.44308e+22
      double 7.41929e+19
      double 5.64176e-315
      int32 0
      int32 1668
   }
error time=1635081664.325251 sender=org.freedesktop.DBus -> destination=:1.19 error_name=org.freedesktop.DBus.Error.ServiceUnknown reply_serial=21792
   string "The name nvidia.powerd.server was not provided by any .service files"

What is bizarre, is that while this SL viewer does act as a dbus server (for being sent and acting upon “SLURLs” by processing requests by other viewers or a web browser), it does so only on the session bus (not on the system bus), and only for a custom service and methods that do not involve at all upowerd.
When I disable the dbus support in the viewer, the issue does not happen. Instrumenting the code to try and detect illegal dbus methods requests on the session bus does not lead to any result (no error detected), and increasing the rate at which dbus is polled at each frame (glib-gio is used for dbus support in this viewer) does not either prevent the “full message queue” to occur.

Also, it must be noted that this computer is tuned for performances and as such does not even have upower installed: the reason why NVIDIA’s beta driver insists on trying to “speak to” powerd is therefore a mystery on its own !

Reverting to v470.74 which is the last valid/working driver for me…

nvidia-bug-report.log.gz attached.nvidia-bug-report.log.gz (303.9 KB)

5 Likes

Indeed I can confirm this behaviour. Every application using libnvidia-glcore.so or libnvidia-eglcore.so in some way sends these messages to nvidia-powerd. Since gnome-shell is hardware accelerated it does this, too. As far as my research went, nvidia-powerd was introduced with 460.27.04 but cancelled shortly after.
After some time this seems to trigger another bug in dbus’ error handler which leads to the spammed log files like already mentioned.
The performance impact can also be felt. I’ve cobbled together a few lines of Python to register the nvidia-powerd interface it’s looking for to at least get rid of error processing by dbus. Especially performance when switching between different GL-applications is greatly enhanced (like a game on one and a browser on another monitor).

nvidia-powerd.py

#!/usr/bin/env python3

import dbus
import dbus.service
import dbus.mainloop.glib
from gi.repository import GLib

class Service(dbus.service.Object):
    def __init__(self):
        dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
        bus_name = dbus.service.BusName("nvidia.powerd.server", dbus.SystemBus())
        dbus.service.Object.__init__(self, bus_name, "/nvidia/powerd/datapacket")
        self.loop = GLib.MainLoop()

    def run(self):
        self.loop.run()

    '''
    @dbus.service.method("nvidia.powerd.datapacket", in_signature='(iixddddii)', out_signature='')
    def AutoflDatapacket(self, s):
        print(s)
    '''

if __name__ == "__main__":
    Service().run()

if you want to do some more processing of the messages just uncomment

@dbus.service.method("nvidia.powerd.datapacket", in_signature='(iixddddii)', out_signature='')
def AutoflDatapacket(self, s):
    print(s)

and do whatever you want with the data.

To be able to register this interface a policy file is also needed.

/etc/dbus-1/system.d/nvidia-powerd.conf

<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
        <policy user="root">
                <allow own="nvidia.powerd.server"/>
        </policy>
	<policy context="default">
		<allow send_destination="nvidia.powerd.server"/>
		<allow receive_sender="nvidia.powerd.server"/>
	</policy>
</busconfig>

The user in the line

<policy user="root">

can be changed to some normal, unprivileged user to avoid running this as root.

Best regards
Jan

3 Likes

Sadly, your trick does not work. When running the fake nvidia.powerd.server, as soon as I launch the SL client I get the log spammed with thousands of messages like this:

localhost dbus-daemon[1961]: [system] Rejected send message, 0 matched rules; type="error", sender=":1.23" (uid=0 pid=25669 comm="python3 /usr/local/bin/nvidia-powerd.py") interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.UnknownMethod" requested_reply="0" destination=":1.30" (uid=0 pid=25954 comm="./bin/cool_vl_viewer-bin")

Also, you mention that it triggers a bug in dbus itself: I am curious about this bug. Has it been fixed already (any distro-specific patch existing, if not yet fixed upstream) ?

Sorry to hear that this didn’t help in your case but I am by no means an expert in dbus ;-) My workflow for debugging is just using busctl monitor or dbus-monitor --system and in my case the errors vanish and the method calls look okay when this script is running. Maybe you can try uncommenting the method thing and see if there’s an output. If there is the trick should work. If not you can try researching how to do this correctly and maybe post an update if you like.
Also in my case the log messages you mentioned appeared after a few days of uptime (difficult to tell exactly) so it seems like some counter related to error handling within dbus got overflown. I’m on version 1.12.20 using Gentoo Linux and so far this issue (if it even is an issue) seems to be unknown.
Nevertheless I hope Nvidia will turn this off in the next release because it’s just wasting performance and energy.

I might have found a work-around: I added an /etc/dbus-1/system-local.conf file (which gets included by the /usr/share/dbus-1/system.conf default file on dbus daemon startup) with the following blacklisting rules:

<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-Bus Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <policy context="mandatory">
	<deny send_interface="nvidia.powerd.datapacket" />
	<deny receive_interface="nvidia.powerd.datapacket" />
  </policy>
</busconfig>

This way, the messages do not seem to be sent or received any more by the 3D application, and the log spamming does not seem to happen any more so far.

It is yet just a work-around, and that nasty bug needs fixing !

Scratch my last post !

The spamming still happens (it just took a little longer, apparently)… :-(

Sadly, this bug also affects the latest v495.44 version… :-(

I can also confirm I’m seeing this bug with 495.29.05.

dbus-monitor --system shows:

method call time=1635287818.294171 sender=:1.1722 -> destination=nvidia.powerd.server serial=1820925 path=/nvidia/powerd/datapacket; interface=nvidia.powerd.datapacket; member=AutoflDatapacket
   struct {
      int32 4209
      int32 3968606
      int64 392057876974
      double 2.55008e+20
      double 1.06095e+27
      double 3.68537e+32
      double 6.05475e-310
      int32 28533
      int32 28549
   }
error time=1635287818.294359 sender=org.freedesktop.DBus -> destination=:1.1722 error_name=org.freedesktop.DBus.Error.ServiceUnknown reply_serial=1820925
   string "The name nvidia.powerd.server was not provided by any .service files"

@MaxX2342’s script works as intended, and the error disappears.

There are about 27 method calls per second, which seems a bit excessive.

1 Like

The solution provided by @MaxX2342 works (thanks!) but you can also use the dbus-test-tool to black-hole all messages. This utility is provided directly by freedesktop. Here is my preferred workaround:

/etc/dbus-1/system.d/nvidia-fake-powerd.conf (renamed to be more expressive and configured to run as the dbus user)

<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
        <policy user="dbus">
                <allow own="nvidia.powerd.server"/>
        </policy>
        <policy context="default">
                <allow send_destination="nvidia.powerd.server"/>
                <allow receive_sender="nvidia.powerd.server"/>
        </policy>
</busconfig>

The simple systemd service /etc/systemd/system/nvidia-fake-powerd.service:

[Unit]
Description=NVIDIA fake powerd service

[Service]
Type=dbus
BusName=nvidia.powerd.server
ExecStart=/usr/bin/dbus-test-tool black-hole --system --name=nvidia.powerd.server
User=dbus
Group=dbus
LimitNPROC=2
ProtectHome=true
ProtectSystem=full

[Install]
WantedBy=default.target
Alias=dbus-nvidia.fake-powerd.service

Of course once these files are in place, systemd must be reloaded and the service enabled/started:

systemctl daemon-reload
systemctl enable nvidia-fake-powerd.service
systemctl start nvidia-fake-powerd.service
systemctl status nvidia-fake-powerd.service
7 Likes

@pobrn It’s about the same rate for me with an idle desktop. Watching YouTube with (ungoogled-)chromium pushes the rate up to around 85! Using mpv goes in a similar direction. Interestingly the main producer in the first case is gnome-shell, not chromium itself.

To find out what’s actually generating these method calls you can look up the value in sender=:1.53 with busctl list.

:1.53 1248 gnome-shell user :1.53 user@1000.service - -

@user16974 Very nice solution, I think I’ll switch to this. Up until now I didn’t know about dbus-test-tool - another tool to remember in the future :-D

1 Like

I can confirm that @user16974 's solution using dbus-test-tool does work for me as well. Yet, since my Linux distribution (PCLinuxOS) is (thankfully !) not using systemd, I had to adapt it.
Here is a SysV init “fake_nvidia_powerd” script for such non-systemd distributions:

#!/bin/bash
#
# fake_nvidia_powerd: Fake nvidia.powerd.server dbus server
#
# chkconfig: 23457 98 12
# description: This is a fake nvidia.powerd.server dbus server acting as a \
#              blackhole so to work around a NVIDIA driver bug.
# See:
# https://forums.developer.nvidia.com/t/bug-nvidia-v495-29-05-driver-spamming-dbus-enabled-applications-with-invalid-messages/192892
#
# processname: dbus-test-tool
# pidfile: /var/run/fake_nvidia_powerd.pid
#
### BEGIN INIT INFO
# Provides: fake_nvidia_powerd
# Should-Start: messagebus
# Default-Start: 2 3 4 5 7
# Default-Stop: 0 1 6
# Short-Description: Fake nvidia.powerd.server dbus server
# Description: This is a nvidia.powerd.server dbus server acting as a blackhole
#  so to work around a NVIDIA driver bug.
### END INIT INFO

# Sanity checks.
[ -x /usr/bin/dbus-daemon ] || exit 0

# Source function library.
. /etc/rc.d/init.d/functions

dbus_user=messagebus
dbus_conf_file="/etc/dbus-1/system.d/fake-nvidia-powerd.conf"
servicename=fake_nvidia_powerd

RETVAL=0

check_dbus_conf() {
	if [ -f "$dbus_conf_file" ] ; then
		return
	fi
   	gprintf "Adding dbus configuration file: "
	cat << EOFMARKER >"$dbus_conf_file"
<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
        <policy user="$dbus_user">
                <allow own="nvidia.powerd.server"/>
        </policy>
	<policy context="default">
		<allow send_destination="nvidia.powerd.server"/>
		<allow receive_sender="nvidia.powerd.server"/>
	</policy>
</busconfig>
EOFMARKER
	if [ -f "$dbus_conf_file" ] ; then
		echo_success
		# Let dbus reload its configuration files...
		sleep 2
	else
		echo_failure
	fi
	echo
}

start() {
	if ! [ -f /var/lock/subsys/$servicename ] ; then
		check_dbus_conf
    	gprintf "Starting a fake nvidia.powerd.server dbus server: "
		nohup runuser -u $dbus_user -- dbus-test-tool black-hole --system --name=nvidia.powerd.server &>/dev/null &
    	RETVAL=$?
		pid=$!
		if [ $RETVAL -eq 0 ] ; then
			echo $pid >/var/run/$servicename.pid
			touch /var/lock/subsys/$servicename
			echo_success
		else
			echo_failure
		fi
	else
    	gprintf "The fakenvidia.powerd.server dbus server is already running"
		echo_warning
	fi
	echo
}

stop() {
    gprintf "Stopping the fake nvidia.powerd.server dbus server: "
    ## We do not want to kill all the per-user dbus-test-tool, we want
    ## to use the pid file *only*; because we use the fake nonexistent 
    ## program name "$servicename" that should be safe-ish
    killproc $servicename -TERM
    RETVAL=$?
    echo
    if [ $RETVAL -eq 0 ]; then
        rm -f /var/lock/subsys/$servicename
        rm -f /var/run/$servicename.pid
    fi
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $servicename
        RETVAL=$?
        ;;
    restart)
        stop
        start
        ;;
    *)
        gprintf "Usage: %s {start|stop|status|restart}\n" "$0"
        ;;
esac
exit $RETVAL

Place it where your distro got its init scripts (/etc/rc.d/init.d/ for PCLinuxOS), then enable it as root (e.g. with ‘chkconfig --add fake_nvidia_powerd;service fake_nvidia_powerd start’ for PCLinuxOS).

I am however impatient to see NVIDIA acknowledge this bug and take it into account for the next v495 driver release, because it is absolutely abnormal that a now “stable” release got such an enormous flaw !

It must also be noted that v470 does have references to nvidia.powerd.server in its shared libraries, but they never caused such a havoc as what happens with v495.

Plus, the simple fact that the NVIDIA v495 libraries communicate with dbus (apparently/roughly) at each OpenGL frame (if to judge by the rate of the log spam) is totally unreasonable and a total wastage of processing power (which might also explain why I see NVIDIA OpenGL performances degrading with each new release); please NVIDIA, stop the non-sense and stop bloating your drivers with useless code !

1 Like

This works for me. This was causing the process kwin_x11 to spam the system log about the dbus message queue being full.

I packaged this solution up on the AUR for Arch Linux users.

https://aur.archlinux.org/packages/nvidia-fake-powerd/

2 Likes

Also confirmed on driver version 495.44. I write a binary patch to bypass libnvidia-glcore.so from sending nvidia.powerd.server message to dbus. Not sure if there are any side effects. This patch is only for version 495.44.

perl -pwE 's/\x0f\x84\x5b\x01\x00\x00\x4c\x8d/\xe9\x5c\x01\x00\x00\x00\x4c\x8d/g' libnvidia-glcore.so.495.44 > libnvidia-glcore.so.495.44.new
4 Likes

Providing ebuild of the same package for Gentoo: https://github.com/Tatsh/tatsh-overlay/blob/master/sys-apps/nvidia-fake-powerd/nvidia-fake-powerd-0.1.0.ebuild

Binary patch works nicely, thanks for that.

Did the change for 32bit glcore as well, simple enough (believe it’s right, no more spam anyhow)

perl -pwE 's/\x0f\x84\x65\x01\x00\x00\x83\xec\x08\x89/\xe9\x66\x01\x00\x00\x00\x83\xec\x08\x89/' 32/libnvidia-glcore.so.495.44 > 32/libnvidia-glcore.so.495.44.new

Originally thought to stop it from loading libdbus entirely (it’s optional) but it gives connection error messages instead.

3 Likes

I have Archlinux, Linux leonidas 5.15.2-zen1-1-zen #1 ZEN SMP PREEMPT Fri, 12 Nov 2021 19:22:07 +0000 x86_64 GNU/Linux, 01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1), kwin 5.23.3-1, nvidia-dkms 495.44-5 .

I got these messages flooding my journal:

lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")
lis 21 12:19:41 leonidas dbus-daemon[475]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.29" (uid=1000 pid=647 comm="/usr/bin/kwin_x11 ")

I installed from AUR https://aur.archlinux.org/packages/nvidia-fake-powerd/ and started and enabled service - it got fixed. Thanks user16974 and dkub for the fix.

but for me, i get tons of

nov 22 02:31:58 sL1pKn07 dbus-daemon[1032]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.1523686" (uid=1000 pid=3298234 comm="Z:\home\wine-WoT\World_of_Tanks_EU\World_of_Tanks_")
nov 22 02:31:58 sL1pKn07 dbus-daemon[1032]: [system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.ServiceUnknown" requested_reply="1" destination=":1.1523686" (uid=1000 pid=3298234 comm="Z:\home\wine-WoT\World_of_Tanks_EU\World_of_Tanks_")

the nvidia-fake-powerd package not silence the spaming

@ uzytkownik510 are you sure is fixed for you this you posted messages?

greetings

Did you enable and start it? What does sudo systemctl status nvidia-fake-powerd say?

started and enabled correctly

after some reboots and some systemd log clean. the spam go away, so seems working as spected

sorry for the noise

greetings