[560.35.03] XNVCtrl fan control not working, coolbits ineffective

I’ve recently noticed that my custom application for GPU fan control has stopped working. It seems that the XNVCtrl interface is not functioning correctly. The following error is reported when the application is run as root:

X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  157 (NV-CONTROL)
  Minor opcode of failed request:  3 ()
  Value in failed request:  0x17
  Serial number of failed request:  13
  Current serial number in output stream:  14

This can be reproduced with the following minimal example:

// Compile with: gcc -o nvfanbug nvfanbug.c -lX11 -lXNVCtrl
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <X11/Xos.h>
#include <NVCtrl/NVCtrlLib.h>

int main(void)
{
	int gpu_id = 0;
	Display *display = XOpenDisplay(NULL);
	if (!display)
		return 1; // No X display available

	// This fails silently
	XNVCTRLSetTargetAttribute(display, NV_CTRL_TARGET_TYPE_GPU, gpu_id, 0, NV_CTRL_GPU_COOLER_MANUAL_CONTROL, 1);

	// This fails with BadValue error
	XNVCTRLSetTargetAttribute(display, NV_CTRL_TARGET_TYPE_COOLER, gpu_id, 0, NV_CTRL_THERMAL_COOLER_LEVEL, 50);

	XFlush(display);
	XCloseDisplay(display);
	return 0;
}

Apparently nvfancontrol is suffering from a similar issue:

$ sudo target/debug/nvfancontrol
WARN - No config file found; using default curve
INFO - NVIDIA driver version: 560.35.03
INFO - NVIDIA graphics adapter #0: NVIDIA GeForce RTX 2070 SUPER
INFO -   GPU #0 coolers: COOLER-0
ERROR - Could not update fan speed: XNVCtrl SetAttr(THERMAL_COOLER_LEVEL) failed; error 0
ERROR - Could not update fan speed: XNVCtrl SetAttr(THERMAL_COOLER_LEVEL) failed; error 0

I’ve also lost the ability to control the fan through nvidia-settings, even though I have coolbits set to 4 in my Xorg configuration. I can only change the fan speed when nvidia-settings is run as root. Otherwise it reports the following error:

$ nvidia-settings -a "[gpu:0]/GPUFanControlState=0"


ERROR: The current user does not have permission for operation

  Attribute 'GPUFanControlState' (raven:0[gpu:0]) assigned value 0.

System info: nvidia-bug-report.log.gz (1.2 MB)

Has there been some breaking change in XNVCtrl?

Hope there’s a way to resolve this. Thank you.

When you say run as root, do you mean only your application, or is Xorg running as root as well?

I’m asking just in case: since a security fix some 2-3 years ago (I forget exactly when), XNVCtrl has required Xorg to run as root in order to function, for some reason. That holds even if you have access to the /dev/nvidia* devices, and I’m not aware of any more specific way to grant access.

This may not be what you’re running into, but it’s possible that something changed in your distro or setup that made Xorg start running as a regular user, which newly broke this for you.

Fan control through XNVCtrl still works fine for me with 560.35.03 as long as Xorg runs as root (the application itself can run as non-root as long as Xorg is root).

That aside, if you wrote that application yourself, you may want to explore using NVML instead going forward.

Thanks, that might be it, because Xorg is running as my user. I will try running it as root and see what happens.

As for NVML, that’s definitely a nicer approach. It works for me without Xorg running as root. Unfortunately, I don’t think this API provides fan RPM data (at least not for my GPU; apparently only for S-class devices, whatever that means). This makes it impossible to detect fan RPM instability, which is the reason I wrote my own fan controller program in the first place.
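
To illustrate what detecting RPM instability might involve, here is a minimal sketch. It is not the actual program: it assumes a window of RPM samples has already been read into an array by some other means, and rpm_is_unstable and its max_spread_rpm threshold are hypothetical names that belong to neither XNVCtrl nor NVML.

```c
#include <stddef.h>

// Hypothetical helper, not part of XNVCtrl or NVML.
// Returns 1 if the spread (max - min) of the sampled RPM values
// exceeds max_spread_rpm, 0 otherwise. An empty window counts as stable.
int rpm_is_unstable(const int *rpm_samples, size_t count, int max_spread_rpm)
{
	if (rpm_samples == NULL || count == 0)
		return 0;

	int lo = rpm_samples[0];
	int hi = rpm_samples[0];
	for (size_t i = 1; i < count; i++) {
		if (rpm_samples[i] < lo)
			lo = rpm_samples[i];
		if (rpm_samples[i] > hi)
			hi = rpm_samples[i];
	}
	return (hi - lo) > max_spread_rpm;
}
```

A min/max spread is just the simplest possible criterion; a real controller might prefer a variance or a count of direction changes, and a suitable threshold would depend on the particular fan.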

I might end up having to use XNVCtrl for reading RPM and NVML for controlling the fan. Not a huge fan of that approach, but well…
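
One way to keep such a split manageable is to hide both APIs behind a small interface, so the control logic does not care which library serves which call. The following is only a sketch under assumptions: struct fan_backend, fan_apply, and the fake_* stand-ins are hypothetical names, and the real callbacks (an XNVCtrl query for RPM, an NVML call for the level) are deliberately left out.

```c
#include <stddef.h>

// Hypothetical interface: each callback would wrap one real API call.
struct fan_backend {
	int (*read_rpm)(int *rpm_out);  // e.g. an XNVCtrl query; returns 0 on success
	int (*set_level)(int percent);  // e.g. an NVML call; returns 0 on success
};

// Clamp the requested level to 0..100, apply it, then read the RPM back.
// Returns 0 on success, -1 if the backend is incomplete or either call fails.
int fan_apply(const struct fan_backend *backend, int percent, int *rpm_out)
{
	if (backend == NULL || backend->read_rpm == NULL ||
	    backend->set_level == NULL || rpm_out == NULL)
		return -1;

	if (percent < 0)
		percent = 0;
	if (percent > 100)
		percent = 100;

	if (backend->set_level(percent) != 0)
		return -1;
	return backend->read_rpm(rpm_out) != 0 ? -1 : 0;
}

// Stand-in backends so the sketch works without a GPU; they only record state.
static int fake_level = -1;

static int fake_set_level(int percent)
{
	fake_level = percent;
	return 0;
}

static int fake_read_rpm(int *rpm_out)
{
	*rpm_out = fake_level * 30; // made-up linear RPM model
	return 0;
}
```

With the stand-ins, a caller would fill in struct fan_backend with fake_read_rpm and fake_set_level and call fan_apply(&backend, 50, &rpm). Swapping in real XNVCtrl and NVML wrappers later would not touch the control logic.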
