Nvidia-settings returns 'Connection refused. ERROR: The control display is undefined'

I have been unable to use nvidia-settings to assign settings to my GPUs, so I must be missing something fundamental, which I suspect is related to my xorg.conf settings. I need some pointers on:

  1. nvidia-settings returning ‘control display is undefined’
  2. Headless operation?
  3. Starting a dummy xserver for an ssh session.

Operating system: Ubuntu Server 20.04 LTS
GPUs (2): Nvidia GTX1080 and Nvidia GTX1080ti
Nvidia Driver Version: 460.39 ← not the latest
Nvidia bug report gz - attached

I get the impression from reading the forums that ‘headless operation’ is something I should go for to run CUDA compute workloads and do my admin remotely.

I have not been able to get the mobo on-board graphics to work: the mobo will not POST. I know this should be simple, but the ASUS desktop mobo BIOS I am stuck with is stupidly complex (gamer heaven for tweaking perhaps, but useless for an enterprise). I will experiment further, although I have to be there in person to recover from a non-POST state. I have instead connected an HDMI display to GPU:1.

  • I have started driver persistence successfully from SSH command line with: sudo nvidia-smi -pm 1
  • Gnome desktop installed successfully, local logins work
  • lspci returns both Nvidia GPU ‘VGA’ devices ok

I have run: sudo nvidia-xconfig -a --allow-empty-initial-configuration --cool-bits=28 --enable-all-gpus

The resulting xorg.conf file looks plausible. nvidia-settings is not happy, however, when I run any of the following:

sudo nvidia-settings -q all
sudo nvidia-settings -q gpus
sudo nvidia-settings -q screens
sudo nvidia-settings -q framelocks
sudo nvidia-settings -q fans
sudo nvidia-settings -q thermalsensors
sudo nvidia-settings -q svps
sudo nvidia-settings -q dpys
sudo nvidia-settings -q anything-you-like

I get the same result (so there is a fundamental common failure):
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.

I have studied the --help output, but it does not give me enough hints to understand why I get ‘connection refused’ or ‘control display is undefined’.
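A quick check that may help narrow this down, offered as a sketch rather than a diagnosis: nvidia-settings is an X client, so it needs an X server to connect to, and it normally finds one through the DISPLAY environment variable. In a plain SSH session DISPLAY is usually unset, which would be consistent with both messages above. The function name check_display is made up for this example.

```shell
#!/bin/sh
# check_display (hypothetical helper): report whether this shell has an
# X display configured. Returns non-zero when DISPLAY is empty or unset.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set"
        return 1
    fi
    echo "DISPLAY is set to $DISPLAY"
}

# Run the check; print a hint instead of failing when no display is set.
check_display || echo "hint: an X client such as nvidia-settings has no server to connect to"
```

If this prints "DISPLAY is not set" in the SSH session, the error is about the missing display rather than about the query target passed to -q.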

The output from ‘lspci -nnk’ shows that the nvidia driver is in use. I note that nouveau is referenced too, however. I have read that nouveau is not desirable, although it is unclear to me why. I have amended my grub options to disable nouveau, perhaps not correctly?

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
Subsystem: Gigabyte Technology Co., Ltd GP102 [GeForce GTX 1080 Ti] [1458:374c]
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
Subsystem: ASUSTeK Computer Inc. GP104 [GeForce GTX 1080] [1043:85aa]
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
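On the nouveau question: in lspci -k output, "Kernel driver in use" is the driver actually bound to the device, while "Kernel modules" only lists modules that could handle it, so nouveau appearing there does not by itself mean it is loaded. A minimal sketch for checking what is really loaded follows; the helper name module_loaded is invented for this example, and it reads /proc/modules, which is the same data lsmod prints.

```shell
#!/bin/sh
# module_loaded NAME [FILE] (hypothetical helper): succeed if NAME is listed
# as a loaded kernel module. FILE defaults to /proc/modules and exists as a
# parameter only so the check can be tried against a saved copy.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

# Report the state of the two drivers that compete for the GPUs.
for m in nouveau nvidia; do
    if module_loaded "$m"; then
        echo "$m: loaded"
    else
        echo "$m: not loaded"
    fi
done
```

If nouveau shows as not loaded while nvidia is loaded, the blacklist is probably doing its job, and the lspci line can be ignored.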

I have read this post: nvidia-settings: unable to init server
It describes most of what I need to do, noting that SSH terminals have the issue of no DISPLAY, and the response from the ‘top contributor’ was “not possible, start a dummy xserver”. So do I need to figure out how to start a dummy xserver?

Any pointers you can offer are very welcome.

Attachment: nvidia-bug-report.log.gz (665.4 KB)

You will have to run an xserver on the gpus and properly set the DISPLAY variable.
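A rough sketch of what that could look like from an SSH session, under several assumptions: /etc/X11/xorg.conf is the file written by nvidia-xconfig, display :0 is free or already served by an X server that root may connect to, and the helper names (display_socket, x_server_running, query_gpus) are made up for this example. If a display manager such as gdm already owns the X server, XAUTHORITY would most likely have to be set as well, which this sketch does not handle.

```shell
#!/bin/sh
# display_socket DISPLAY: path of the Unix socket an X server listens on
# for a display given as ":N" or ":N.S".
display_socket() {
    n=${1#:}        # drop the leading colon
    n=${n%%.*}      # drop an optional ".screen" suffix
    echo "/tmp/.X11-unix/X$n"
}

# x_server_running DISPLAY: succeed if a socket exists for that display.
x_server_running() {
    [ -S "$(display_socket "$1")" ]
}

# query_gpus [DISPLAY]: start X on the display if nothing is listening yet,
# then run nvidia-settings against it. Not called automatically.
query_gpus() {
    disp=${1:-:0}
    if ! x_server_running "$disp"; then
        sudo -b X "$disp"   # -b backgrounds X after any password prompt
        sleep 5             # crude wait for the server to come up
    fi
    sudo DISPLAY="$disp" nvidia-settings -q gpus
}
```

Calling query_gpus :0 would then be the actual test. If X fails to start, /var/log/Xorg.0.log is the usual place to look.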

Hi genrix, yes I see. Can you direct me to any sources so I can learn how to run an xserver on the gpu?