GPU driver not working properly

After installing NVIDIA driver 418 on Ubuntu 16.04, the login GUI keeps crashing after I log in. Here is the error log, .xsession-errors:

X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 154 (GLX)
Minor opcode of failed request: 3 (X_GLXCreateContext)
Value in failed request: 0x0
Serial number of failed request: 27
Current serial number in output stream: 28
openConnection: connect: No such file or directory
cannot connect to brltty at :0
upstart: gnome-session (Unity) main process (2638) terminated with status 1
upstart: logrotate main process (2478) killed by TERM signal
upstart: update-notifier-release main process (2546) killed by TERM signal
upstart: hud main process (2599) killed by TERM signal
upstart: indicator-bluetooth main process (2661) killed by TERM signal
upstart: indicator-power main process (2662) killed by TERM signal
upstart: indicator-datetime main process (2666) killed by TERM signal
upstart: indicator-printers main process (2672) killed by TERM signal
upstart: indicator-session main process (2673) killed by TERM signal
upstart: indicator-application main process (2732) killed by TERM signal
upstart: Disconnected from notified D-Bus bus
upstart: bamfdaemon main process (2585) killed by TERM signal
upstart: unity-panel-service main process (2642) killed by TERM signal
upstart: indicator-sound main process (2671) killed by TERM signal

$ nvidia-smi
Thu Jan 28 15:41:08 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:04:00.0 Off |                  N/A |
| 24%   34C    P8    N/A /  N/A |      0MiB / 12036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN V             Off  | 00000000:05:00.0 Off |                  N/A |
| 25%   37C    P8    N/A /  N/A |      0MiB / 12036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN V             Off  | 00000000:08:00.0 Off |                  N/A |
| 26%   38C    P8    N/A /  N/A |      0MiB / 12036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN V             Off  | 00000000:09:00.0 Off |                  N/A |
| 25%   38C    P8    N/A /  N/A |      0MiB / 12036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN V             Off  | 00000000:85:00.0 Off |                  N/A |
| 25%   39C    P8    N/A /  N/A |      0MiB / 12036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN V             Off  | 00000000:86:00.0 Off |                  N/A |
| 23%   36C    P8    N/A /  N/A |      0MiB / 12036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  TITAN V             Off  | 00000000:89:00.0 Off |                  N/A |
| 21%   34C    P8    N/A /  N/A |      0MiB / 12036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN V             Off  | 00000000:8A:00.0 Off |                  N/A |
| 23%   37C    P8    N/A /  N/A |      0MiB / 12036MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The selected driver is nvidia:

$ prime-select query
nvidia

nvidia-settings does not work
$ sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings

ERROR: Unable to load info from any available system

After checking /etc/X11, I found that xorg.conf does not exist.
I ran a bug report, which is attached.

After this, I ran nvidia-xconfig to generate an xorg.conf file, but it only configures one GPU, even though I have 8 GPUs.

$ cat /etc/X11/xorg.conf

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 418.56

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
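
The generated file has a single Device section with no BusID, which matches the complaint that only one GPU gets configured. nvidia-xconfig has an --enable-all-gpus option that is meant to write an X screen for every GPU. If the sections are written by hand instead, each Device needs a BusID in decimal PCI:bus:device:function form, while nvidia-smi prints the address in hex. Below is a minimal sketch of that conversion. The function names pci_to_xorg_busid and device_section and the "DeviceN" identifier scheme are made up for illustration, and the code assumes the domain:bus:device.function layout shown in the nvidia-smi output above.

```shell
# Hypothetical helper: convert an nvidia-smi PCI address such as
# 00000000:85:00.0 (hex domain:bus:device.function) into the decimal
# "PCI:bus:device:function" string used by BusID in xorg.conf.
pci_to_xorg_busid() {
    addr=$1
    rest=${addr#*:}      # strip the domain      -> 85:00.0
    bus=${rest%%:*}      # bus number, hex       -> 85
    devfn=${rest#*:}     # device.function       -> 00.0
    dev=${devfn%%.*}     # device number, hex    -> 00
    fn=${devfn#*.}       # function number, hex  -> 0
    # printf %d accepts 0x-prefixed hex arguments and prints them in decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Hypothetical helper: print a Device section for one GPU.
# $1 is the index used in the Identifier (an assumed naming scheme),
# $2 is the PCI address as printed by nvidia-smi.
device_section() {
    idx=$1
    busid=$(pci_to_xorg_busid "$2")
    printf 'Section "Device"\n'
    printf '    Identifier "Device%s"\n' "$idx"
    printf '    Driver     "nvidia"\n'
    printf '    BusID      "%s"\n' "$busid"
    printf 'EndSection\n'
}
```

For example, device_section 4 00000000:85:00.0 prints a section whose BusID is "PCI:133:0:0", since hex 85 is decimal 133. The addresses can be listed with nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader and fed to the function one per line.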

I rebooted and started lightdm, and it printed:

** (process:2249): WARNING **: Failed to open CK session: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.ConsoleKit was not provided by any .service files
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
update-alternatives: error: no alternatives for x86_64-linux-gnu_gfxcore_conf

And nvidia-settings still gives the same error as shown previously.
$ sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings

ERROR: Unable to load info from any available system

And I ran a 2nd bug report as attached.
I have looked up solutions online, and reinstalling the driver does not fix this.

nvidia-bug-report.log1.gz (3.4 MB)
nvidia-bug-report.log2.gz (4.5 MB)

Your desktop is running on the AST server graphics.

Sorry, I don’t get this. How can I fix it?

What are you trying to achieve? Where did you connect the monitor to?

The server is connected to a monitor with remote access functions.
I want to log in with the GUI and be able to use nvidia-settings to configure my GPUs.
Currently, both are not working.
Thanks!

Then you’ll have to connect the monitor to an NVIDIA GPU and possibly disable the AST graphics in the BIOS.

Thanks for the suggestion. The system worked with the current hardware and BIOS configuration. Any idea why it stopped working after I reinstalled the system? The previous system was installed by someone else, and I have kept its hard drive untouched. Is there anything I can do with the old system to figure out the root cause?

I disabled the AST graphics in the BIOS and it still didn’t work. :(

Please create /etc/X11/xorg.conf.d/10-nvidia.conf with the following contents:

Section "OutputClass"
    Identifier "Nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration" "true"
EndSection

I guess the previous OS install used a headless NVIDIA driver install, meaning the desktop ran on the AST graphics while the NVIDIA GPUs were used only for compute.
nvidia-settings can’t be used in that kind of setup, which doesn’t matter because it only controls graphics settings.
If you want to use nvidia-settings (for whatever reason), the X server has to run on the NVIDIA GPUs. Since 16.04 is an outdated, non-glvnd system, it’s also not possible to have the AST and NVIDIA GLX libraries loaded at the same time.
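
Which of these setups a machine is in can usually be read from the X server log: a driver that actually drives a screen tags its messages as NAME(screen), for example NVIDIA(0). The sketch below pulls those tags out of a log read from standard input. The function name x_screen_drivers is hypothetical, and the log path in the usage note is an assumption about where this system writes its X log.

```shell
# Hypothetical helper: read an Xorg log on stdin and print the unique
# DRIVER(screen) tags that appear right after a message marker such as
# (II), (WW), (EE), (--), (**) or (==).
x_screen_drivers() {
    grep -Eo '\((II|WW|EE|--|\*\*|==)\) [A-Za-z0-9_]+\([0-9]+\):' |
        sed -e 's/^([^)]*) //' -e 's/:$//' |   # drop the marker and the colon
        sort -u                                # one line per driver/screen pair
}
```

Assuming the log lives at /var/log/Xorg.0.log, it could be run as x_screen_drivers < /var/log/Xorg.0.log. If no NVIDIA(n) tag shows up, the desktop is probably being driven by another adapter.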