No GUI - Ubuntu 16.04 (4.4.0-119-generic), Quadro K5200, Tesla K40c, Driver Version: 387.26

Hi,
I have an Ubuntu 16.04 machine with two NVIDIA cards, a Quadro K5200 and a Tesla K40c. I set up CUDA 9.1, installed from the runfile ( http://developer.download.nvidia.com/compute/cuda/9.1/secure/Prod/local_installers/cuda_9.1.85_387.26_linux.run ).
The CUDA programs are working (deviceQuery, nvidia-smi), but the GUI display is not coming up. I switched from lightdm to gdm3, and this is the error I see in /var/lib/gdm3/.local/share/xorg/Xorg.0.log:

Screen(s) found, but none have a usable configuration

Here are the outputs of various commands:
lspci |grep -i vga
04:00.0 VGA compatible controller: NVIDIA Corporation GK110GL [Quadro K5200] (rev a1)

lsmod |grep -i nv
nvidia_drm 49152 0
nvidia_modeset 901120 1 nvidia_drm
nvidia 13918208 1 nvidia_modeset
drm_kms_helper 155648 1 nvidia_drm
drm 364544 3 drm_kms_helper,nvidia_drm

The nouveau driver has been disabled; neither of the following commands prints anything:
cat /etc/default/grub |grep -i nou
lsmod |grep -i nou

Running nvidia-xconfig --preserve-busid --enable-all-gpus --allow-empty-initial-configuration gives:

Using X configuration file: "/etc/X11/xorg.conf".

WARNING: Unable to use the nvidia-cfg library to query NVIDIA hardware.

ERROR: Unable to determine number of GPUs in system; cannot honor '--enable-all-gpus' option.

Any help in getting the GUI up and running is appreciated.

Probably the xorg.conf configuration does not match the fact that you want your GUI to run on the K5200, and the --enable-all-gpus switch is not helping. Why don't you try that command without that switch? That is what the error message is indicating.

This may help you get things sorted:

USING CUDA AND X | NVIDIA
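
One detail that is easy to trip over when pinning X to a specific card: lspci prints PCI addresses in hexadecimal (for example 04:00.0 for the K5200 above), while the BusID line in xorg.conf expects decimal values in the form PCI:bus:device:function. Here is a minimal sketch of that conversion. The function name pci_to_busid is made up for illustration and is not part of any NVIDIA tool, and it assumes the short address form without a PCI domain prefix.

```shell
#!/bin/sh
# pci_to_busid is a hypothetical helper, not part of nvidia-xconfig or lspci.
# It converts an lspci address such as 04:00.0 (hex fields) into the
# decimal "PCI:bus:device:function" form used by BusID in xorg.conf.
# Assumption: the input has no domain prefix (i.e. not 0000:04:00.0).
pci_to_busid() {
    addr=$1
    bus=${addr%%:*}     # text before the first ':'  -> "04"
    rest=${addr#*:}     # text after the first ':'   -> "00.0"
    dev=${rest%%.*}     # text before the '.'        -> "00"
    fn=${rest#*.}       # text after the '.'         -> "0"
    # $((0x..)) interprets each field as hexadecimal.
    printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$((0x$fn))"
}

# Address of the Quadro K5200 as reported by lspci in this thread.
pci_to_busid 04:00.0
```

For buses below 0a the hex and decimal values look the same, so the difference only shows up on systems with higher bus numbers.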

Thanks @txbob, but that did not help. I added the K5200 to xorg.conf as follows:
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "Quadro K5200"
EndSection

But now I get an error:
[ 907.819] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:4:0:0. Please
[ 907.819] (EE) NVIDIA(GPU-0): check your system’s kernel log for additional error
[ 907.819] (EE) NVIDIA(GPU-0): messages and refer to Chapter 8: Common Problems in the
[ 907.819] (EE) NVIDIA(GPU-0): README for additional information.
[ 907.819] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[ 907.819] (EE) NVIDIA(0): Failing initialization of X screen 0

Running lspci | grep nvidia seems to indicate that the device is correct:

03:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40c] (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation GK110GL [Quadro K5200] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GK110 HDMI Audio (rev a1)

What is the result of running the following command?

sudo dmesg |grep NVRM

[ 191.640837] NVRM: RmInitAdapter failed! (0x30:0xffff:662)
[ 191.640870] NVRM: rm_init_adapter failed for device bearing minor number 1
[ 192.070408] NVRM: failed to copy vbios to system memory.

These messages repeat indefinitely.
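
Since the same few lines repeat, it can help to collapse the log into one line per distinct message with a count. A rough sketch, assuming the dmesg timestamp format shown above; summarize_nvrm is a made-up helper name, not an NVIDIA utility.

```shell
#!/bin/sh
# summarize_nvrm is a hypothetical helper. It reads kernel log text on stdin,
# keeps only NVRM lines, strips the leading "[ 191.640837]" style timestamp,
# and prints each distinct message with the number of times it occurred,
# most frequent first.
summarize_nvrm() {
    grep 'NVRM:' | sed 's/^\[[^]]*\] *//' | sort | uniq -c | sort -rn
}

# Typical use (reading the kernel log may need root):
#   sudo dmesg | summarize_nvrm
```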

I find it hard to believe that deviceQuery is working correctly with those messages in your system message log.

In this state, does deviceQuery properly report both GPUs?

What kernel version?

uname -a

Sorry, I should have caught this before. deviceQuery reports only 1 GPU. lspci reported the Quadro K5200. This is the output of uname -a:

4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

The strange thing is that when I remove the NVIDIA drivers, the monitor and GUI work (with lightdm), so I never suspected a problem in the hardware.

Which GPU is reported, the K40c or the K5200?

Sounds like you haven't properly removed the nouveau driver:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-nouveau

Switch to runlevel 3 (or stop gdm/the GUI) before following those steps, then reboot.

You can verify that nouveau is properly removed with a command like this:

lsmod |grep -i nouv

which should return nothing.
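
If you want something scriptable, the same check can be wrapped so it gives a yes/no answer. This is only a sketch: nouveau_loaded is a made-up helper name, and it matches the module name exactly in the first column of lsmod output rather than grepping for a substring.

```shell
#!/bin/sh
# nouveau_loaded is a hypothetical helper. It reads `lsmod` output on stdin
# and exits 0 if a module named exactly "nouveau" is listed, 1 otherwise.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Only run the live check where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
fi
```

Note this only tells you whether the module is loaded right now; it says nothing about whether the blacklist from the installation guide is in place for the next boot.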

The K40c is the one reported by deviceQuery.

I have one observation (not sure if it is the cause of the problem):

The K5200 has 2 display outputs and I had connected the monitor to one of them. I decided to switch the monitor to the other port, and suddenly I am not able to do Ctrl+Alt+F1 etc., and the display, even in console mode, has gone to 640x480.

I am not sure if that has any bearing on this at all?

This is all sounding like a problem with nouveau, i.e. you haven’t properly removed the nouveau driver.

Thanks for the help so far. I think I have followed everything and the nouveau driver is not present, so I have no idea what is wrong.

I will probably leave it here for now and continue to use the CUDA stuff (since that was the main purpose).

I will debug this later.

But thanks again for the help. Much appreciated.

I have the same issue.

$ sudo nvidia-xconfig --enable-all-gpus
[sudo] password for alex: 

Using X configuration file: "/etc/X11/xorg.conf".

ERROR: Unable to determine number of GPUs in system; cannot honor '--enable-all-gpus' option.

Backed up file '/etc/X11/xorg.conf' as '/etc/X11/xorg.conf.backup'
New X configuration file written to '/etc/X11/xorg.conf'
$ lsmod |grep -i nouv
$ uname -a
Linux alex-XPS-15-9570 5.0.0-36-generic #39~18.04.1-Ubuntu SMP Tue Nov 12 11:09:50 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ kf5-config --version
Qt: 5.9.5
KDE Frameworks: 5.44.0
kf5-config: 1.0