Linux nvidia "GPU screens are not yet supported "

Hello,

My computer (Asus Zenbook UX433F) has the following NVIDIA card, and I’m trying to install the nvidia driver:

02:00.0 3D controller: NVIDIA Corporation GP108M [GeForce MX150] (rev a1)

I use an up-to-date Debian Buster amd64. To install the proprietary nvidia module, I tried two methods:

  1. via the debian package nvidia-kernel-dkms/testing 440.82-1 amd64
  2. via the NVIDIA installation script NVIDIA-Linux-x86_64-440.82.run

Unfortunately, the outcome is the same with both. When I restart my computer, the proprietary driver (nvidia) is correctly loaded:

(root@aldur) (~) # lsmod | grep nvidia
nvidia_drm             53248  0
nvidia_modeset       1118208  1 nvidia_drm
nvidia              20508672  4 nvidia_modeset
ipmi_msghandler        73728  2 ipmi_devintf,nvidia
drm_kms_helper        233472  2 nvidia_drm,i915
drm                   585728  15 drm_kms_helper,nvidia_drm,i915

The X server runs fine, but does not use my nvidia card (it uses the intel one).

(mando@aldur) (~) $ nvidia-smi 
Thu Apr 23 20:17:49 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce MX150       Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   48C    P0    N/A /  N/A |      0MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(mando@aldur) (~) $ nvidia-settings 

ERROR: Unable to load info from any available system

If I dig into /var/log/Xorg.0.log, I find these error messages:

(root@aldur) (~) # grep EE /var/log/Xorg.0.log
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[   324.584] (EE) Failed to load module "nv" (module does not exist, 0)
[   324.670] (EE) NVIDIA(G0): GPU screens are not yet supported by the NVIDIA driver
[   324.670] (EE) NVIDIA(G0): Failing initialization of X screen
[   324.747] (II) Initializing extension MIT-SCREEN-SAVER
[   325.328] (EE) Failed to open authorization file "/var/run/sddm/{04dfb126-6d7c-4647-9d28-a20ab276f135}": No such file or directory

I searched for this error message and found a workaround mentioned here. But it does not work: if I apply it and restart X, I just get a black screen, and /var/log/messages says that Option "PrimaryGPU" "yes" is ignored. As a result, I have to revert /etc/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf.
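
For reference, that workaround roughly amounts to adding Option "PrimaryGPU" "yes" to the nvidia OutputClass, along the lines of the snippet below (a sketch; the exact contents of the Debian-shipped file may differ):

Section "OutputClass"
        Identifier "nvidia"
        MatchDriver "nvidia-drm"
        Driver "nvidia"
        Option "PrimaryGPU" "yes"
EndSection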

For your information,

  • the open source driver (nouveau) does not work: the X server takes ~30 s to start, sometimes hangs, and in the terminal I see messages like irm stalled, [TTM] buffer eviction failed, DRM failed to idle channel 1 [DRM]
  • when Secure Boot is enabled, I have to sign the module according to this wiki, otherwise the module cannot be loaded. I also tried disabling Secure Boot to see whether it changes anything, but the problem persists.

So my question is simple: what should I do to make the nvidia module work?

Thanks for your help.

Did you read this?

http://download.nvidia.com/XFree86/Linux-x86_64/440.82/README/primerenderoffload.html

This might be of use

https://rpmfusion.org/Howto/Optimus

Does buster mesa have libglvnd support?
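
For what it’s worth, the render offload approach described in that README runs individual applications on the nvidia GPU through environment variables, roughly like this (a sketch, assuming a working offload configuration and glxinfo from mesa-utils):

# Render a single application on the NVIDIA GPU via PRIME render offload
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL vendor"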

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to rename the file ending to something else since the forum software doesn’t accept .gz files (nifty!).
Which DE are you running?
For a standard PRIME config, please see this:
https://devtalk.nvidia.com/default/topic/1022670/linux/official-driver-384-59-with-geforce-1050m-doesn-t-work-on-opensuse-tumbleweed-kde/post/5203910/#5203910

Hello Leigh123linux and generix,

@leigh123linux
As suggested in your link, I created /etc/X11/xorg.conf.d/nvidia.conf and put the following inside:

Section "ServerLayout"
        Identifier "layout"
        Option "AllowNVIDIAGPUScreens"
EndSection

When I restart my X server, the GPU screen error indeed disappears :-) nvidia-settings now runs fine, and nvidia-smi is happy, as is torch:

(mando@aldur) (~) $ nvidia-smi 
Mon Apr 27 11:11:25 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce MX150       Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   42C    P8    N/A /  N/A |     14MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2053      G   /usr/lib/xorg/Xorg                            14MiB |
+-----------------------------------------------------------------------------+

(mando@aldur) (~) $ ipython3  
Python 3.8.2 (default, Apr  1 2020, 15:52:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch  
  ...: torch.cuda.is_available()                                                                        
Out[1]: True

So I believe you solved my problem, thanks a lot!

@generix, just in case, please find attached the nvidia report nvidia-bug-report.log.gz.log (1.1 MB). I suppose that PRIME is what debian calls primus. For the record, I installed primus and bumblebee in the past, but I removed them because torch / nvidia-smi / nvidia-settings did not run fine through optirun. My conclusion at the time was that my problem was related to the underlying nvidia module rather than to bumblebee.

My next steps

On my side, I’ll try to compare the performance of the CPU against the GPU. Unfortunately, glxgears is not enough to evaluate the performance boost provided by nvidia, and glmark2 is not available under debian.

So I’ll do this comparison directly with torch, but I’ll need some time because I’m not yet very familiar with this framework.

If I observe a significant gap between CPU and GPU computations, I’ll assume everything is fine. Then I’ll finalize the installation to get something cleaner (1. using the debian package, 2. re-enabling secure boot, 3. re-installing primus).

Thanks a lot for your precious help!

Fine, it works, and the logs say so as well. Just for completeness, PRIME and primus are different things. PRIME is a method/infrastructure within the kernel/Xorg. You’re now using PRIME offload (on demand); it has a second mode, PRIME output (which the link I posted is for), which makes everything run on the nvidia gpu. In offload mode, no external monitors connected to the nvidia gpu will work. Use whichever mode fits your needs.
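
In rough terms, output mode means a small xorg.conf that puts the X screen on the nvidia GPU, plus an xrandr call when the session starts, along these lines (only a sketch; the BusID has to match lspci, 02:00.0 here, and the post I linked has the full details):

Section "Device"
        Identifier "nvidia"
        Driver "nvidia"
        BusID "PCI:2:0:0"
EndSection

Section "Screen"
        Identifier "nvidia"
        Device "nvidia"
        Option "AllowEmptyInitialConfiguration"
EndSection

and, in the display manager’s display setup script:

xrandr --setprovideroutputsource modesetting NVIDIA-0
xrandr --auto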

@generix
Thank you for these precisions and for your help :-)

Hello,

Finally, I was able to install the nvidia driver on my debian bullseye amd64, through the nvidia-driver debian package (it also worked with the NVIDIA installation script), both with and without Secure Boot.

1. Installing nvidia driver

With recent kernels, Secure Boot requires creating and enrolling a MOK key and signing the nvidia driver. This is explained in the Debian wiki page on Secure Boot. Below, I provide the two scripts I adapted / wrote to achieve the corresponding steps.

  1. (secure boot only) Create the following files:

/root/secure_boot/enroll.sh

#!/bin/sh
#https://wiki.debian.org/SecureBoot

echo "Creating MOK.priv and MOK.der"
openssl req -new -x509 -newkey rsa:2048 -keyout MOK.priv -outform DER -out MOK.der -days 36500 -subj "/CN=My Name/" -nodes

echo "Importing MOK.der, please enter your one-time password"
mokutil --import MOK.der 
echo "Check that your key is listed below"
mokutil --list-new 

echo "Now, reboot the machine. Then, enters MOK manager EFI utility"
echo "enroll MOK, continue, confirm, enter password, reboot>"
echo "Finally, then check it is loaded with dmesg | grep cert"
exit 0

/root/secure_boot/sign.sh

#!/bin/sh
PRIV=/root/secure_boot/MOK.priv
DER=/root/secure_boot/MOK.der

for filename in $PRIV $DER
do
  if [ -f "$filename" ]; then
    echo "$filename found :-)"
  else
    echo "$filename not found"; exit 1
  fi
done

KBUILD_VER=$(uname -r | cut -d"." -f1,2)
echo "Kbuild version $KBUILD_VER"
cd /lib/modules/$(uname -r)/updates/dkms
for ko in *.ko
do
  echo "Signing $ko"
  /usr/lib/linux-kbuild-$KBUILD_VER/scripts/sign-file sha256 $PRIV $DER $ko
done
exit 0
  2. (secure boot only) Create the MOK keys (/root/secure_boot/MOK.priv and /root/secure_boot/MOK.der) by running /root/secure_boot/enroll.sh, then enroll the key at the next reboot. This is required only once.
  3. Install nvidia-driver:
apt update
apt install nvidia-driver 
  4. (secure boot only) Sign the modules by running /root/secure_boot/sign.sh. This is required whenever the kernel or the nvidia driver is updated.
  5. In my case, I also had to create /etc/X11/xorg.conf.d/nvidia.conf (see previous messages):
Section "ServerLayout"
  Identifier "layout"
  Option "AllowNVIDIAGPUScreens"
EndSection
  6. Reboot. Then, ensure that the nvidia driver is loaded:
lsmod | grep nvidia
nvidia-smi
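
In the Secure Boot case, a couple of extra checks can confirm that the signed modules are actually accepted (a sketch; mokutil must be installed and the module file names under updates/dkms may vary with the driver packaging):

mokutil --sb-state      # should report "SecureBoot enabled"
modinfo /lib/modules/$(uname -r)/updates/dkms/nvidia*.ko | grep -E '^(signer|sig_key|sig_hashalgo)'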

2. Benchmark CPU vs GPU

This section presents the steps to run a python3 script that demonstrates the performance gap between CPU and GPU.

  1. Install Nvidia driver (see previous section).
  2. Install torch & CUDA requirements:
apt update
apt install nvidia-cuda-dev nvidia-cuda-toolkit python3 python3-pip
pip3 install torch
  3. Create and run the following python script:
#!/usr/bin/env python3

import functools, time
import torch

def timer(f):
    """Print the runtime of the decorated function"""
    @functools.wraps(f)
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = f(*args, **kwargs)
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print(f"Finished {f.__name__!r} in {run_time:.4f} secs")
        return value
    return wrapper_timer

@timer
def cpu(x, y, num_times):
    for _ in range(num_times):
        z = torch.matmul(x, y)

@timer
def gpu(x, y, num_times):
    for _ in range(num_times):
        z = torch.matmul(x, y)

torch.cuda.init()
assert torch.cuda.is_available() # Fail if nvidia and CUDA dependencies are not installed 

X = torch.randn(10, 300, 400)
Y = torch.randn(10, 400, 500)
X_cuda = X.cuda()
Y_cuda = Y.cuda()
N = 1000
cpu(X, Y, N)
gpu(X_cuda, Y_cuda, N)

Results (note that the matrices have to be large enough to make GPU usage worthwhile):

Finished 'cpu' in 5.3031 secs
Finished 'gpu' in 0.0213 secs
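
A caveat worth noting: CUDA kernel launches are asynchronous, so the GPU timing above partly measures how fast the work is queued rather than when it actually finishes. A stricter version of gpu() would wait for the device before the timer stops, roughly:

@timer
def gpu(x, y, num_times):
    for _ in range(num_times):
        z = torch.matmul(x, y)
    torch.cuda.synchronize()  # wait for all queued GPU work to complete before timing ends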

Once again, I’d like to thank @generix and @leigh123linux for their help.

Best regards,
mando