Nvidia-smi "No device where found"

OK, I understand, thanks.
Is it a timing problem?

The entry /dev/dri/card2 does not exist.

modprobe -r nvidia says
modprobe: FATAL: Module nvidia is in use.

I think the same problem occurs when my script is called.

First I remove it with
modprobe -r nvidia
sleep 1
remove pci and scan…
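
Roughly like this (a sketch, not the exact script; 0000:19:00.0 is the bus address of the nvidia card):

# run as root: unload the modules, then remove the device and rescan the bus
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia
sleep 1
echo 1 > /sys/bus/pci/devices/0000:19:00.0/remove
echo 1 > /sys/bus/pci/rescan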

Please check if nvidia-persistenced is enabled with systemd and disable it.

Yes, it is enabled,
but disable or mask won't work.

After boot it is loaded again. What is this?

systemctl status nvidia-persistenced.service 
● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static)
     Active: active (running) since Sun 2021-12-26 22:25:22 CET; 30s ago
    Process: 935 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose (code=exited, status=0/SUCCESS)
   Main PID: 938 (nvidia-persiste)
      Tasks: 1 (limit: 38356)
     Memory: 724.0K
        CPU: 3ms
     CGroup: /system.slice/nvidia-persistenced.service
             └─938 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose

Dez 26 22:25:22 michael-MacPro systemd[1]: Starting NVIDIA Persistence Daemon...
Dez 26 22:25:22 michael-MacPro nvidia-persistenced[938]: Verbose syslog connection opened
Dez 26 22:25:22 michael-MacPro nvidia-persistenced[938]: Now running with user ID 124 and group ID 134
Dez 26 22:25:22 michael-MacPro nvidia-persistenced[938]: Started (938)
Dez 26 22:25:22 michael-MacPro nvidia-persistenced[938]: device 0000:19:00.0 - registered
Dez 26 22:25:22 michael-MacPro nvidia-persistenced[938]: Local RPC services initialized
Dez 26 22:25:22 michael-MacPro systemd[1]: Started NVIDIA Persistence Daemon.

What is the command
$ nvidia-persistenced
for?

It’s needed for headless, compute-only servers to keep the driver loaded and initialized.
It should be no problem to disable it; IIRC Ubuntu uses a udev rule in /lib/udev/rules.d to start it. Try removing that rule and run sudo update-initramfs -u to also remove it from the initrd.
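
To find which rule file starts it (the file name varies between driver packagings), you can grep for it:

grep -rl nvidia-persistenced /lib/udev/rules.d/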

Or just add
systemctl stop nvidia-persistenced
at the start of your script.
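
E.g. a minimal sketch of the start of such a script (run as root; module names as used by the proprietary driver):

#!/bin/bash
# stop the persistence daemon so it no longer holds the device open
systemctl stop nvidia-persistenced
# then unload the NVIDIA modules before removing/rescanning the PCI device
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia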

Ok, the service is deactivated.

I removed it from rules.d.

systemctl status nvidia-persistenced.service
○ nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static)
     Active: inactive (dead)

How does this help?

As said, it keeps the driver loaded and thus locked. You should now be able to unload it.

No, the message is still the same.

sudo modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia
modprobe: FATAL: Module nvidia is in use.

Since you disabled nvidia-persistenced and rebooted, maybe your script is working now and Xorg is blocking module unloading, which would be fine?
Please check nvidia-smi for the Xorg process.

No, I checked nvidia-smi, but no process is listed.

Now I have switched back to Manjaro Linux.
I also need my script to remove and rescan the PCI bus. I don't know why, but the service won't start at boot,
so I need to start it manually or with crontab.
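
For example with a cron @reboot entry (the script path here is only a placeholder):

# in root's crontab (crontab -e as root); /usr/local/bin/egpu-rescan.sh is a placeholder name
@reboot /usr/local/bin/egpu-rescan.sh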

But… I can use the GPU with prime-run. It is not as good as using the GPU directly, but it is better than nothing, and it's much faster than the AMD GPU. For the moment I will use this.

Perhaps I can get the service running at boot, and under Manjaro the timing is better for reloading the nvidia driver?

In Manjaro I have the third card.
Is it possible to switch manually to the nvidia video output?

$ ls -la /dev/dri
insgesamt 0
drwxr-xr-x   3 root root        180 27. Dez 11:22 .
drwxr-xr-x  22 root root       4280 27. Dez 12:16 ..
drwxr-xr-x   2 root root        160 27. Dez 11:22 by-path
crw-rw----+  1 root video  226,   0 27. Dez 11:22 card0
crw-rw----+  1 root video  226,   1 27. Dez 11:22 card1
crw-rw----+  1 root video  226,   2 27. Dez 12:59 card2
crw-rw-rw-   1 root render 226, 128 27. Dez 11:22 renderD128
crw-rw-rw-   1 root render 226, 129 27. Dez 11:22 renderD129
crw-rw-rw-   1 root render 226, 130 27. Dez 11:22 renderD130
$ ls -la /dev/dri/by-path
insgesamt 0
drwxr-xr-x 2 root root 160 27. Dez 11:22 .
drwxr-xr-x 3 root root 180 27. Dez 11:22 ..
lrwxrwxrwx 1 root root   8 27. Dez 11:22 pci-0000:02:00.0-card -> ../card1
lrwxrwxrwx 1 root root  13 27. Dez 11:22 pci-0000:02:00.0-render -> ../renderD129
lrwxrwxrwx 1 root root   8 27. Dez 12:59 pci-0000:06:00.0-card -> ../card2
lrwxrwxrwx 1 root root  13 27. Dez 11:22 pci-0000:06:00.0-render -> ../renderD130
lrwxrwxrwx 1 root root   8 27. Dez 11:22 pci-0000:19:00.0-card -> ../card0
lrwxrwxrwx 1 root root  13 27. Dez 11:22 pci-0000:19:00.0-render -> ../renderD128

How does this look?

The script now works at boot, and /dev/dri/card0 to card2 are available.

journal.txt (129.1 KB)

nvidia-bug-report.log (1.0 MB)

I have a screen from the eGPU now.

Providers: number : 3
Provider 0: id: 0x1b7 cap: 0x1, Source Output crtcs: 4 outputs: 4 associated providers: 2 name:NVIDIA-0
Provider 1: id: 0x243 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 6 outputs: 6 associated providers: 1 name:modesetting
Provider 2: id: 0x208 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 6 outputs: 6 associated providers: 1 name:modesetting
$ nvidia-smi
Mon Dec 27 18:46:30 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:19:00.0  On |                  N/A |
|  0%   55C    P0    N/A /  90W |    102MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1505      G   /usr/lib/Xorg                      99MiB |
|    0   N/A  N/A      2139      G   /usr/bin/nvidia-settings            0MiB |
+-----------------------------------------------------------------------------+

I used this xorg.conf

Section "Module"
    Load "modesetting"
EndSection

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    BusID      "PCI:25:0:0"
    Option     "AllowEmptyInitialConfiguration"
    Option     "AllowExternalGpus" "True"
EndSection

nvidia-settings also shows me the card info and displays.
But it is very slow. I think it doesn't use the GPU for 3D, only for the display. What can I do?
nvidia-bug-report.log (1.0 MB)

Your suspicion is correct, the GLX driver is not found:

Failed to load module "glxserver_nvidia"

You need to set the path to it, e.g. in a "Files" section with
ModulePath "/usr/lib/nvidia/xorg"
ModulePath "/usr/lib/xorg/modules"
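
i.e. something like this in your xorg.conf (adjust the paths if your distribution installs the nvidia X module elsewhere):

Section "Files"
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection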


Yes, that works. Great.

One small problem is left.

OpenGL now uses the nvidia card, but Vulkan only uses it with the parameter "__NV_PRIME_RENDER_OFFLOAD=1".

glxinfo | grep vendor                                                                                                                                  
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation
vkcube                                                                                                                                                     
WARNING: radv is not a conformant Vulkan implementation, testing use only.
WARNING: radv is not a conformant Vulkan implementation, testing use only.
Selected GPU 0: AMD RADV TAHITI, type: 2
__NV_PRIME_RENDER_OFFLOAD=1 vkcube                                                                                                                  
WARNING: radv is not a conformant Vulkan implementation, testing use only.
WARNING: radv is not a conformant Vulkan implementation, testing use only.
Selected GPU 0: NVIDIA GeForce GTX 1050 Ti, type: 2

How can I set this so that Vulkan automatically uses the nvidia card?

And is it possible to optimize something? Any tips?

Seems this is a shortcoming of Vulkan; from the render offload page:

Vulkan applications use the Vulkan API to enumerate the GPUs in the system and select which GPU to use; most Vulkan applications will use the first GPU reported by Vulkan.

You could use
export __NV_PRIME_RENDER_OFFLOAD=1
in your system/user profile.
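
For example (assuming a login shell that reads ~/.profile; other setups may use ~/.bash_profile or an environment.d file instead):

# make the variable available in every session
echo 'export __NV_PRIME_RENDER_OFFLOAD=1' >> ~/.profile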

Yes, I will try that. But otherwise it works now as it should.
Many thanks.