Nvidia-smi "No devices were found"

Hi, I have an exotic configuration:
a Mac Pro 2013 with two internal AMD GPUs.
Because of the Thunderbolt 2 interfaces and an available GeForce GTX 1050 Ti, I thought I would expand the Mac with an eGPU enclosure.

The eGPU seems to work and is authorized, and the GPU is detected by Linux. But nvidia-smi only says "No devices were found".

I have tried a few things in the meantime:
different boot parameters, different xorg configs and different nvidia driver versions (470 and 495), on Ubuntu and Manjaro.

With 495 I got "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:19:00.0)".
Currently I am using driver version 470.86.

My cmdline:
BOOT_IMAGE=/boot/vmlinuz-5.15-x86_64 root=UUID=9cdd965a-1aae-4df6-9478-eac5e837fda0 rw quiet apparmor=1 security=apparmor udev.log_priority=3 radeon.si_support=0 amdgpu.si_support=1 pcie_ports=native pci=realloc iommu=on

My current distribution is Manjaro.

– lspci -k tells me the "nvidia" driver is in use:

19:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 3351
Kernel driver in use: nvidia

– dmesg shows this error:
[ 1777.840544] NVRM: GPU 0000:19:00.0: RmInitAdapter failed! (0x22:0xffff:667)
[ 1777.840614] NVRM: GPU 0000:19:00.0: rm_init_adapter failed, device minor number 0

At the moment I am out of ideas and need help.

nvidia-bug-report.log (716.9 KB)

Please see this thread:
https://forums.developer.nvidia.com/t/driver-for-rtx3070-not-working-under-elementary-os-on-macbook-pro-with-egpu/164829?u=generix
With Macs, sometimes pci=realloc is enough, sometimes you’ll have to do the full monty.

Hi,

Thanks, I will try it, but one question.
In the last post of the linked thread, step number 6 says:

  6. get a root console, remove and add back the pci bridge
    sudo -s
    echo 1 > /sys/bus/pci/devices/0000:00:01.1/remove

My Mac does not have the PCI device 00:01.1 to remove.

What device is this? Is it one of the PCI bridges
00:01.0 - 00:03.0 (log file at row 2293)?

It’s the pci bridge the nvidia gpu is connected to according to lspci -t
Should be 0000:00:01.0 in your case.
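One way to double-check which bridge that is, assuming the GPU sits at 0000:19:00.0 as in the lspci output above:

# Show the PCI topology as a tree; follow the chain that ends at 19:00.0
# up to the root port it hangs off (0000:00:01.0 here); that is the
# bridge to remove and rescan.
lspci -tv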

Hi, thanks, this seems to be working. Great!! :)

My steps:

  1. Blacklist nvidia
  2. Update the initrd
  3. Disable the display manager, then reboot.
    The steps above were the same as in the other posting.
    At step 5 of that posting I am not sure whether the blacklist worked.
  4. First I needed SSH access, because after
    “echo 1 > /sys/bus/pci/devices/0000:00:01.0/remove” my keyboard and mouse were gone.
    Via SSH I had to enter the command twice; after the second time the rescan command worked.
  5. Was not necessary: nvidia-smi now found a device.
  6. Created a new nvidia-bug-report:
    nvidia-bug-report.log (1.6 MB)
  7. Started the display manager:
    $ systemctl start sddm.service

OK, it seems to be working now, but what are the next steps? How do I get this working without step 6?
The nvidia X server settings show the GPU as "On-Demand". What exactly is the difference between "On-Demand" and "Performance Mode"? How exactly do prime-select and/or prime-query work?
Any suggestions?

Step 6 was just for debugging so I could see errors in case of failure.
The next step would be creating a systemd unit and a script to have this run automatically on system boot, e.g.:

[Unit]
Description=Nvidia GPU initialization
Before=gpu-manager.service

[Service]
Type=oneshot
ExecStart=/usr/bin/egpu.sh
ExecStartPre=

[Install]
WantedBy=display-manager.target

Put the necessary commands into /usr/bin/egpu.sh, check if it works after boot, then re-enable sddm and reboot.
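For completeness, roughly how the unit could be installed and enabled; the file name egpu.service is just an example, adjust it to whatever you save the unit as:

# Assuming the unit above is saved as /etc/systemd/system/egpu.service
# and the script as /usr/bin/egpu.sh:
chmod +x /usr/bin/egpu.sh
systemctl daemon-reload
systemctl enable egpu.service
# Once it works, re-enable the display manager again:
systemctl enable sddm.service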
To enable the egpu for the Xserver, see:
https://forums.developer.nvidia.com/t/internal-display-freezing-on-startup-with-egpu/170468/4?u=generix

prime-select
“nvidia” aka “performance mode” means the nvidia gpu will always render everything.
“on-demand” means the nvidia gpu needs to be explicitly invoked to render an application, see:
https://download.nvidia.com/XFree86/Linux-x86_64/495.44/README/primerenderoffload.html

I created a service and an egpu script.
Both work, but not during boot.

Could it be that it should be "WantedBy=display-manager.service" instead of .target?

OK, to get it to start during boot I still have to experiment. In the meantime I start the service manually and then start sddm.service.

The /etc/X11/xorg.conf.d/10-nvidia-egpu.conf was also created and /etc/X11/xorg.conf was deleted, but my monitor stays black.

nvidia-smi always shows "Off", like in the other post:

|   0  GeForce RTX 3070    Off  |

How can I enable the GPU manually? prime-select is set to "nvidia", aka "performance mode".

I would mainly like to use the nvidia card instead of the AMD devices.

“Off” means persistence mode is Off, not the gpu. This is fine.
You might also want to try

Before=display-manager.service
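e.g. a sketch of the adjusted unit; apart from the Before= line it follows the earlier suggestion, and WantedBy=graphical.target is just one option here (the earlier suggestion used display-manager.target):

[Unit]
Description=Nvidia GPU initialization
Before=display-manager.service

[Service]
Type=oneshot
ExecStart=/usr/bin/egpu.sh

[Install]
WantedBy=graphical.target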

Hi,

after many tries with "display-manager.service", "graphical.target", Before=, WantedBy=, etc., I created a cron job which starts the script at boot. That seems to work. I don't know why, but the systemd service file didn't work.

Now my last problem: the monitor is connected to the eGPU, but there is no display.
/etc/X11/xorg.conf.d/10-nvidia-egpu.conf is created, but it has no effect.

How do I get the screen output through the 1050? How can I switch to the 1050?

Please create a new nvidia-bug-report.log

nvidia-bug-report.log (2.0 MB)
My current bug report.

This looks like the driver is already loaded when the script removes/re-adds the bus, so it gets removed. Furthermore, it's doing it too late; the Xserver has already started by the time the nvidia gpu comes alive:
X starts after 10.5 s
nvidia gpu ready after 13.2 s

My latest bug report. The .service file now works, and I removed the boot-time crontab entry.
nvidia-bug-report.log (2.0 MB)

Where do you see the time it was loaded? The .service file should load it before display-manager.service.

But the monitor is still black.

I also tried the command
$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep vendor
This worked.

server glx vendor string: SGI
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation

But with more complex graphics demos it fails.

$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ./valley

Loading "/home/michael/.Valley/valley_1.0.cfg"...
Loading "libGPUMonitor_x64.so"...
Loading "libGL.so.1"...
Loading "libopenal.so.1"...
Set 2560x1440 fullscreen video mode
X Error of failed request:  BadAlloc (insufficient resources for operation)
  Major opcode of failed request:  152 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  0
  Current serial number in output stream:  59
AL lib: (EE) alc_cleanup: 1 device not closed

It nearly works; only a little is missing.

Just take a look at the timestamps in dmesg and xorg logs, then you see when things happen.
The timing seems to be correct now but the driver isn’t loaded after re-adding the gpu. Try adding a
modprobe nvidia
at the end of your script, maybe with a sleep 1 (or 2) before and after it.

OK, here is my simple script:

#!/bin/bash
# Remove the root port the eGPU hangs off and rescan the bus;
# on this machine the remove/rescan has to be done twice.
echo 1 > /sys/bus/pci/devices/0000:00:01.0/remove
echo 1 > /sys/bus/pci/rescan
echo 1 > /sys/bus/pci/devices/0000:00:01.0/remove
echo 1 > /sys/bus/pci/rescan
sleep 1
# Load the nvidia driver once the device has been re-added.
modprobe nvidia
sleep 1

It seems to work.
nvidia-bug-report.log (2.0 MB)

xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x5b cap: 0x9, Source Output, Sink Offload crtcs: 6 outputs: 6 associated providers: 1 name:AMD Radeon HD 7800 Series @ pci:0000:06:00.0
Provider 1: id: 0xab cap: 0x6, Sink Output, Source Offload crtcs: 6 outputs: 6 associated providers: 1 name:AMD Radeon HD 7800 Series @ pci:0000:02:00.0

A question: the Mac Pro has two AMD GPUs and one nvidia GPU. Should this be noted in the xorg.conf file?
Is it possible that I need a third provider entry for the nvidia GPU in this list?

Another question: is it easier to use the nvidia GPU for rendering with prime-run, or is there a better way? This already ran under Manjaro, but it was very slow.

The driver doesn’t load, so the gpu is not used by Xorg.
Try adding
modprobe -r nvidia
at the beginning of the script.

Also check journalctl -e for why it doesn’t load.
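e.g. something like this to filter the current boot for driver messages:

journalctl -b | grep -iE 'nvrm|nvidia'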

Can you show me what this would look like when the kernel module is loaded?
Sorry, I don't know exactly which entry I should be looking for.

Here are the logs:
journal.txt (106.8 KB)
nvidia-bug-report.log (2.0 MB)

I also added this entry to the service:
After=bolt.service

I thought bolt.service should be completed before the script starts.

It seems that the Vulkan API works with the nvidia GPU:

__NV_PRIME_RENDER_OFFLOAD=1 vkcube    
WARNING: radv is not a conformant Vulkan implementation, testing use only.
WARNING: radv is not a conformant Vulkan implementation, testing use only.
nvidia-smi 
Sun Dec 26 21:11:28 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:19:00.0 Off |                  N/A |
|  0%   36C    P0    N/A /  90W |     10MiB /  4040MiB |     81%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      5139    C+G   vkcube                              7MiB |
+-----------------------------------------------------------------------------+

It’s running on the plain drm device but as you can see, there’s no Xorg process on the gpu since it doesn’t have a dri dev node.
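A quick way to check which dri device nodes exist and which PCI devices they belong to (the nvidia gpu at 0000:19:00.0 should show up here once the driver has created its node):

ls -l /dev/dri/
ls -l /dev/dri/by-path/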
From dmesg: on the initial boot the nvidia gpu doesn't work:

[    0.549169] pci 0000:19:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[    0.549171] pci 0000:19:00.0: BAR 1: trying firmware assignment [mem 0xc0000000-0xcfffffff 64bit pref]
[    0.549172] pci 0000:19:00.0: BAR 1: [mem 0xc0000000-0xcfffffff 64bit pref] conflicts with PCI Bus 0000:00 [mem 0x80000000-0xdfffffff window]
[    0.549174] pci 0000:19:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[    0.549176] pci 0000:19:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[    0.549178] pci 0000:19:00.0: BAR 3: trying firmware assignment [mem 0xd0000000-0xd1ffffff 64bit pref]
[    0.549179] pci 0000:19:00.0: BAR 3: [mem 0xd0000000-0xd1ffffff 64bit pref] conflicts with PCI Bus 0000:00 [mem 0x80000000-0xdfffffff window]
[    0.549181] pci 0000:19:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[    0.549183] pci 0000:19:00.0: BAR 0: assigned [mem 0xa1000000-0xa1ffffff]
[    0.549190] pci 0000:19:00.1: BAR 0: assigned [mem 0xa2000000-0xa2003fff]
[    0.549197] pci 0000:19:00.0: BAR 5: no space for [io  size 0x0080]
[    0.549198] pci 0000:19:00.0: BAR 5: failed to assign [io  size 0x0080]

Then the nvidia driver loads on the defunct device and fails to create /dev/dri/card0:

[    2.357771] nvidia: loading out-of-tree module taints kernel.
[    2.357783] nvidia: module license 'NVIDIA' taints kernel.
[    2.567434] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.86  Tue Oct 26 21:46:51 UTC 2021
[    2.571815] [drm] [nvidia-drm] [GPU ID 0x00001900] Loading driver
[    2.576558] NVRM: GPU 0000:19:00.0: RmInitAdapter failed! (0x22:0xffff:667)
[    2.576627] NVRM: GPU 0000:19:00.0: rm_init_adapter failed, device minor number 0
[    2.576766] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00001900] Failed to allocate NvKmsKapiDevice
[    2.576984] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00001900] Failed to register device

Then the device is removed and re-added, and the gpu is working:

[   10.890204] pci 0000:19:00.0: BAR 1: assigned [mem 0xb0000000-0xbfffffff 64bit pref]
[   10.890225] pci 0000:19:00.0: BAR 3: assigned [mem 0xa8000000-0xa9ffffff 64bit pref]
[   10.890245] pci 0000:19:00.0: BAR 0: assigned [mem 0xa1000000-0xa1ffffff]
[   10.890252] pci 0000:19:00.0: BAR 6: assigned [mem 0xa2000000-0xa207ffff pref]
[   10.890254] pci 0000:19:00.1: BAR 0: assigned [mem 0xa2080000-0xa2083fff]
[   10.890261] pci 0000:19:00.0: BAR 5: assigned [io  0x5000-0x507f]

But all of this happens while the nvidia driver is still loaded, so the missing dri dev node is not recreated.
You'll have to make sure the driver is unloaded and reloaded after the pci device is working so the dri node is correctly created for Xorg. Since this happens after amdgpu loads, it should be /dev/dri/card2 then.
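e.g. a sketch of how the script could look with that change; the exact companion module names (nvidia_drm, nvidia_modeset, nvidia_uvm) are the usual ones and may vary with the driver version:

#!/bin/bash
# Unload the nvidia modules that attached to the defunct device
# (harmless if they are not loaded yet).
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia 2>/dev/null

# Remove the root port the eGPU hangs off and rescan the bus so the BARs
# get reassigned; on this machine the remove/rescan has to be done twice.
echo 1 > /sys/bus/pci/devices/0000:00:01.0/remove
echo 1 > /sys/bus/pci/rescan
echo 1 > /sys/bus/pci/devices/0000:00:01.0/remove
echo 1 > /sys/bus/pci/rescan
sleep 1

# Reload the driver only now, so the dri dev node is created on the
# working device (nvidia_drm pulls in nvidia_modeset and nvidia).
modprobe nvidia_drm
sleep 1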