Hi, I have an exotic configuration: a Mac Pro 2013 with two internal AMD GPUs.
Because of the Thunderbolt 2 interfaces and an available GeForce GTX 1050 Ti, I thought I'd expand the Mac with an eGPU enclosure.
The eGPU seems to work; it is authorized, and the GPU was found by Linux. But nvidia-smi only says "No devices were found".
I tried a few things in the meantime:
different boot parameters, different xorg configs, and different nvidia driver versions (470 and 495), on both Ubuntu and Manjaro.
With 495 I got "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:19:00.0)".
Currently I use driver version 470.86.
disable display-manager
4. reboot

The steps above were the same as in the other posting.
At step 5 I am not sure whether the blacklisting worked.
First I needed SSH access, because after
"echo 1 > /sys/bus/pci/devices/0000:00:01.0/remove" my keyboard and mouse were gone.
Via SSH, I have to enter the command twice; after the second time, the rescan command worked.
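Put together, the sequence I run over SSH is roughly this (needs root; the PCI address is the one from my machine and will differ on other setups):

```shell
#!/bin/sh
# Remove the eGPU's PCI device so the kernel drops the broken BAR
# assignment, then rescan the bus so the device is re-enumerated.
# (As noted above, the remove sometimes needs two tries.)
echo 1 > /sys/bus/pci/devices/0000:00:01.0/remove
sleep 2
echo 1 > /sys/bus/pci/rescan
```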
OK, it seems to be working now, but what are the next steps? How do I get this working without step 6?
The NVIDIA X Server Settings shows me the GPU in "On Demand" mode. What exactly is the difference between "On Demand" and "Performance Mode"? How do prime-select and/or prime-query work, exactly?
Any suggestions?
Step 6 was just for debugging, so I could see errors in case of failure.
The next step would be creating a systemd unit and a script to have this run automatically on system boot, e.g.
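something along these lines (a sketch only; the unit name and script path are placeholders, and the ordering against the display manager is the critical part):

```ini
# /etc/systemd/system/egpu-rescan.service (hypothetical name)
[Unit]
Description=Remove/rescan the eGPU PCI device before X starts
# Make sure this runs before the display manager brings up Xorg
Before=display-manager.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Placeholder path to the remove/rescan script described above
ExecStart=/usr/local/bin/egpu-rescan.sh

[Install]
WantedBy=graphical.target
```

Enable it with `systemctl daemon-reload && systemctl enable egpu-rescan.service`.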
After many tries with "display-manager.service", "graphical.target", Before=, WantedBy=, etc., I created a cronjob which starts the script at boot. That seems to work. I don't know why, but the systemd service file didn't work.
Now my last problem: the monitor is connected to the eGPU, but there is no display.
/etc/X11/xorg.conf.d/10-nvidia-egpu.conf is created, but it has no effect.
How do I get the screen output over the 1050? How can I switch to the 1050?
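A Device section along these lines is the usual way to point Xorg at the card (a sketch, not taken from my actual file). One gotcha: Xorg wants the BusID in decimal, so PCI bus 0x19 from dmesg becomes 25; AllowExternalGpus is the option NVIDIA documents for eGPUs. A snippet that converts the address and prints such a section:

```shell
#!/bin/sh
# Convert the hex PCI address from dmesg (0000:19:00.0) to the decimal
# form Xorg expects, and print a minimal Device section for the eGPU.
busid=$(printf 'PCI:%d:%d:%d' 0x19 0x00 0)
cat <<EOF
Section "Device"
    Identifier "nvidia-egpu"
    Driver     "nvidia"
    BusID      "$busid"
    Option     "AllowExternalGpus" "true"
EndSection
EOF
```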
This looks like the driver is already loaded when the script removes/re-adds the bus, so it gets removed. Furthermore, it's doing it too late: the X server has already started by the time the nvidia gpu comes alive.
X start after 10.5s
nvidia gpu ready after 13.2s
Loading "/home/michael/.Valley/valley_1.0.cfg"...
Loading "libGPUMonitor_x64.so"...
Loading "libGL.so.1"...
Loading "libopenal.so.1"...
Set 2560x1440 fullscreen video mode
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 152 (GLX)
Minor opcode of failed request: 5 (X_GLXMakeCurrent)
Serial number of failed request: 0
Current serial number in output stream: 59
AL lib: (EE) alc_cleanup: 1 device not closed
Just take a look at the timestamps in dmesg and the Xorg logs; then you'll see when things happen.
The timing seems to be correct now, but the driver isn't loaded after re-adding the gpu. Try adding a
modprobe nvidia
at the end of your script, maybe with a sleep 1 (or 2) before and after it.
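So the tail of the script would look roughly like this (needs root; sleep durations are guesses):

```shell
# ...after the remove of the PCI device:
echo 1 > /sys/bus/pci/rescan
sleep 2           # give the kernel time to enumerate the gpu
modprobe nvidia   # load the driver now that the BARs are assigned
sleep 2           # let the driver settle before anything uses the gpu
```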
A question: the Mac Pro has two AMD GPUs and one NVIDIA. Should this be noted in the xorg.conf file?
Is it possible that I need a third "Provider"?
"Provider 3: id: … nvidia…"
Another question: is it easier to use the nvidia GPU for rendering with prime-run, or is there a better way? This already ran with Manjaro, but it was very slow.
It seems that the Vulkan API works with the nvidia GPU:
__NV_PRIME_RENDER_OFFLOAD=1 vkcube
WARNING: radv is not a conformant Vulkan implementation, testing use only.
WARNING: radv is not a conformant Vulkan implementation, testing use only.
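For OpenGL applications, the corresponding render-offload invocation needs a second variable according to NVIDIA's PRIME render offload documentation (untested here; glxgears is just an example client):

```shell
# Offload OpenGL rendering of a single application to the nvidia GPU
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears
```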
nvidia-smi
Sun Dec 26 21:11:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:19:00.0 Off | N/A |
| 0% 36C P0 N/A / 90W | 10MiB / 4040MiB | 81% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 5139 C+G vkcube 7MiB |
+-----------------------------------------------------------------------------+
It's running on the plain DRM device, but as you can see, there's no Xorg process on the gpu since it doesn't have a DRI dev node.
from dmesg:
initial boot, nvidia gpu doesn’t work:
[ 0.549169] pci 0000:19:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[ 0.549171] pci 0000:19:00.0: BAR 1: trying firmware assignment [mem 0xc0000000-0xcfffffff 64bit pref]
[ 0.549172] pci 0000:19:00.0: BAR 1: [mem 0xc0000000-0xcfffffff 64bit pref] conflicts with PCI Bus 0000:00 [mem 0x80000000-0xdfffffff window]
[ 0.549174] pci 0000:19:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[ 0.549176] pci 0000:19:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[ 0.549178] pci 0000:19:00.0: BAR 3: trying firmware assignment [mem 0xd0000000-0xd1ffffff 64bit pref]
[ 0.549179] pci 0000:19:00.0: BAR 3: [mem 0xd0000000-0xd1ffffff 64bit pref] conflicts with PCI Bus 0000:00 [mem 0x80000000-0xdfffffff window]
[ 0.549181] pci 0000:19:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[ 0.549183] pci 0000:19:00.0: BAR 0: assigned [mem 0xa1000000-0xa1ffffff]
[ 0.549190] pci 0000:19:00.1: BAR 0: assigned [mem 0xa2000000-0xa2003fff]
[ 0.549197] pci 0000:19:00.0: BAR 5: no space for [io size 0x0080]
[ 0.549198] pci 0000:19:00.0: BAR 5: failed to assign [io size 0x0080]
Then the nvidia driver loads on the defunct device and fails to create /dev/dri/card0:
[ 2.357771] nvidia: loading out-of-tree module taints kernel.
[ 2.357783] nvidia: module license 'NVIDIA' taints kernel.
[ 2.567434] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 470.86 Tue Oct 26 21:46:51 UTC 2021
[ 2.571815] [drm] [nvidia-drm] [GPU ID 0x00001900] Loading driver
[ 2.576558] NVRM: GPU 0000:19:00.0: RmInitAdapter failed! (0x22:0xffff:667)
[ 2.576627] NVRM: GPU 0000:19:00.0: rm_init_adapter failed, device minor number 0
[ 2.576766] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00001900] Failed to allocate NvKmsKapiDevice
[ 2.576984] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00001900] Failed to register device
Then the device is removed and re-added, and the gpu is working:
[ 10.890204] pci 0000:19:00.0: BAR 1: assigned [mem 0xb0000000-0xbfffffff 64bit pref]
[ 10.890225] pci 0000:19:00.0: BAR 3: assigned [mem 0xa8000000-0xa9ffffff 64bit pref]
[ 10.890245] pci 0000:19:00.0: BAR 0: assigned [mem 0xa1000000-0xa1ffffff]
[ 10.890252] pci 0000:19:00.0: BAR 6: assigned [mem 0xa2000000-0xa207ffff pref]
[ 10.890254] pci 0000:19:00.1: BAR 0: assigned [mem 0xa2080000-0xa2083fff]
[ 10.890261] pci 0000:19:00.0: BAR 5: assigned [io 0x5000-0x507f]
But all of this happens while the nvidia driver is still loaded, so the missing DRI dev node is not recreated.
You'll have to make sure the driver is unloaded and reloaded after the PCI device is working, so the DRI node is correctly created for Xorg. Since this happens after amdgpu loads, it should be /dev/dri/card2 then.
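Sketched as a script (needs root; module names are those shipped with the 470 driver, and the PCI address is the one from your dmesg):

```shell
#!/bin/sh
# Unload the nvidia stack first so it doesn't sit on the defunct device
modprobe -r nvidia_drm nvidia_modeset nvidia
# Remove and re-enumerate the eGPU so the BARs get assigned properly
echo 1 > /sys/bus/pci/devices/0000:19:00.0/remove
sleep 1
echo 1 > /sys/bus/pci/rescan
sleep 2
# Reload; nvidia-drm should now create the missing /dev/dri node for Xorg
modprobe nvidia_drm
```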