Driver for RTX3070 not working under Elementary OS on MacBook Pro with eGPU

Dear Nvidia-team,

I have a 2015 MacBook Pro running ElementaryOS from a Thunderbolt JetDrive, and a Razer Core X with an RTX 3070 connected to another Thunderbolt port. I want to use a 5K monitor via a DP->Thunderbolt cable. I have tried repeatedly to install the latest NVIDIA driver (455) using the NVIDIA installer script, apt, and Ubuntu’s ‘Additional Drivers’ tool. The driver still doesn’t work.

If the eGPU is connected to the Thunderbolt port from the start, the internal screen stays black after a reboot. Without the eGPU connected, I can boot normally. When I then connect the eGPU, I get:

dmesg | grep -i nvidia
[ 36.687431] nvidiafb 0000:3c:00.0: enabling device (0000 -> 0003)
[ 36.687524] nvidiafb: Device ID: 10de2484
[ 36.687525] nvidiafb: unknown NV_ARCH
[ 36.978490] nvidia: loading out-of-tree module taints kernel.
[ 36.978527] nvidia: module license 'NVIDIA' taints kernel.
[ 36.986017] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 36.992981] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 36.993567] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[ 36.993710] nvidia: probe of 0000:3c:00.0 failed with error -1
[ 36.993724] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 36.993725] NVRM: None of the NVIDIA devices were initialized.
[ 36.994012] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236
[ 37.362603] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 37.363441] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[ 37.363581] nvidia: probe of 0000:3c:00.0 failed with error -1

and
lspci -vv | grep -i -A11 nvidia
3c:00.0 VGA compatible controller: NVIDIA Corporation Device 2484 (rev a1) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Device 404c
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 19
Region 0: Memory at a5000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at (64-bit, prefetchable)
Region 3: Memory at (64-bit, prefetchable)
Region 5: I/O ports at 5000 [size=128]
[virtual] Expansion ROM at a6000000 [disabled] [size=512K]
Capabilities:
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

3c:00.1 Audio device: NVIDIA Corporation Device 228b (rev a1)
Subsystem: Gigabyte Technology Co., Ltd Device 404c
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 16
Region 0: Memory at a6080000 (32-bit, non-prefetchable) [size=16K]
Capabilities:
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

I have followed instructions from several web pages to get rid of nouveau, but without success.
Hopefully the bug report helps you understand what the problem is.

All the best,

Florian

nvidia-bug-report.log.gz (607.6 KB)

nvidia-installer.log (28.7 KB)

You’ll have to blacklist nvidiafb, it’s blocking the nvidia driver from working.
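For anyone following along, here is a minimal sketch of how to check which config files actually blacklist a module. It assumes the usual Debian/Ubuntu layout with blacklist files under /etc/modprobe.d and /lib/modprobe.d; the helper name `blacklisted_in` is made up for illustration.

```shell
#!/bin/sh
# Print the modprobe config files in the given directories that contain a
# "blacklist <module>" line. blacklisted_in is a hypothetical helper name.
blacklisted_in() {
    module="$1"; shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Match the whole line so that "nvidia" does not also match "nvidiafb".
        grep -lE "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" \
            "$dir"/*.conf 2>/dev/null || :
    done
}

# Run manually, for example:
#   blacklisted_in nvidiafb /etc/modprobe.d /lib/modprobe.d
#   lsmod | grep -E '^(nvidiafb|nouveau) '   # no output: neither module is loaded
```

Two things are worth keeping in mind when reading the results. The `Kernel modules:` line in `lspci` output lists every installed module that declares support for the device, so a blacklisted module still appears there; `Kernel driver in use:` and `lsmod` show what is actually bound and loaded. A `blacklist` entry also only stops automatic loading by alias, so a module can still be loaded explicitly by name.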

Thank you for your quick response. I blacklisted nvidiafb, (tried to) purge everything NVIDIA-related, and reran NVIDIA-Linux-x86_64-455.45.01.run. Same problem.
I have attached the installer log and the bug report again.

‘lspci -vv | grep -i -A11 nvidia’ gives
3c:00.0 VGA compatible controller: NVIDIA Corporation Device 2484 (rev a1) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Device 404c
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 19
Region 0: Memory at a5000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at (64-bit, prefetchable)
Region 3: Memory at (64-bit, prefetchable)
Region 5: I/O ports at 5000 [size=128]
[virtual] Expansion ROM at a6000000 [disabled] [size=512K]
Capabilities:
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

3c:00.1 Audio device: NVIDIA Corporation Device 228b (rev a1)
Subsystem: Gigabyte Technology Co., Ltd Device 404c
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 16
Region 0: Memory at a6080000 (32-bit, non-prefetchable) [size=16K]
Capabilities:
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

Whatever I try to blacklist, I can’t get rid of nvidiafb and nouveau.
In /etc/modprobe.d I have:
blacklist-framebuffer.conf:
# Framebuffer drivers are generally buggy and poorly-supported, and cause
# suspend failures, kernel panics and general mayhem. For this reason we
# never load them automatically.
blacklist aty128fb
blacklist atyfb
blacklist radeonfb
blacklist cirrusfb
blacklist cyber2000fb
blacklist cyblafb
blacklist gx1fb
blacklist hgafb
blacklist i810fb
blacklist intelfb
blacklist kyrofb
blacklist lxfb
blacklist matroxfb_base
blacklist neofb
blacklist nvidiafb
blacklist pm2fb
blacklist rivafb
blacklist s1d13xxxfb
blacklist savagefb
blacklist sisfb
blacklist sstfb
blacklist tdfxfb
blacklist tridentfb
blacklist vesafb
blacklist vfb
blacklist viafb
blacklist vt8623fb
blacklist udlfb

blacklist-nouveau.conf:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

blacklist-nvidia-nouveau.conf:
blacklist nouveau
options nouveau modeset=0

nouveau-kms.conf:
options nouveau modeset=0

In /lib/modprobe.d/ I have
fbdev-blacklist.conf:
# This file blacklists most old-style PCI framebuffer drivers.

blacklist arkfb
blacklist aty128fb
blacklist atyfb
blacklist radeonfb
blacklist cirrusfb
blacklist cyber2000fb
blacklist kyrofb
blacklist matroxfb_base
blacklist mb862xxfb
blacklist neofb
blacklist pm2fb
blacklist pm3fb
blacklist s3fb
blacklist savagefb
blacklist sisfb
blacklist tdfxfb
blacklist tridentfb
blacklist vt8623fb
blacklist nvidiafb

and nvidia-graphics-drivers.conf:
blacklist nouveau
blacklist lbm-nouveau

and have run ‘sudo update-initramfs -u’ several times.

What could I do?
All the best,

Florian

nvidia-bug-report.log.gz (555.5 KB)
nvidia-installer.log (28.9 KB)

Now you’re running into

NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:3c:00.0)

which is very common on Apple hardware. Please try using the kernel parameter
pci=realloc
If that doesn’t help, you’ll need to remove and add the gpu again before loading the driver, see this for some hints:
https://github.com/Dunedan/mbp-2016-linux/issues/60
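As a rough sketch of how to confirm the parameter actually reached the running kernel: the helper below reads a kernel command line and checks for an exact word match. `has_kernel_param` is a hypothetical name, and the GRUB steps in the comments assume an Ubuntu-style setup booted through GRUB, which may not match every Mac install.

```shell
#!/bin/sh
# Succeed if the given parameter appears as a whole word on the kernel
# command line. has_kernel_param is a hypothetical helper; the optional
# second argument defaults to /proc/cmdline (the running kernel).
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # One word per line, then an exact fixed-string whole-line match.
    tr -s '[:space:]' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Assuming GRUB on an Ubuntu-based system, the parameter is usually added by
# editing /etc/default/grub so that it reads, for example:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc"
# followed by:
#   sudo update-grub
# and a reboot. Afterwards, check with:
#   has_kernel_param pci=realloc && echo "pci=realloc is active"
```

If the check fails after a reboot, the edit to the bootloader config probably did not take effect, which is worth ruling out before concluding that the parameter makes no difference.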

Dear generix,

I tried pci=realloc and it didn’t make a difference. I then tried several other kernel parameters, with no success, and went on to try everything I could find on the internet. No change. I finally concluded it was a firmware problem with my MacBook Pro and gave up.
Today I booted into ElementaryOS again because I wanted to do some things without the eGPU, and noticed that updates were available for the NVIDIA packages. I updated and reinstalled the nvidia-driver-460 package, and the strange ‘This PCI I/O region assigned to your NVIDIA device is invalid’ error was gone. Strangely, though, nvidia-smi does not find any devices?!

Here is some system info:
sudo lspci -vv | grep -i -A11 nvidia
3c:00.0 VGA compatible controller: NVIDIA Corporation Device 2484 (rev a1) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Device 404c
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 19
Region 0: Memory at a6000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at (64-bit, prefetchable)
Region 3: Memory at (64-bit, prefetchable)
Region 5: I/O ports at 7000 [size=128]
[virtual] Expansion ROM at a5800000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3

Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

lsmod | grep -i nvidia
nvidia_uvm 983040 0
nvidia_drm 57344 0
nvidia_modeset 1224704 1 nvidia_drm
nvidia 34037760 2 nvidia_uvm,nvidia_modeset
drm_kms_helper 184320 2 nvidia_drm,i915
drm 491520 7 drm_kms_helper,nvidia_drm,i915

dmesg | grep -i -A3 nvidia
[ 25.574940] nvidia: loading out-of-tree module taints kernel.
[ 25.574946] nvidia: module license 'NVIDIA' taints kernel.
[ 25.574947] Disabling lock debugging due to kernel taint
[ 25.583100] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 25.591524] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 25.592955] nvidia 0000:3c:00.0: enabling device (0000 -> 0003)
[ 25.593095] nvidia 0000:3c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 25.639632] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 460.32.03 Sun Dec 27 19:00:34 UTC 2020
[ 25.656573] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 460.32.03 Sun Dec 27 18:51:11 UTC 2020
[ 25.660437] [drm] [nvidia-drm] [GPU ID 0x00003c00] Loading driver
[ 25.660534] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:3c:00.0 on minor 1
[ 25.682368] nvidia-uvm: Loaded the UVM driver, major device number 234.
[ 26.936774] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:05:00.0/0000:06:05.0/0000:3a:00.0/0000:3b:01.0/0000:3c:00.1/sound/card2/input17
[ 26.937067] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:05:00.0/0000:06:05.0/0000:3a:00.0/0000:3b:01.0/0000:3c:00.1/sound/card2/input18
[ 26.937352] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:05:00.0/0000:06:05.0/0000:3a:00.0/0000:3b:01.0/0000:3c:00.1/sound/card2/input19
[ 26.937621] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:05:00.0/0000:06:05.0/0000:3a:00.0/0000:3b:01.0/0000:3c:00.1/sound/card2/input20
[ 26.937917] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.1/0000:05:00.0/0000:06:05.0/0000:3a:00.0/0000:3b:01.0/0000:3c:00.1/sound/card2/input21
[ 26.938222] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:01.1/0000:05:00.0/0000:06:05.0/0000:3a:00.0/0000:3b:01.0/0000:3c:00.1/sound/card2/input22
[ 26.938592] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:01.1/0000:05:00.0/0000:06:05.0/0000:3a:00.0/0000:3b:01.0/0000:3c:00.1/sound/card2/input23
[ 32.332684] NVRM: GPU 0000:3c:00.0: RmInitAdapter failed! (0x23:0xffff:624)
[ 32.332753] NVRM: GPU 0000:3c:00.0: rm_init_adapter failed, device minor number 0
[ 32.335726] NVRM: GPU 0000:3c:00.0: RmInitAdapter failed! (0x23:0xffff:624)

sudo lshw -c video
[sudo] password for dadatenberger:
*-display
description: VGA compatible controller
product: NVIDIA Corporation
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:3c:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:19 memory:a6000000-a6ffffff ioport:7000(size=128) memory:a5800000-a587ffff
*-display
description: VGA compatible controller
product: Crystal Well Integrated Graphics Controller
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 08
width: 64 bits
clock: 33MHz
capabilities: msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:58 memory:a0000000-a03fffff memory:90000000-9fffffff ioport:3000(size=64) memory:c0000-dffff

The bug report is attached. What could I do now?

all the best,

Florian

nvidia-bug-report.log.gz (119.9 KB)

The error is still the same, the nvidia driver just doesn’t display the warning anymore. Please try this:

  1. Blacklist the nvidia driver
    /etc/modprobe.d/nvidia-blacklist.conf
    blacklist nvidia
  2. Update initrd
    sudo update-initramfs -u
  3. Disable display-manager
    sudo systemctl disable display-manager
  4. Reboot
  5. Log in on a text console, make sure the nvidia driver is not loaded
    lsmod | grep nvidia
  6. Get a root console, remove and add back the PCI bridge
    sudo -s
    echo 1 > /sys/bus/pci/devices/0000:00:01.1/remove
    echo 1 > /sys/bus/pci/rescan
  7. Try loading the nvidia driver
    modprobe nvidia
  8. Create a new nvidia-bug-report.log
  9. Start lightdm
    systemctl start lightdm
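
The remove/rescan part of these steps could be wrapped in a small function, sketched below. `reprobe_bridge` is a hypothetical name, and the `SYSFS_ROOT` variable is an assumption added only so the function can be tried against a scratch directory; on a real system it is left unset and the function writes to /sys, which requires root.

```shell
#!/bin/sh
# Remove a PCI bridge (and everything behind it) and rescan the bus so the
# kernel re-enumerates the devices and reassigns their resources.
# reprobe_bridge is a hypothetical helper; SYSFS_ROOT defaults to /sys.
reprobe_bridge() {
    bridge="$1"                      # e.g. 0000:00:01.1 as in the steps above
    sysfs="${SYSFS_ROOT:-/sys}"
    remove_node="$sysfs/bus/pci/devices/$bridge/remove"
    if [ ! -e "$remove_node" ]; then
        echo "no such PCI device: $bridge" >&2
        return 1
    fi
    echo 1 > "$remove_node" || return 1
    echo 1 > "$sysfs/bus/pci/rescan"
}

# As root, with the nvidia driver not yet loaded:
#   reprobe_bridge 0000:00:01.1 && modprobe nvidia
#   dmesg | grep -i nvrm
```

The bridge address 0000:00:01.1 comes from the device path in the dmesg output earlier in the thread; on other machines it would need to be read from `lspci -t` or the sysfs path of the GPU.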