eGPU RTX 2070 on NVIDIA 510.54 driver: NVRM: Failed to copy vbios to system memory

I'm using an eGPU on Arch Linux. The kernel version is 5.16.11.
I tried installing nvidia with pacman and nvidia-dkms with yay, but in both cases the driver does not work and can't recognize my 2070.

lspci | grep -E 'VGA|3D'

00:02.0 VGA compatible controller: Intel Corporation Iris Plus Graphics G4 (Ice Lake) (rev 07)
03:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070 Rev. A] (rev a1)

dmesg | grep PCIe

[ 0.099480] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 0.646804] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR DPC]
[ 0.660239] pci 0000:01:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[ 0.669884] pci 0000:03:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.0 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
[ 0.678966] pci 0000:04:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
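The bandwidth figures in these lines can be reproduced from link speed, lane count and line encoding, which makes it easier to see that the eGPU's link is running at 2.5 GT/s rather than 8.0 GT/s. A minimal sketch, assuming the per-lane figures the kernel appears to use (8b/10b encoding up to 5 GT/s, 128b/130b at 8 GT/s, truncated to whole Mb/s per lane); pcie_mbps is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: usable PCIe bandwidth in Mb/s for one link.
# $1 = transfer rate in MT/s (2500, 5000, 8000), $2 = number of lanes.
pcie_mbps() {
  rate=$1
  lanes=$2
  if [ "$rate" -le 5000 ]; then
    per_lane=$((rate * 8 / 10))      # 2.5 and 5 GT/s use 8b/10b encoding
  else
    per_lane=$((rate * 128 / 130))   # 8 GT/s and above use 128b/130b encoding
  fi
  echo $((per_lane * lanes))
}

pcie_mbps 2500 4    # the link actually negotiated at 0000:00:07.0
pcie_mbps 8000 4    # what the Thunderbolt bridges are capable of
pcie_mbps 8000 16   # what the RTX 2070 is capable of
```

This prints 8000, 31504 and 126016, matching the 8.000, 31.504 and 126.016 Gb/s in the log.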

lspci -k

00:00.0 Host bridge: Intel Corporation Ice Lake-LP Processor Host Bridge/DRAM Registers (rev 03)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: icl_uncore
00:02.0 VGA compatible controller: Intel Corporation Iris Plus Graphics G4 (Ice Lake) (rev 07)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: i915
Kernel modules: i915
00:04.0 Signal processing controller: Intel Corporation Device 8a03 (rev 03)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: proc_thermal
Kernel modules: processor_thermal_device_pci_legacy
00:07.0 PCI bridge: Intel Corporation Ice Lake Thunderbolt 3 PCI Express Root Port #0 (rev 03)
Kernel driver in use: pcieport
00:07.1 PCI bridge: Intel Corporation Ice Lake Thunderbolt 3 PCI Express Root Port #1 (rev 03)
Kernel driver in use: pcieport
00:07.2 PCI bridge: Intel Corporation Ice Lake Thunderbolt 3 PCI Express Root Port #2 (rev 03)
Kernel driver in use: pcieport
00:07.3 PCI bridge: Intel Corporation Ice Lake Thunderbolt 3 PCI Express Root Port #3 (rev 03)
Kernel driver in use: pcieport
00:08.0 System peripheral: Intel Corporation Device 8a11 (rev 03)
Subsystem: Acer Incorporated [ALI] Device 141f
00:0d.0 USB controller: Intel Corporation Ice Lake Thunderbolt 3 USB Controller (rev 03)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00:0d.2 System peripheral: Intel Corporation Ice Lake Thunderbolt 3 NHI #0 (rev 03)
Kernel driver in use: thunderbolt
Kernel modules: thunderbolt
00:0d.3 System peripheral: Intel Corporation Ice Lake Thunderbolt 3 NHI #1 (rev 03)
Kernel driver in use: thunderbolt
Kernel modules: thunderbolt
00:14.0 USB controller: Intel Corporation Ice Lake-LP USB 3.1 xHCI Host Controller (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00:14.2 RAM memory: Intel Corporation Ice Lake-LP DRAM Controller (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
00:14.3 Network controller: Intel Corporation Ice Lake-LP PCH CNVi WiFi (rev 30)
Subsystem: Intel Corporation Wi-Fi 6 AX201
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi
00:15.0 Serial bus controller: Intel Corporation Ice Lake-LP Serial IO I2C Controller #0 (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: intel-lpss
Kernel modules: intel_lpss_pci
00:15.1 Serial bus controller: Intel Corporation Ice Lake-LP Serial IO I2C Controller #1 (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: intel-lpss
Kernel modules: intel_lpss_pci
00:15.2 Serial bus controller: Intel Corporation Ice Lake-LP Serial IO I2C Controller #2 (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: intel-lpss
Kernel modules: intel_lpss_pci
00:16.0 Communication controller: Intel Corporation Ice Lake-LP Management Engine (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: mei_me
Kernel modules: mei_me
00:1d.0 PCI bridge: Intel Corporation Device 34b4 (rev 30)
Kernel driver in use: pcieport
00:1f.0 ISA bridge: Intel Corporation Ice Lake-LP LPC Controller (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
00:1f.3 Multimedia audio controller: Intel Corporation Ice Lake-LP Smart Sound Technology Audio Controller (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: sof-audio-pci-intel-icl
Kernel modules: snd_hda_intel, snd_sof_pci_intel_icl
00:1f.5 Serial bus controller: Intel Corporation Ice Lake-LP SPI Controller (rev 30)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: intel-spi
Kernel modules: intel_spi_pci
01:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
Kernel driver in use: pcieport
02:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
Kernel driver in use: pcieport
02:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
Kernel driver in use: pcieport
03:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070 Rev. A] (rev a1)
Subsystem: NVIDIA Corporation Device 12ff
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
03:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
Subsystem: Acer Incorporated [ALI] Device 141f
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
03:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1)
Subsystem: NVIDIA Corporation Device 12ff
Kernel modules: xhci_pci
03:00.3 Serial bus controller: NVIDIA Corporation TU106 USB Type-C UCSI Controller (rev a1)
Subsystem: NVIDIA Corporation Device 12ff
Kernel driver in use: nvidia-gpu
Kernel modules: i2c_nvidia_gpu
04:00.0 USB controller: Intel Corporation DSL6540 USB 3.1 Controller [Alpine Ridge]
Subsystem: Tul Corporation / PowerColor Device 5005
Kernel modules: xhci_pci
ab:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01)
Subsystem: Sandisk Corp WD Blue SN550 NVMe SSD
Kernel driver in use: nvme

dmesg | grep NV

[ 2.343559] nvidia: module license 'NVIDIA' taints kernel.
[ 2.625912] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 510.54 Tue Feb 8 04:42:21 UTC 2022
[ 2.713531] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 510.54 Tue Feb 8 04:34:06 UTC 2022
[ 5.170492] NVRM: 0000:03:00.0: Failed to create a DMA mapping!
[ 5.170651] NVRM: GPU 0000:03:00.0: Failed to copy vbios to system memory.
[ 5.170831] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x30:0xffff:963)
[ 5.170878] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
[ 5.474190] NVRM: 0000:03:00.0: Failed to create a DMA mapping!
[ 5.474345] NVRM: GPU 0000:03:00.0: Failed to copy vbios to system memory.
[ 5.474513] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x30:0xffff:963)
[ 5.474623] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
[ 5.775882] NVRM: 0000:03:00.0: Failed to create a DMA mapping!
[ 5.776040] NVRM: GPU 0000:03:00.0: Failed to copy vbios to system memory.
[ 5.776183] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x30:0xffff:963)
[ 5.776314] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
[ 6.089688] NVRM: 0000:03:00.0: Failed to create a DMA mapping!
[ 6.089846] NVRM: GPU 0000:03:00.0: Failed to copy vbios to system memory.
[ 6.090018] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x30:0xffff:963)
[ 6.090125] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
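When comparing boots (for example after changing a setting and rebooting), it is enough to count the RmInitAdapter lines. A minimal sketch that reads kernel log text on standard input; nvrm_init_failures is a hypothetical helper name, and the pattern only matches the message text shown in this log:

```shell
#!/bin/sh
# Hypothetical helper: count failed NVIDIA adapter initialisations in
# dmesg-style text read from stdin. Prints 0 when there are no matches.
nvrm_init_failures() {
  grep -c 'NVRM: GPU .*: RmInitAdapter failed!' || true
}
```

Usage: dmesg | nvrm_init_failures (reading the kernel log may require root). For the log above it would print 4.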

cat /proc/driver/nvidia/gpus/*/information

Model: Unknown
IRQ: 189
GPU UUID: GPU-???-???-???-???-???
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:03:00.0
Device Minor: 0
GPU Excluded: No
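The DMA Size and DMA Mask lines state the same limit two ways: a 47-bit DMA mask has the low 47 bits set, i.e. 2^47 - 1. A quick check, assuming a shell with 64-bit arithmetic (as bash and dash have on x86_64); dma_mask is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the DMA mask for an address width of $1 bits,
# i.e. the value with the low $1 bits set (2^$1 - 1), in hex.
dma_mask() {
  printf '0x%x\n' $(( (1 << $1) - 1 ))
}

dma_mask 47   # prints 0x7fffffffffff, as in the information file above
```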

Here is the bug report:

nvidia-bug-report.log.gz (112.5 KB)

Sounds like the device is untrusted in your Thunderbolt setup.

Hey, I checked my Thunderbolt setup with boltctl. When I posted this topic, the Thunderbolt device had already been trusted by the system automatically.

I redid the setup anyway: I made bolt forget the device, enrolled (trusted) it again, changed global.auth-mode with boltctl, and rebooted:

boltctl forget 00e590d8-3b05-6801-ffff-ffffffffffff
boltctl enroll --policy=auto 00e590d8-3b05-6801-ffff-ffffffffffff
boltctl config global.auth-mode disabled

Nothing has changed:

dmesg | grep NV

[ 0.000000] BIOS-e820: [mem 0x0000000033a5f000-0x00000000347cefff] ACPI NVS
[ 0.099486] ACPI: PM: Registering ACPI NVS region [mem 0x33a5f000-0x347cefff] (14090240 bytes)
[ 0.102896] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 9.494877] nvidia: module license 'NVIDIA' taints kernel.
[ 9.631063] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 510.54 Tue Feb 8 04:42:21 UTC 2022
[ 10.119002] NVRM: 0000:03:00.0: Failed to create a DMA mapping!
[ 10.119171] NVRM: GPU 0000:03:00.0: Failed to copy vbios to system memory.
[ 10.119361] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x30:0xffff:963)
[ 10.119699] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
[ 10.422107] NVRM: 0000:03:00.0: Failed to create a DMA mapping!
[ 10.422359] NVRM: GPU 0000:03:00.0: Failed to copy vbios to system memory.
[ 10.422540] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x30:0xffff:963)
[ 10.422671] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
[ 10.723968] NVRM: 0000:03:00.0: Failed to create a DMA mapping!
[ 10.724119] NVRM: GPU 0000:03:00.0: Failed to copy vbios to system memory.
[ 10.724248] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x30:0xffff:963)
[ 10.724353] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
[ 11.024202] NVRM: 0000:03:00.0: Failed to create a DMA mapping!
[ 11.024360] NVRM: GPU 0000:03:00.0: Failed to copy vbios to system memory.
[ 11.024479] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x30:0xffff:963)
[ 11.024574] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0

journalctl -b -u bolt

Mar 05 12:52:29 ArchWike systemd[1]: Starting Thunderbolt system service…
Mar 05 12:52:29 ArchWike boltd[783]: bolt 0.9.2 starting up.
Mar 05 12:52:29 ArchWike boltd[783]: manager: initializing store
Mar 05 12:52:29 ArchWike boltd[783]: store: located at: /var/lib/boltd
Mar 05 12:52:29 ArchWike boltd[783]: config: loading user config
Mar 05 12:52:29 ArchWike boltd[783]: config: user config loaded successfully
Mar 05 12:52:29 ArchWike boltd[783]: config: auth mode set to 'disabled'
Mar 05 12:52:29 ArchWike boltd[783]: bouncer: initializing polkit
Mar 05 12:52:29 ArchWike boltd[783]: watchdog: enabled [pulse: 90s]
Mar 05 12:52:29 ArchWike boltd[783]: udev: initializing udev
Mar 05 12:52:29 ArchWike boltd[783]: store: loading domains
Mar 05 12:52:29 ArchWike boltd[783]: store: loading devices
Mar 05 12:52:29 ArchWike boltd[783]: [00e590d8-3b05 ] store: loading device
Mar 05 12:52:29 ArchWike boltd[783]: power: state located at: /run/boltd/power
Mar 05 12:52:29 ArchWike boltd[783]: power: force power support: yes
Mar 05 12:52:29 ArchWike boltd[783]: udev: found 2 domains
Mar 05 12:52:29 ArchWike boltd[783]: udev: enumerating devices
Mar 05 12:52:29 ArchWike boltd[783]: [50c63ff5-7d55-domain0 ] newly connected [iommu] (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0)
Mar 05 12:52:29 ArchWike boltd[783]: security level set to 'none'
Mar 05 12:52:29 ArchWike boltd[783]: [50c63ff5-7d55-domain0 ] domain: registered (bootacl: 0/0)
Mar 05 12:52:29 ArchWike boltd[783]: [50c63ff5-7d55-domain0 ] bootacl: bootacl not supported, no sync
Mar 05 12:52:29 ArchWike boltd[783]: [50c63ff5-7d55-domain0 ] udev: uuid is stable: no (for NHI: 0x8a17)
Mar 05 12:52:29 ArchWike boltd[783]: global 'generation' set to '3'
Mar 05 12:52:29 ArchWike boltd[783]: [50c63ff5-7d55-Spin SP513-54N ] device added, status: authorized, at /sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0
Mar 05 12:52:29 ArchWike boltd[783]: [50c63ff5-7d55-Spin SP513-54N ] labeling device: Acer Spin SP513-54N
Mar 05 12:52:29 ArchWike boltd[783]: [00e590d8-3b05-TBX-550CA ] parent is 50c63ff5-7d55…
Mar 05 12:52:29 ArchWike boltd[783]: [00e590d8-3b05-TBX-550CA ] connected: authorized (/sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1)
Mar 05 12:52:29 ArchWike boltd[783]: [b1897077-8796-domain1 ] newly connected [iommu] (/sys/devices/pci0000:00/0000:00:0d.3/domain1/1-0)
Mar 05 12:52:29 ArchWike boltd[783]: [b1897077-8796-domain1 ] domain: registered (bootacl: 0/0)
Mar 05 12:52:29 ArchWike boltd[783]: [b1897077-8796-domain1 ] bootacl: bootacl not supported, no sync
Mar 05 12:52:29 ArchWike boltd[783]: [b1897077-8796-domain1 ] udev: uuid is stable: no (for NHI: 0x8a0d)
Mar 05 12:52:29 ArchWike boltd[783]: [b1897077-8796-Spin SP513-54N ] device added, status: authorized, at /sys/devices/pci0000:00/0000:00:0d.3/domain1/1-0
Mar 05 12:52:29 ArchWike boltd[783]: [b1897077-8796-Spin SP513-54N ] labeling device: Acer Spin SP513-54N
Mar 05 12:52:29 ArchWike boltd[783]: [50c63ff5-7d55-domain0 ] dbus: exported domain at /org/freedesktop/bolt/domains/50c63ff5_7d55_8680_ffff_ffffffffffff
Mar 05 12:52:29 ArchWike boltd[783]: [b1897077-8796-domain1 ] dbus: exported domain at /org/freedesktop/bolt/domains/b1897077_8796_8680_ffff_ffffffffffff
Mar 05 12:52:29 ArchWike boltd[783]: [00e590d8-3b05-TBX-550CA ] dbus: exported device at /org/freedesktop/bolt/devices/00e590d8_3b05…
Mar 05 12:52:29 ArchWike boltd[783]: [50c63ff5-7d55-Spin SP513-54N ] dbus: exported device at /org/freedesktop/bolt/devices/50c63ff5_7d55…
Mar 05 12:52:29 ArchWike boltd[783]: [b1897077-8796-Spin SP513-54N ] dbus: exported device at /org/freedesktop/bolt/devices/b1897077_8796…
Mar 05 12:52:29 ArchWike systemd[1]: Started Thunderbolt system service.

I also found that IOMMU is enabled in the Thunderbolt domains. Could this be related?

boltctl domains

● domain0 50c63ff5-7d55-8680-ffff-ffffffffffff
├─ bootacl: 0/0
└─ security: iommu

● domain1 b1897077-8796-8680-ffff-ffffffffffff
├─ bootacl: 0/0
└─ security: iommu
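The same information can be read from the kernel's Thunderbolt sysfs interface, where each domain exposes a security attribute and an iommu_dma_protection attribute (1 when the IOMMU is used for DMA protection, 0 when it is not). Judging by the bolt log above, which reports [iommu] for domain0 followed by security level 'none', bolt's "iommu" label seems to mean kernel security level none with IOMMU DMA protection on top. A minimal sketch that prints both attributes per domain; tb_domains is a hypothetical helper name, and the base directory is a parameter only so it can be pointed at a copy of the tree:

```shell
#!/bin/sh
# Hypothetical helper: print each Thunderbolt domain's security level and
# whether the IOMMU is used for DMA protection (1 = yes, 0 = no).
# $1 optionally overrides the sysfs directory; default is the real one.
tb_domains() {
  base=${1:-/sys/bus/thunderbolt/devices}
  for d in "$base"/domain*; do
    [ -d "$d" ] || continue   # no domains: the glob stays unexpanded
    sec=$(cat "$d/security" 2>/dev/null || echo unknown)
    dma=$(cat "$d/iommu_dma_protection" 2>/dev/null || echo unknown)
    echo "$(basename "$d") security=$sec iommu_dma_protection=$dma"
  done
}

tb_domains
```

Running it before and after toggling IOMMU in the BIOS should show whether only the DMA protection changes or the security level itself.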

I turned off IOMMU in the BIOS, and now the driver works:

boltctl domains

● domain0 a0dd3de9-f54d-8680-ffff-ffffffffffff
├─ bootacl: 0/0
└─ security: none

● domain1 91959da6-baf6-8680-ffff-ffffffffffff
├─ bootacl: 0/0
└─ security: none

Is there any way to change only the domain security level while keeping the IOMMU working?

I ended up adding intel_iommu=off to the kernel boot parameters to work around this problem.
Now the domain security is none and the driver works, but I think this carries some security risk, since Thunderbolt devices no longer get IOMMU-based DMA protection.
I checked my laptop's BIOS and there is no option to change the Thunderbolt security level; it is locked to none.
If you have a similar problem, check your BIOS settings first and try changing the Thunderbolt security level there.
I'm not a native English speaker; I hope this is understandable.
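For anyone reproducing this workaround: assuming GRUB is the boot loader (systemd-boot and others keep the command line elsewhere), the parameter normally goes into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by grub-mkconfig -o /boot/grub/grub.cfg and a reboot. The sketch below only checks whether a parameter is present on the running kernel's command line; has_kparam is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact kernel parameter $1 appears in
# the command line $2 (defaults to the running kernel's /proc/cmdline).
has_kparam() {
  cmdline=${2:-$(cat /proc/cmdline)}
  for word in $cmdline; do          # split on whitespace
    [ "$word" = "$1" ] && return 0
  done
  return 1
}

if has_kparam intel_iommu=off; then
  echo "intel_iommu=off is active"
else
  echo "intel_iommu=off is not set"
fi
```

Matching whole words avoids false positives such as intel_iommu=on being mistaken for the parameter you are looking for.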