A2 fails to load Linux vGPU driver, NVRM: nvlink memory address conflict

To start, I migrated from a VMware A2 vGPU setup (which worked fine) to a Linux-based solution. I’m having trouble getting the NVIDIA-Linux-x86_64-535.154.02-vgpu-kvm.run host driver to load. I’ve enrolled the keys with mokutil, which cleared the “Loading of module with unavailable key is rejected” error, and I have blacklisted the ‘nouveau’ driver. I’m now seeing what I can best describe as a race condition in this driver: “nvidia-vgpu-vfio” fails to load because ‘nvlink’ is unable to reserve a memory address. Here’s some information on this:

dmesg

[ 1095.663320] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[ 1095.663327] NVRM: request_mem_region failed for 16M @ 0x81000000. This can
NVRM: occur when a driver such as rivatv is loaded and claims
NVRM: ownership of the device’s registers.
[ 1095.664384] nvidia: probe of 0000:02:00.0 failed with error -1
[ 1095.664400] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1095.664401] NVRM: None of the NVIDIA devices were initialized.
[ 1095.664563] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
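To make the failing range easier to match against /proc/iomem, here is a minimal Python sketch that turns the NVRM message into a start-end range. The helper name `failed_region` is my own, and it assumes the "16M" in the message means 16 MiB.

```python
import re

# Line copied from the dmesg output above.
LINE = "[ 1095.663327] NVRM: request_mem_region failed for 16M @ 0x81000000. This can"

def failed_region(line):
    """Return (start, end) of the region named in the NVRM message, or None.

    Assumption: the size suffix "M" means MiB (1024 * 1024 bytes).
    """
    m = re.search(r"request_mem_region failed for (\d+)M @ 0x([0-9a-fA-F]+)", line)
    if m is None:
        return None
    size = int(m.group(1)) * 1024 * 1024
    start = int(m.group(2), 16)
    return start, start + size - 1  # end address is inclusive, as in /proc/iomem

start, end = failed_region(LINE)
print(f"{start:08x}-{end:08x}")  # prints 81000000-81ffffff
```

Under that assumption the range is 81000000-81ffffff, which is the first 0000:02:00.0 entry in the /proc/iomem listing.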

cat /proc/iomem

00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
000a0000-000bffff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-6b98401f : System RAM
6b984020-6b9b885f : System RAM
6b9b8860-6b9b901f : System RAM
6b9b9020-6b9ed85f : System RAM
6b9ed860-6b9ee01f : System RAM
6b9ee020-6ba1c25f : System RAM
6ba1c260-6c9cf01f : System RAM
6c9cf020-6c9d705f : System RAM
6c9d7060-7057cfff : System RAM
7057d000-7457cfff : Reserved
7457d000-7a6fefff : System RAM
7a6ff000-7bbfefff : Reserved
7b3d0020-7b3d006f : APEI ERST
7b3d0078-7b3d007f : APEI ERST
7b3d0080-7b3d201f : APEI ERST
7bbff000-7bcfefff : ACPI Non-volatile Storage
7bcff000-7befefff : ACPI Tables
7beff000-7befffff : System RAM
7bf00000-7fffffff : Reserved
80000000-dfffffff : PCI Bus 0000:00
80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff]
80000000-800fffff : PCI Bus 0000:07
80000000-8003ffff : 0000:07:00.0
80040000-8007ffff : 0000:07:00.1
80100000-80100fff : 0000:00:1f.5
81000000-82ffffff : PCI Bus 0000:02
81000000-81ffffff : 0000:02:00.0
82000000-823fffff : 0000:02:00.0
90000000-90ffffff : PCI Bus 0000:05
90000000-90ffffff : PCI Bus 0000:06
90000000-90ffffff : 0000:06:00.0
90000000-90ffffff : mgadrmfb_vram
92000000-928fffff : PCI Bus 0000:05
92000000-928fffff : PCI Bus 0000:06
92000000-927fffff : 0000:06:00.0
92808000-9280bfff : 0000:06:00.0
92808000-9280bfff : mgadrmfb_mmio
92a00000-92dfffff : PCI Bus 0000:03
92e00000-92efffff : PCI Bus 0000:01
92e00000-92efffff : 0000:01:00.0
93000000-93001fff : 0000:00:17.0
93000000-93001fff : ahci
93003000-930037ff : 0000:00:17.0
93003000-930037ff : ahci
93004000-930040ff : 0000:00:17.0
93004000-930040ff : ahci
fd690000-fd69ffff : INT34C6:00
fd690000-fd69ffff : INT34C6:00 INT34C6:00
fd6a0000-fd6affff : INT34C6:00
fd6a0000-fd6affff : INT34C6:00 INT34C6:00
fd6b0000-fd6bffff : INT34C6:00
fd6b0000-fd6bffff : INT34C6:00 INT34C6:00
fd6d0000-fd6dffff : INT34C6:00
fd6d0000-fd6dffff : INT34C6:00 INT34C6:00
fd6e0000-fd6effff : INT34C6:00
fd6e0000-fd6effff : INT34C6:00 INT34C6:00
fe010000-fe010fff : Reserved
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed20000-fed7ffff : Reserved
fed40000-fed44fff : MSFT0101:00
fed40000-fed44fff : MSFT0101:00 MSFT0101:00
fed90000-fed90fff : dmar0
fee00000-fee00fff : Local APIC
100000000-107fffffff : System RAM
36b200000-36c5fffff : Kernel code
36c600000-36d276fff : Kernel rodata
36d400000-36d77ff3f : Kernel data
36dc34000-36edfffff : Kernel bss
4000000000-7fffffffff : PCI Bus 0000:00
4402000000-44021fffff : PCI Bus 0000:01
4402000000-44020fffff : 0000:01:00.0
4402100000-44021fffff : 0000:01:00.0
4402100000-44021fffff : megasas: LSI
4402200000-44022fffff : PCI Bus 0000:07
4402200000-440220ffff : 0000:07:00.1
4402200000-440220ffff : tg3
4402210000-440221ffff : 0000:07:00.1
4402210000-440221ffff : tg3
4402220000-440222ffff : 0000:07:00.1
4402220000-440222ffff : tg3
4402230000-440223ffff : 0000:07:00.0
4402230000-440223ffff : tg3
4402240000-440224ffff : 0000:07:00.0
4402240000-440224ffff : tg3
4402250000-440225ffff : 0000:07:00.0
4402250000-440225ffff : tg3
4402300000-440230ffff : 0000:00:14.0
4402300000-440230ffff : xhci-hcd
4402310000-4402313fff : 0000:00:14.2
4402314000-44023140ff : 0000:00:1f.4
4402315000-4402315fff : 0000:00:16.4
4402316000-4402316fff : 0000:00:16.0
4402317000-4402317fff : 0000:00:14.2
4600000000-53ffffffff : PCI Bus 0000:02
4600000000-4601ffffff : 0000:02:00.0
4602000000-4621ffffff : 0000:02:00.0
4800000000-4bffffffff : 0000:02:00.0
4c00000000-53ffffffff : 0000:02:00.0
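Since the indentation of the listing was lost in the paste, the nesting is hard to read by eye. As a rough sketch (not an NVIDIA or kernel tool), the Python below lists every entry whose address range overlaps the 16M region at 0x81000000 from the dmesg message. The function name `overlapping` is my own, and the sample text is a handful of lines copied from the listing above.

```python
import re

# A few lines copied from the /proc/iomem listing above.
IOMEM = """\
80000000-dfffffff : PCI Bus 0000:00
80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff]
80000000-800fffff : PCI Bus 0000:07
81000000-82ffffff : PCI Bus 0000:02
81000000-81ffffff : 0000:02:00.0
82000000-823fffff : 0000:02:00.0
90000000-90ffffff : PCI Bus 0000:05
"""

def overlapping(iomem_text, start, end):
    """Return (lo, hi, name) for each entry that overlaps [start, end]."""
    hits = []
    for line in iomem_text.splitlines():
        m = re.match(r"\s*([0-9a-f]+)-([0-9a-f]+) : (.+)", line)
        if not m:
            continue
        lo, hi = int(m.group(1), 16), int(m.group(2), 16)
        # Two inclusive ranges overlap when each starts before the other ends.
        if lo <= end and hi >= start:
            hits.append((lo, hi, m.group(3)))
    return hits

# Region from the NVRM message: 16M at 0x81000000 (assuming M = MiB).
for lo, hi, name in overlapping(IOMEM, 0x81000000, 0x81FFFFFF):
    print(f"{lo:08x}-{hi:08x} : {name}")
```

On this excerpt it reports four entries: PCI Bus 0000:00, PCI MMCONFIG 0000 [bus 00-ff], PCI Bus 0000:02, and 0000:02:00.0. The MMCONFIG window at 80000000-8fffffff covers the GPU's region, which may be related to the failure, though I can't say that for certain from this output alone. To run it on a live host, pass the contents of /proc/iomem read as root; as far as I know, unprivileged reads show the addresses as zeros.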

This is solved in the following HPE article:
https://support.hpe.com/hpesc/public/docDisplay?docId=a00112218en_us&docLocale=en_US
