I’m attempting to pass through a Mellanox ConnectX-4 NIC to a VM and am getting memory type errors in dmesg:
...
[32278.078269] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.078271] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.078272] ioremap memtype_reserve failed -16
[32278.082272] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.082275] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.082277] ioremap memtype_reserve failed -16
[32278.086270] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.086273] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.086274] ioremap memtype_reserve failed -16
[32278.090268] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.090270] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.090271] ioremap memtype_reserve failed -16
...
These errors repeat hundreds or thousands of times while the VM is starting. Eventually the VM boots properly and lspci in the guest shows the NIC, but the guest doesn’t load the driver for it, so it’s unusable.
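Since the log is so repetitive, here is a minimal Python sketch for collapsing it into one line per distinct conflict. It assumes the "conflicting memory types" lines have exactly the format shown in the log above; the function name `summarize_pat_conflicts` is just something made up for illustration, and the two memory types are reported in the order they appear in the message without assuming which one is the existing mapping.

```python
import re
from collections import Counter

# Matches lines such as:
#   x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
# The format is taken from the log above, not from any kernel documentation.
CONFLICT_RE = re.compile(
    r"x86/PAT: (?P<task>.+?):(?P<pid>\d+) conflicting memory types "
    r"(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) "
    r"(?P<type_a>[\w-]+)<->(?P<type_b>[\w-]+)"
)


def summarize_pat_conflicts(lines):
    """Count how often each (start, end, type_a, type_b) conflict occurs."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match is None:
            continue  # not a "conflicting memory types" line
        key = (
            int(match["start"], 16),
            int(match["end"], 16),
            match["type_a"],
            match["type_b"],
        )
        counts[key] += 1
    return counts
```

Feeding it the lines of a saved dmesg dump (for example `summarize_pat_conflicts(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical file name) should show whether every failure refers to the same address range or whether several ranges are involved.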
I’m able to pass through other PCI devices like NVMe SSDs with no dmesg errors; it’s specifically the Mellanox NICs that have problems. I’ve tried both a ConnectX-4 Lx and a ConnectX-6 card. The same passthrough works without errors in other machines, so the issue appears to be specific to this AM5 platform. I’ve tried using SR-IOV and passing through just one virtual function, and that causes the same errors. I’ve also tried the NIC in different PCIe slots, with the same result. Each port of the NIC is in its own IOMMU group, and I’ve tried passing through each port individually as well as both ports together, each time getting the same errors.
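For anyone who wants to compare IOMMU grouping on their own board, this is a rough Python sketch of how the groups can be listed. It assumes the usual sysfs layout of `/sys/kernel/iommu_groups/<group>/devices/<pci-address>`; the function name `list_iommu_groups` is hypothetical.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from sysfs.

    Returns an empty dict if the directory does not exist, which is
    typically the case when the IOMMU is disabled.
    """
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        # Group directories are named by their numeric group ID.
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups
```

If each NIC port (or virtual function) appears alone in its group, as it does here, the grouping itself should not be what prevents the passthrough.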
This is with a new MSI X670E ACE motherboard with a 7950X CPU. I’m seeing the issue with kernel versions from 5.15 to 6.2. I tried installing MLNX_EN on Ubuntu and that driver resolved the issue, so it seems like the error is in the mlx5_core driver. Unfortunately I’m unable to install MLNX_EN on my primary OS, Fedora.