After upgrading the MLNX_OFED driver to version 5.8-2.0.3, the following errors appear during boot:
error log
Jul 10 05:41:46 Qacloudhost06 kernel: [2263067.316179] mlx5_vdpa: disagrees about version of symbol mlx5_db_free
Jul 10 05:41:46 Qacloudhost06 kernel: [2263067.316185] mlx5_vdpa: Unknown symbol mlx5_db_free (err -22)
Jul 10 05:41:46 Qacloudhost06 kernel: [2263067.316212] mlx5_vdpa: disagrees about version of symbol mlx5_query_nic_vport_mtu
Jul 10 05:41:46 Qacloudhost06 kernel: [2263067.316213] mlx5_vdpa: Unknown symbol mlx5_query_nic_vport_mtu (err -22)
Jul 10 05:41:46 Qacloudhost06 kernel: [2263067.316227] mlx5_vdpa: disagrees about version of symbol mlx5_create_auto_grouped_flow_table
mlx5_vdpa driver information
filename: /lib/modules/5.15.0-60-generic/kernel/drivers/vdpa/mlx5/mlx5_vdpa.ko
license: Dual BSD/GPL
description: Mellanox VDPA driver
author: Eli Cohen <eli@mellanox.com>
srcversion: 7E302F0D222DB0C740AEE6A
alias: auxiliary:mlx5_core.vnet
depends: mlx5_core,vhost_iotlb,vringh,vdpa
retpoline: Y
intree: Y
name: mlx5_vdpa
vermagic: 5.15.0-60-generic SMP mod_unload modversions
sig_id: PKCS#7
signer: Build time autogenerated kernel key
sig_key: 0D:04:40:B9:A2:DE:02:2B:3C:CE:07:73:95:8B:8F:C1:58:B8:F5:D4
sig_hashalgo: sha512
...
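The driver information above shows that mlx5_vdpa is the in-tree module shipped with the 5.15 kernel (intree: Y, installed under kernel/drivers). It therefore appears to expect the symbol versions of the kernel's original mlx5_core, not those exported by the mlx5_core module that MLNX_OFED installed, which would explain the "disagrees about version of symbol" errors.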
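One way to confirm this kind of mismatch is to compare where each module is installed. The snippet below is a minimal sketch, not an official diagnostic: module_origin is a hypothetical helper, and it assumes the common layout in which in-tree modules live under kernel/ and out-of-tree (DKMS or vendor) modules live under updates/ or extra/.

```shell
#!/bin/sh
# Hypothetical helper: classify a module file by its install directory.
# Assumption: in-tree modules sit under .../kernel/, while out-of-tree
# modules sit under .../updates/ or .../extra/.
module_origin() {
    case "$1" in
        */updates/*|*/extra/*) echo "out-of-tree" ;;
        */kernel/*)            echo "in-tree" ;;
        *)                     echo "unknown" ;;
    esac
}

# Usage on the affected host (modinfo -n prints the module's file name):
#   module_origin "$(modinfo -n mlx5_vdpa)"
#   module_origin "$(modinfo -n mlx5_core)"
```

If mlx5_vdpa reports in-tree while mlx5_core reports out-of-tree, the two modules were most likely built from different source trees, which is consistent with the errors in the log.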
I could not find an mlx5_vdpa driver on the NVIDIA driver download site, so I have blacklisted mlx5_vdpa to keep it from loading at boot.
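For reference, a blacklist of this kind can be sketched as follows. This is an illustration under assumptions rather than the exact configuration on this host: write_blacklist and the file name blacklist-mlx5-vdpa.conf are hypothetical, and the target directory defaults to /etc/modprobe.d.

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe.d file that keeps mlx5_vdpa
# from loading. The file name is an arbitrary choice for illustration.
write_blacklist() {
    conf_dir="${1:-/etc/modprobe.d}"
    conf_file="$conf_dir/blacklist-mlx5-vdpa.conf"
    # "blacklist" stops alias-based autoloading; the "install" line makes
    # an explicit "modprobe mlx5_vdpa" fail as well.
    printf '%s\n' \
        'blacklist mlx5_vdpa' \
        'install mlx5_vdpa /bin/false' > "$conf_file"
    echo "$conf_file"
}

# Usage (as root): write_blacklist
```

On Ubuntu it may also be necessary to run update-initramfs -u afterwards if the module is included in the initramfs. Note that this only hides the mismatch: vDPA on the mlx5 device stays unavailable while the module is blocked.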
If there is a better solution or method, please let me know.
Here are the details of my setup:
Hardware
- Server: Dell R7615
- CPU: AMD Epyc 9654P
- Memory: 384GB
- NUMA: 1
- NIC: ConnectX-6 Lx
Software Versions
- OS: Ubuntu 22.04.2 LTS
- Kernel: 5.15
- OpenStack Version: Yoga
- OVN: 22.03
- OVS: 2.17.5
- MLNX_OFED Driver: 5.8-2.0.3
- Firmware: 26.35.1012 (DEL0000000031)