We have several systems running CentOS 7.6. On one of them we compiled a 4.20.0 kernel (with the Mellanox driver enabled) and packaged it as RPMs.
We then installed the kernel and kernel-devel RPMs on all machines. They all boot successfully, and they all recognize the Mellanox NIC with the correct link state:
4: enp94s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 50:6b:4b:aa:e0:72 brd ff:ff:ff:ff:ff:ff
    inet 10.2.2.25/24 brd 10.2.2.255 scope global noprefixroute enp94s0f0
       valid_lft forever preferred_lft forever
    inet6 fe80::75f7:872c:b059:a2a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
5: enp94s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 50:6b:4b:aa:e0:73 brd ff:ff:ff:ff:ff:ff
Because we want to run both RDMA and DPDK on the Mellanox NIC, we installed MLNX_OFED with “./mlnxofedinstall --add-kernel-support --upstream-libs”. The build and installation succeeded, but after a reboot enp94s0f0 and enp94s0f1 had disappeared.
Checking with “ibv_devinfo” shows both ports as down.
We also see the following log messages in /var/log/dmesg:
Nov 09 18:29:37 kernel: Compat-mlnx-ofed backport release: 1c4bf42
Nov 09 18:29:37 kernel: Backport based on mlnx_ofed/mlnx-ofa_kernel-4.0.git 1c4bf42
Nov 09 18:29:37 kernel: compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git
Nov 09 18:29:37 kernel: mlx5_ib: disagrees about version of symbol mlx5_core_create_qp
Nov 09 18:29:37 kernel: mlx5_ib: Unknown symbol mlx5_core_create_qp (err -22)
Nov 09 18:29:37 kernel: mlx5_ib: disagrees about version of symbol mlx5_core_destroy_rq_tracked
Nov 09 18:29:37 kernel: mlx5_ib: Unknown symbol mlx5_core_destroy_rq_tracked (err -22)
Nov 09 18:29:37 kernel: mlx5_ib: disagrees about version of symbol mlx5_eswitch_add_send_to_vport_rule
Nov 09 18:29:37 kernel: mlx5_ib: Unknown symbol mlx5_eswitch_add_send_to_vport_rule (err -22)
Nov 09 18:29:37 kernel: mlx5_ib: disagrees about version of symbol mlx5_modify_header_alloc
Nov 09 18:29:37 kernel: mlx5_ib: Unknown symbol mlx5_modify_header_alloc (err -22)
Nov 09 18:29:37 kernel: mlx5_ib: disagrees about version of symbol mlx5_db_free
Nov 09 18:29:37 kernel: mlx5_ib: Unknown symbol mlx5_db_free (err -22)
Nov 09 18:29:37 systemd-udevd[1341]: Error running install command for mlx5_ib
Nov 09 18:29:37 systemd-udevd[1294]: modprobe: ERROR: could not insert 'mlx5_ib': Invalid argument
Nov 09 18:31:37 root[2240]: openibd: start(): Detected loaded old version of module 'mlx5_core', calling stop...
Nov 09 18:31:37 systemd[1]: rdma.service: main process exited, code=exited, status=1/FAILURE
Nov 09 18:31:37 systemd[1]: Failed to start Initialize the iWARP/InfiniBand/RDMA stack in the kernel.
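To make the log above easier to read, here is a minimal Python sketch that groups the "disagrees about version of symbol" messages by module. The function name symbol_mismatches and the SAMPLE_LOG list are hypothetical names chosen for illustration; the sample lines are copied from the log above. In general this kernel message means the module being loaded (here mlx5_ib) was built against a different build of the module exporting the symbol than the one currently loaded, which would be consistent with the openibd line about a loaded old version of 'mlx5_core', though that is an interpretation and not something the log proves by itself.

```python
import re
from collections import defaultdict

# Matches kernel messages such as:
#   "kernel: mlx5_ib: disagrees about version of symbol mlx5_db_free"
MISMATCH_RE = re.compile(
    r"kernel: (?P<module>\S+): disagrees about version of symbol (?P<symbol>\S+)"
)


def symbol_mismatches(log_lines):
    """Map each module name to the sorted symbols it reported a version mismatch for."""
    found = defaultdict(set)
    for line in log_lines:
        match = MISMATCH_RE.search(line)
        if match:
            found[match.group("module")].add(match.group("symbol"))
    return {module: sorted(symbols) for module, symbols in found.items()}


# A few lines taken from the log in this post (illustrative subset only).
SAMPLE_LOG = [
    "Nov 09 18:29:37 kernel: mlx5_ib: disagrees about version of symbol mlx5_core_create_qp",
    "Nov 09 18:29:37 kernel: mlx5_ib: Unknown symbol mlx5_core_create_qp (err -22)",
    "Nov 09 18:29:37 kernel: mlx5_ib: disagrees about version of symbol mlx5_db_free",
    "Nov 09 18:29:37 kernel: mlx5_ib: Unknown symbol mlx5_db_free (err -22)",
]

if __name__ == "__main__":
    for module, symbols in symbol_mismatches(SAMPLE_LOG).items():
        print(f"{module}: {len(symbols)} mismatched symbol(s): {', '.join(symbols)}")
```

Feeding it the full log instead of SAMPLE_LOG should list every mlx5_core symbol that mlx5_ib could not resolve, which may help when comparing the installed OFED modules against the in-tree ones from the 4.20.0 kernel.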