We upgraded the Mellanox drivers to mlnx-en-4.4-2.0.7.0-rhel7.5-x86_64, but since then the ibv_get_device_list function no longer works.

System: RHEL 7.5 x86_64 with Mellanox driver

Earlier we had driver version 4.0.2, which worked fine with DPDK 17.11.4. Since we upgraded the Mellanox driver to version 4.4.2 (using mlnx-en-4.4-2.0.7.0-rhel7.5-x86_64.iso), it no longer gives the expected results.

Using lsmod, I can see that the new versions of the modules are loaded correctly:

mlx_fe-fe-0$ lsmod | grep mlx4_en
mlx4_en 142833 0
ptp 19231 2 mlx4_en,mlx5_core
mlx4_core 352500 1 mlx4_en
mlx_compat 28081 4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
devlink 42368 3 mlx4_en,mlx4_core,mlx5_core

mlx_fe-fe-0$ lsmod | grep mlx4_core
mlx4_core 352500 1 mlx4_en
mlx_compat 28081 4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
devlink 42368 3 mlx4_en,mlx4_core,mlx5_core

However, with this driver version, the OFED function ibv_get_device_list doesn't return the correct list of devices.

This function is supposed to return an array of the RDMA devices currently available. We have the four devices below on our system, but since upgrading the Mellanox drivers to mlnx-en-4.4-2.0.7.0-rhel7.5-x86_64 the function doesn't return anything.

$ lspci | grep Mell
00:06.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
00:07.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
00:08.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
00:09.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

I have a few questions in this regard:

  1. Is installing the driver ISO image mlnx-en-4.4-2.0.7.0-rhel7.5-x86_64.iso alone sufficient to upgrade the driver, or do we also need to install other packages?
  2. What could be the reason that ibv_get_device_list does not return the correct list of devices? The source code is not available on our system, so I cannot debug it.
  3. Are there tools or tricks to check whether the driver installation succeeded and the devices are bound to those drivers?

TIA

Kiran

Hello Kiran,

Thank you for posting your inquiry on the NVIDIA Networking Community.

Based on the information provided, please make sure you install the new driver version with the '--dpdk' option.

Example:

./install --dpdk

This installs all the packages needed for DPDK. The earlier version you were running may have been MLNX_OFED, which installs all packages by default.

Thank you and regards,

~NVIDIA Networking Technical Support

Thanks, Martijn, for your response.

Installing with --dpdk didn't solve the problem. I have more information to share that will probably help.

Before upgrading the drivers, we could see the following InfiniBand devices on our system:

ls -lrt /sys/class/infiniband/
total 0
lrwxrwxrwx 1 root root 0 Jan 29 13:06 mlx4_3 → …/…/devices/pci0000:00/0000:00:09.0/infiniband/mlx4_3
lrwxrwxrwx 1 root root 0 Jan 29 13:06 mlx4_2 → …/…/devices/pci0000:00/0000:00:08.0/infiniband/mlx4_2
lrwxrwxrwx 1 root root 0 Jan 29 13:06 mlx4_1 → …/…/devices/pci0000:00/0000:00:07.0/infiniband/mlx4_1
lrwxrwxrwx 1 root root 0 Jan 29 13:06 mlx4_0 → …/…/devices/pci0000:00/0000:00:06.0/infiniband/mlx4_0

However, after upgrading and restarting /etc/init.d/mlnx-en.d, all of these devices are gone, which is possibly why ibv_get_device_list doesn't find any device.

Does any of the RPM installations remove these InfiniBand devices?

Any hints on that would help.

Thanks,

Kiran