Hi,
I just installed the MLNX_OFED_LINUX-4.9-7.1.0.0 driver for an Nvidia ConnectX-7 NDR200/HDR QSFP112 2-port PCIe Gen5 x16 InfiniBand adapter,
but ibv_devices fails to list any devices:
ibv_devices
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
    device                 node GUID
    ------              ----------------
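To compare this node with a working one, the warnings above can be reduced to a list of uverbs nodes that have no userspace provider. This is a minimal Python sketch, assuming the warning text keeps exactly the format shown above; the helper name unmatched_uverbs is made up for illustration.

```python
import re

# Matches the libibverbs warning format shown in the output above.
WARNING_RE = re.compile(
    r"no userspace device-specific driver found for \S+/(uverbs\d+)"
)

def unmatched_uverbs(output):
    """Return the uverbs node names named in the warnings, sorted as strings."""
    return sorted(m.group(1) for m in WARNING_RE.finditer(output))
```

Feed it the combined stdout and stderr of ibv_devices, since the warnings may not be on the same stream as the device table. An empty list would mean every uverbs node was matched to a provider.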
I installed the driver with:
./mlnxofedinstall --without-32bit --enable-affinity --hpc --without-iser --without-srp --without-opensm --with-infiniband-diags --without-fw-update --with-nfsrdma --force
The ibstat output looks OK for mlx5_0/1/2/3. Here it is for mlx5_0 and mlx5_1:
ibstat
CA 'mlx5_0'
    CA type: MT4129
    Number of ports: 1
    Firmware version: 28.38.1002
    Hardware version: 0
    Node GUID: 0x946dae030060fd58
    System image GUID: 0x946dae030060fd58
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 712
        LMC: 0
        SM lid: 4
        Capability mask: 0xa751e848
        Port GUID: 0x946dae030060fd58
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4129
    Number of ports: 1
    Firmware version: 28.38.1002
    Hardware version: 0
    Node GUID: 0x946dae030060fd59
    System image GUID: 0x946dae030060fd58
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 713
        LMC: 0
        SM lid: 4
        Capability mask: 0xa751e848
        Port GUID: 0x946dae030060fd59
        Link layer: InfiniBand
The kernel modules are loaded:
lsmod | egrep -i "ib|rdma|verbs"
rdma_ucm 26934 0
ib_ucm 22566 0
rdma_cm 61162 1 rdma_ucm
iw_cm 43918 1 rdma_cm
ib_ipoib 176977 0
ib_cm 53064 3 rdma_cm,ib_ucm,ib_ipoib
ib_umad 27587 0
mlx5_ib 398193 0
ib_uverbs 134646 3 mlx5_ib,ib_ucm,rdma_ucm
mlx5_core 1175358 2 mlx5_ib,mlx5_fpga_tools
mlx4_ib 220791 0
ib_core 379768 10 rdma_cm,ib_cm,iw_cm,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
mlx4_core 361102 2 mlx4_en,mlx4_ib
mlx_compat 47141 15 rdma_cm,ib_cm,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,mlx5_fpga_tools,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
devlink 60067 4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
IPoIB seems to work: I can ping other working nodes from the ib0 interface/address.
The libibverbs library is present:
ls /lib64/libibverbs.so* -alht
lrwxrwxrwx 1 root root 19 Nov 29 10:36 /lib64/libibverbs.so -> libibverbs.so.1.0.0
lrwxrwxrwx 1 root root 19 Nov 29 10:35 /lib64/libibverbs.so.1 -> libibverbs.so.1.0.0
-rwxr-xr-x 1 root root 103K Jun 6 11:05 /lib64/libibverbs.so.1.0.0
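Having libibverbs.so in place may not be enough by itself. As far as I understand, libibverbs finds device-specific providers such as mlx5 through *.driver files in its config directory (commonly /etc/libibverbs.d, though the path depends on how the library was built) and then loads the matching provider library. The following is a minimal Python sketch to list what is registered, assuming that layout and "driver NAME" lines in those files; registered_providers is a hypothetical helper name.

```python
from pathlib import Path

def registered_providers(config_dir="/etc/libibverbs.d"):
    """Return provider names declared by 'driver NAME' lines in *.driver files."""
    names = set()
    # A missing directory simply yields no files, so the result is an empty list.
    for path in sorted(Path(config_dir).glob("*.driver")):
        for line in path.read_text().splitlines():
            fields = line.split()
            # Skip blank lines, comments, and anything that is not a driver entry.
            if len(fields) >= 2 and fields[0] == "driver":
                names.add(fields[1])
    return sorted(names)
```

If mlx5 is missing from the result, that would be consistent with the "no userspace device-specific driver found" warnings, although it would not by itself show why the provider was not installed.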
A signing key error appears in dmesg or syslog, but I think it is harmless:
[Wed Nov 29 10:44:57 2023] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
I have reinstalled the driver, but that did not help.
What could be wrong?
Thanks,
Wei