Specs:
Linux Kernel: 4.15.0-140-generic
OS: Ubuntu 18.04
MLNX_OFED_LINUX-4.9-2.2.4.0-ubuntu18.04-x86_64
Everything was working until a reboot. I tried several things; the results are below. The port GUIDs also changed, so I had to manually update /etc/opensm/opensm.conf. (NOTE: opensm.conf is the default template; my only change was specifying the port GUIDs.) I am now looking through the OFED manual for further diagnostics.
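Because the port GUIDs changed, one way to double-check the values entered in opensm.conf (and each port's link state) is to read them straight from sysfs. This is a minimal sketch that assumes the standard kernel layout /sys/class/infiniband/<device>/ports/<n>/. In that layout, `state` holds a string such as "4: ACTIVE" and `gids/0` holds the port GID, whose last 64 bits are the port GUID. The function name `list_ib_ports` is hypothetical. Its optional argument (an alternative sysfs root) exists only to make the sketch easy to test.

```shell
#!/bin/sh
# Hypothetical helper: list every InfiniBand port with its state and port GUID,
# read directly from sysfs.
# Assumed layout: <root>/<device>/ports/<n>/state   e.g. "4: ACTIVE"
#                 <root>/<device>/ports/<n>/gids/0  e.g. fe80:0000:0000:0000:0002:c903:003f:c581
list_ib_ports() {
    root="${1:-/sys/class/infiniband}"
    for port_dir in "$root"/*/ports/*; do
        # If the glob matched nothing (no IB devices), skip the literal pattern.
        [ -d "$port_dir" ] || continue
        dev=$(basename "$(dirname "$(dirname "$port_dir")")")
        port=$(basename "$port_dir")
        # "4: ACTIVE" -> "ACTIVE"
        state=$(cut -d' ' -f2 "$port_dir/state")
        # The port GUID is the last four groups of GID index 0, without colons.
        guid=$(cut -d: -f5-8 "$port_dir/gids/0" | tr -d ':')
        printf '%s port %s: state=%s guid=0x%s\n' "$dev" "$port" "$state" "$guid"
    done
}

list_ib_ports "$@"
```

OpenSM sometimes prints GUIDs without leading zeros. For example, 0x0002c903003fc581 from this sketch is the same value as 0x2c903003fc581 in the logs below.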
— Report —
sudo modprobe ib_umad (worked?)
sudo modprobe xprtrdma
modprobe: ERROR: could not insert 'rpcrdma': Unknown symbol in module, or unknown parameter (see dmesg)
dmesg | tail
rpcrdma: Unknown symbol ib_alloc_cq (err 0)
rpcrdma: Unknown symbol ib_dereg_mr (err 0)
rpcrdma: Unknown symbol rdma_create_id (err 0)
rpcrdma: Unknown symbol ib_alloc_mr (err 0)
rpcrdma: Unknown symbol ib_free_cq (err 0)
rpcrdma: Unknown symbol rdma_accept (err 0)
rpcrdma: Unknown symbol ib_destroy_qp (err 0)
rpcrdma: Unknown symbol ib_dealloc_pd (err 0)
sminfo
ibwarn: [18194] mad_rpc_open_port: can't open UMAD port ((null):0)
sminfo: iberror: failed: Failed to open '(null)' port '0'
NOTE: no rdma service installed
sudo osmtest -f c (the output for -f a is the same, except that it says 'validation' instead of 'inventory')
Command Line Arguments
Done with args
Flow = Create Inventory
Apr 07 10:31:23 592167 [2110F740] 0x7f -> Setting log level to: 0x03
Apr 07 10:31:23 592367 [2110F740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 07 10:31:23 661108 [2110F740] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0x2c903003fc582
Apr 07 10:31:23 745819 [1F6B1700] 0x01 -> __osmv_sa_mad_rcv_cb: ERR 5501: Remote error:0x000C
Apr 07 10:31:23 745869 [1F6B1700] 0x01 -> osmtest_query_res_cb: ERR 0003: Error on query (IB_REMOTE_ERROR)
Apr 07 10:31:23 745955 [2110F740] 0x01 -> osmtest_validate_sa_class_port_info: ERR 0070: ib_query failed (IB_REMOTE_ERROR)
Apr 07 10:31:23 745993 [2110F740] 0x01 -> osmtest_validate_sa_class_port_info: Remote error = IB_MAD_STATUS_UNSUP_METHOD_ATTR
Apr 07 10:31:23 746013 [2110F740] 0x01 -> osmtest_run: ERR 0138: Could not obtain SA ClassPortInfo (IB_REMOTE_ERROR)
OSMTEST: TEST "Create Inventory" FAIL
sudo systemctl restart opensm, followed by the output of /var/log/opensm.log:
Apr 07 10:24:02 117367 [8BF43740] 0x80 -> Exiting SM
Apr 07 10:26:07 072815 [E3C3D740] 0x03 -> OpenSM 5.7.2.MLNX20201014.9378048
OpenSM 5.7.2.MLNX20201014.9378048
Apr 07 10:26:07 072926 [E3C3D740] 0x80 -> OpenSM 5.7.2.MLNX20201014.9378048
Apr 07 10:26:07 077131 [E3C3D740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 07 10:26:07 077241 [E3C3D740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 07 10:26:07 077354 [E3C3D740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Entering DISCOVERING state
Apr 07 10:26:07 080343 [E3C3D740] 0x80 -> Entering DISCOVERING state
Apr 07 10:26:07 080556 [E3C3D740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0x2c903003fc581
Apr 07 10:26:07 171455 [E3C3D740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0x2c903003fc582
Apr 07 10:26:07 257881 [E3C3D740] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0x2c903003fc581
Apr 07 10:26:07 344319 [E3C3D740] 0x02 -> osm_vendor_bind: Mgmt class 0x04 binding to port GUID 0x2c903003fc581
Apr 07 10:26:07 344394 [E3C3D740] 0x02 -> osm_vendor_bind: Mgmt class 0x21 binding to port GUID 0x2c903003fc581
Apr 07 10:26:07 344453 [E3C3D740] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x0002c903003fc581
SM port is down
sudo hca_self_test.ofed
---- Performing Adapter Device Self Test ----
Number of CAs Detected … 1
PCI Device Check … PASS
Kernel Arch … x86_64
Host Driver Version … MLNX_OFED_LINUX-4.9-2.2.4.0 (OFED-4.9-2.2.4): 4.15.0-140-generic
Host Driver RPM Check … PASS
Firmware on CA #0 VPI … v2.42.5000
Host Driver Initialization … PASS
Number of CA Ports Active … 0
Error Counter Check on CA #0 (VPI)… PASS
Kernel Syslog Check … PASS
Node GUID on CA #0 (VPI) … NA
------------------ DONE ---------------------