As of yesterday I’m having an issue with a system that has a dual-port 40Gb daughter card in it. The network connections for both ports are showing as disconnected. I’m in a home lab with a single unmanaged switch, running two instances of OpenSM on two separate servers.
The error I’m getting shows up in the Event Viewer, and it repeats continuously until I stop the OpenSM service on the host.
Mellanox ConnectX-2 IPoIB Adapter device reports a “Modify QP error” on qpn #0x58 Status #0xffffffea. Therefore, the HCA Nic will be reset. (The issue is reported in Function CMcast::CompleteJoinMcastWi).
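For reference, the status value in that message can be read as a signed 32-bit integer. Below is a minimal Python sketch of that arithmetic; the helper name `to_signed32` is only for illustration, and treating the result as a negative errno-style code is an assumption about how the driver reports status, not something the message itself confirms.

```python
import errno

def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

status = to_signed32(0xFFFFFFEA)
print(status)  # prints -22

# Assumption: the driver follows a negative errno-style convention.
# If so, 22 maps to EINVAL ("invalid argument").
print(errno.errorcode.get(-status))  # prints EINVAL
```

If that assumption holds, the Modify QP call is being rejected for an invalid argument rather than timing out, though I can’t say that for certain.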
My other four 40Gb IB cards are functioning properly. Some of the things I’ve tried:
- Restarted the OpenSM service on both hosts
- Reset the daughter card
- Tried a different set of cables
- Reset the switch
- Reinstalled the device drivers (4.90)
- Compared the advanced settings in the driver to the other daughter cards on another host
I’ve attached a snapshot and would appreciate any help.
Thanks