ConnectX-5 card's network interface disappeared and the openibd daemon failed on several nodes

OS : Ubuntu 22.10
kernel : 5.19.0-45-generic
Server : ASUS ESC8000-G4
Device : MCX555A-ECAT ConnectX®-5 VPI adapter card, EDR IB (100Gb/s)
Driver : MLNX_OFED_LINUX-5.9-

My customer operates about 8 nodes, and 3 of them have run into the following problem.

One day the card's network interface, which normally shows up in the output of the 'ifconfig' command, disappeared, and the openibd daemon was in a failed state. At that moment, communication (ping 192.168.10.x) still worked. After a few days, however, communication stopped completely.

I resolved this by uninstalling the driver and reinstalling it with the '--add-kernel-support' option, but I still don't understand why this happened. Please help.

From the logs, it is likely that the driver kernel module did not load correctly, or that an older driver version was installed and never uninstalled. This kind of issue can usually be resolved by reinstalling the driver.
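
To check the first of those causes on an affected node before reinstalling, something like the sketch below may help. It is only an illustration, not an official diagnostic: the function name `mlx5_modules_loaded` is made up for this example, and it assumes the ConnectX-5 driver shows up as the `mlx5_core` and `mlx5_ib` modules in a `/proc/modules`-style listing.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLNX_OFED): report whether the mlx5
# kernel modules appear in a /proc/modules-style listing.
# Usage: mlx5_modules_loaded [modules_file]   (default: /proc/modules)
mlx5_modules_loaded() {
    modules_file="${1:-/proc/modules}"
    missing=0
    # Assumption: these are the modules a ConnectX-5 VPI card relies on.
    for mod in mlx5_core mlx5_ib; do
        # Each line in /proc/modules starts with the module name and a space.
        if grep -q "^${mod} " "$modules_file"; then
            echo "${mod}: loaded"
        else
            echo "${mod}: NOT loaded"
            missing=1
        fi
    done
    # Non-zero exit status if at least one module is missing.
    return "$missing"
}
```

Calling `mlx5_modules_loaded` with no argument reads the live `/proc/modules`. If a module is reported as not loaded, comparing the output of `uname -r` with the installed driver version from `ofed_info -s` may show whether the driver was built for the kernel that is currently running.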

Thanks for your advice.
