OS : Ubuntu 22.10
kernel : 5.19.0-45-generic
Server : ASUS ESC8000-G4
Device : MCX555A-ECAT ConnectX®-5 VPI adapter card, EDR IB (100Gb/s)
Driver : MLNX_OFED_LINUX-5.9-0.5.6.0-ubuntu22.10-x86_64.tgz
My customer operates about 8 nodes, and 3 of them have run into the following problem.
At some point, the card's network interface no longer appeared in the output of the ‘ifconfig’ command, and the openibd daemon was in a failed state. Even then, communication (ping 192.168.10.x) still worked. After a few days, however, communication stopped completely.
I resolved this by uninstalling the driver and reinstalling it with the ‘--add-kernel-support’ option, but I still don't understand why it happened. Please help.
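
In case it helps with diagnosis, below is a minimal sketch of how the state described above could be captured on an affected node the next time this happens. It only records the running kernel, the visible interfaces, the loaded mlx/ib kernel modules, and the openibd status. The helper names (`loaded_mlx_modules`, `list_interfaces`, `snapshot`) are made up for illustration, and it assumes that openibd is visible to `systemctl` under the unit name `openibd`, which may differ on other installs.

```python
import os
import platform
import subprocess


def loaded_mlx_modules(proc_modules_text):
    """Return names of loaded modules starting with 'mlx' or 'ib_'.

    Expects text in the format of /proc/modules (module name is the
    first whitespace-separated field of each line).
    """
    names = [line.split()[0]
             for line in proc_modules_text.splitlines() if line.strip()]
    return [n for n in names if n.startswith(("mlx", "ib_"))]


def list_interfaces(sys_net="/sys/class/net"):
    """Return the interface names the kernel currently exposes."""
    return sorted(os.listdir(sys_net)) if os.path.isdir(sys_net) else []


def snapshot():
    """Print a small status report for the node (hypothetical helper)."""
    print("kernel:", platform.release())
    print("interfaces:", list_interfaces())
    try:
        with open("/proc/modules") as f:
            print("mlx/ib modules:", loaded_mlx_modules(f.read()))
    except OSError as exc:
        print("could not read /proc/modules:", exc)
    try:
        # Assumption: the service is registered with systemd as 'openibd'.
        result = subprocess.run(["systemctl", "is-active", "openibd"],
                                capture_output=True, text=True)
        print("openibd:", result.stdout.strip() or result.stderr.strip())
    except OSError:
        print("systemctl not available")


if __name__ == "__main__":
    snapshot()
```

Comparing this output between a healthy node and an affected one, in particular the kernel version and the module list, may show what changed before the interface disappeared.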