IPoIB sometimes fails. Is there any way to detect and repair this automatically?

Sometimes the connection via IPoIB doesn’t work.

ibping works fine, but a normal ping does not.

Our workaround is to reload the driver:

modprobe -r ib_ipoib

modprobe ib_ipoib
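The workaround above can be automated. Because ibping keeps working while IP ping fails, checking IP reachability over the IPoIB interface is one way to detect this state. Below is a minimal bash sketch, not a supported tool. It assumes the IPoIB interface is named ib0 and that 192.168.100.2 (a hypothetical address) is a peer on the IPoIB subnet that normally answers ping. Both can be changed through the IFACE and PEER variables. It must run as root because it calls modprobe.

```bash
#!/usr/bin/env bash
# Hypothetical IPoIB watchdog: if a known peer stops answering IP ping over
# the IPoIB interface, reload ib_ipoib (the manual workaround above).
set -u

IFACE="${IFACE:-ib0}"              # assumption: name of the IPoIB interface
PEER="${PEER:-192.168.100.2}"      # hypothetical peer IP on the IPoIB subnet
INTERVAL="${INTERVAL:-60}"         # seconds between checks in loop mode

# Succeeds if the peer answers at least one of 3 pings sent via $IFACE.
ipoib_ok() {
    ping -c 3 -W 2 -I "$IFACE" "$PEER" > /dev/null 2>&1
}

# Unload and reload the IPoIB driver; requires root.
reload_ipoib() {
    logger -t ipoib-watchdog "ping to $PEER via $IFACE failed, reloading ib_ipoib"
    modprobe -r ib_ipoib && modprobe ib_ipoib
}

# One health check: returns 0 if healthy, 1 if a reload was triggered.
check_once() {
    if ipoib_ok; then
        return 0
    fi
    reload_ipoib
    return 1
}

# Check forever, pausing between checks.
run_loop() {
    while true; do
        check_once
        sleep "$INTERVAL"
    done
}

case "${1:-}" in
    once) check_once ;;
    loop) run_loop ;;
    *)    echo "usage: $0 once|loop" >&2 ;;
esac
```

It could be run as ./ipoib-watchdog.sh loop (the file name is hypothetical), or as ./ipoib-watchdog.sh once from cron. Choose a stable peer. If that peer is simply down or rebooting, the check fails as well and triggers a needless reload.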

Is there any mechanism to detect or repair this situation automatically?

Hello Maik,

One option for monitoring the health of the fabric is our UFM tool. It can monitor link health and send alerts when there is a fabric health issue. You can find information on the tool and how to purchase it here: https://www.mellanox.com/products/ufm#ufm-telemetry

A simpler option is to write a script that detects link-down events and then runs a command.

For example, you could create a script that searches the messages log for the term “link down” and runs a command to reload the ib_ipoib module.
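The approach above can be sketched as follows. This is a minimal bash example, not a supported tool. It assumes the messages log is /var/log/messages. The exact wording of the driver's link-down message is also an assumption, so the PATTERN variable may need adjusting to what your log actually shows. A cooldown (COOLDOWN, a hypothetical default of 120 seconds) stops messages produced by the reload itself from triggering another reload straight away.

```bash
#!/usr/bin/env bash
# Sketch: follow the messages log and reload ib_ipoib when a line contains
# the pattern. Pattern and log path are assumptions; adjust to your system.
set -u

PATTERN="${PATTERN:-link down}"     # assumption: wording of the link-down message
LOG="${LOG:-/var/log/messages}"     # assumption: RHEL-style syslog file
COOLDOWN="${COOLDOWN:-120}"         # hypothetical minimum seconds between reloads

# Succeeds if the given log line contains $PATTERN (case-insensitive).
is_link_down() {
    printf '%s\n' "$1" | grep -qi -- "$PATTERN"
}

# Unload and reload the IPoIB driver; requires root.
# The log message deliberately does not contain $PATTERN, so it cannot re-trigger.
reload_ipoib() {
    logger -t ipoib-watch "pattern matched in $LOG, reloading ib_ipoib"
    modprobe -r ib_ipoib && modprobe ib_ipoib
}

# Follow the log (-F survives log rotation, -n0 skips existing lines).
watch_log() {
    local last_reload=0 now line
    tail -n0 -F "$LOG" | while IFS= read -r line; do
        now=$(date +%s)
        if is_link_down "$line" && [ $((now - last_reload)) -ge "$COOLDOWN" ]; then
            reload_ipoib
            last_reload=$now
        fi
    done
}

if [ "${1:-}" = "run" ]; then
    watch_log
fi
```

Run it as root with ./ipoib-watch.sh run (the file name is hypothetical). On systemd-based systems, journalctl -k -f could replace the tail command to follow kernel messages directly. In the failure described in the question, ibping still works, so the IB link may not actually go down. It is worth confirming the pattern against a real occurrence before relying on it.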

