Sometimes the connection via IPoIB doesn’t work.
ibping works fine.
Normal ping doesn’t.
Our workaround is to reload the driver:
modprobe -r ib_ipoib && modprobe ib_ipoib
Is there any mechanism to detect or repair this situation automatically?
Thank you for posting your question on the Mellanox Community.
One option for monitoring the health of the fabric is our UFM tool. It can monitor link health and send alerts when there is a fabric health issue. You can find information on the tool and how to purchase it here: https://www.mellanox.com/products/ufm#ufm-telemetry
Another, simpler option is to write a script that detects link down events and then runs a command.
For example, you could create a script that looks for the term “link down” in the messages log and runs a command to reset the module.
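A minimal sketch of such a script, in Python, might look like the following. It is an illustration under several assumptions, not a supported tool: the log path `/var/log/messages`, the exact “link down” wording, the 60-second cooldown, and all function names are hypothetical choices, and the load step after `modprobe -r ib_ipoib` is assumed because the question only shows the unload. It is also an assumption that the driver logs a matching message in the failure described above, where ibping still works, so check your own logs before relying on it.

```python
import subprocess
import time

# Assumed values, not taken from the thread; adjust for your system.
LOG_PATH = "/var/log/messages"  # syslog location varies by distribution
PATTERN = "link down"           # match this to what your driver actually logs
COOLDOWN_S = 60.0               # minimum time between two reloads
RELOAD_CMDS = (
    ["modprobe", "-r", "ib_ipoib"],  # unload, as in the question
    ["modprobe", "ib_ipoib"],        # load again (assumed second step)
)


def is_link_down(line, pattern=PATTERN):
    """Return True if the log line contains the pattern, ignoring case."""
    return pattern.lower() in line.lower()


def reload_ipoib(run=subprocess.run):
    """Unload and reload the ib_ipoib module; requires root privileges."""
    for cmd in RELOAD_CMDS:
        run(cmd, check=True)


def follow(path, poll_s=1.0):
    """Yield lines appended to path, similar to `tail -f`.

    Log rotation is not handled in this sketch.
    """
    with open(path, "r", errors="replace") as f:
        f.seek(0, 2)  # start at the current end of the file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_s)


def watch(lines, on_match, cooldown_s=COOLDOWN_S, clock=time.monotonic):
    """Call on_match() for matching lines, at most once per cooldown window."""
    last = None
    for line in lines:
        if not is_link_down(line):
            continue
        now = clock()
        if last is None or now - last >= cooldown_s:
            on_match()
            last = now
```

To run it, call `watch(follow(LOG_PATH), reload_ipoib)` as root, for example from a small service that starts at boot. The cooldown is there so that a burst of log messages, including any produced by the reload itself, does not trigger repeated reloads.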
For further support on this issue, you can contact email@example.com with a valid support contract. If you would like to purchase a support contract, please contact your sales representative or our sales team here: https://store.mellanox.com/customer-service/contact-us
Thanks and regards,
Mellanox Technical Support