Thank you for posting your question on the Mellanox Community.
One option for monitoring the health of the fabric is our Unified Fabric Manager (UFM) tool. It can monitor link health and send alerts when a fabric health issue occurs. You can find information on the tool, including how to purchase it, here: NVIDIA Unified Fabric Manager (UFM) | NVIDIA
Another, simpler option is to write a script that detects link down events and then runs a command.
For example, you could create a script that looks for the term “link down” in the messages log and runs a command to reset the module.
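As a rough illustration of that idea, here is a minimal Python sketch. It is not an official tool, and several details are assumptions you would need to adjust: the log path (/var/log/messages), the exact wording of the log entry on your system, the cooldown value, and above all RESET_COMMAND, which is only a placeholder and must be replaced with whatever command you actually use to reset the module.

```python
import subprocess
import time

# Assumptions for illustration only; adjust all of these for your system.
LOG_PATH = "/var/log/messages"   # assumed location of the messages log
PATTERN = "link down"            # term to look for (matched case-insensitively)
# Placeholder: replace with the real command you use to reset the module.
RESET_COMMAND = ["echo", "link down detected: put your reset command here"]


def line_matches(line, pattern=PATTERN):
    """Return True if the log line contains the pattern, ignoring case."""
    return pattern.lower() in line.lower()


def follow(path, poll_interval=1.0):
    """Yield lines appended to the file after startup, similar to `tail -f`."""
    with open(path, "r", errors="replace") as log:
        log.seek(0, 2)  # jump to the end so old entries are not replayed
        while True:
            line = log.readline()
            if not line:
                time.sleep(poll_interval)  # nothing new yet, wait and retry
                continue
            yield line


def watch(path=LOG_PATH, command=RESET_COMMAND, cooldown=60.0):
    """Run `command` when a matching line appears, at most once per cooldown."""
    last_run = float("-inf")
    for line in follow(path):
        if line_matches(line) and time.monotonic() - last_run >= cooldown:
            subprocess.run(command, check=False)
            last_run = time.monotonic()
```

To use it, call watch() at the end of the script and run it as a user that can read the log, for example from a systemd service. The cooldown is there so that a burst of link down messages does not trigger repeated resets. Note that this sketch does not handle log rotation, so the script would need to be restarted (or extended) if the log file is rotated.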