SX6710 Keeps Shutting Down

I have an SX6710 in our lab (my first exposure to Mellanox) and it keeps going into what I call “Jet Engine Mode”: the fans spin so fast it sounds like a plane ready to take off. In that state the unit still has power and the fans are spinning out of control, but all of the ports are off and the web interface is inaccessible, even though I can still ping the management port that the web interface is served on.

I’m trying to parse through the logs to figure out why it is shutting down, but being completely unfamiliar with them, I’m not making much sense of it. When I look at the log after rebooting (I need to pull the power plugs to reboot), I can clearly see the system reboot message. Working backwards from there, I can find where the fan max threshold is set to 18000, and I can see statsd.ERR messages saying the Health Daemon is already performing a sweep. Further back still, there is a health.ERR message showing that the Health Daemon ISR scan failed.

Again, being brand new to Mellanox, I’d appreciate any advice or pointers on where I should be looking, as I’m not even sure I’m in the correct rabbit hole at this point.

Thanks!!!

Hi Angelo,

If this issue is affecting your production, then please email Mellanox support at support@mellanox.com so the issue can be properly examined.

Thanks,

Christeen

Appreciate that, Christeen, but as I mentioned, it’s in our lab. It’s a pain in the butt to reboot all the time: sometimes it runs for 3 or 4 days with no issues, and other times we reboot it 4-5 times in one day. I’m pretty sure the answer is somewhere in the logs, but I am just unfamiliar with them and nothing jumps out when I search through them for an error.
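
For anyone doing the same kind of backward search through an exported log, here is a rough Python sketch of the approach described above: find each reboot line, then collect the .ERR lines in a window just before it. Only the health.ERR and statsd.ERR tags come from this thread. The function names, the window size, and the reboot marker string are all made up for illustration, and the exact layout of the switch's log lines is an assumption, so adjust the pattern to whatever your log actually contains.

```python
import re
from collections import Counter

# Matches tags such as "health.ERR" or "statsd.ERR" (the two quoted in the
# thread). That the daemon name sits directly before ".ERR" is an assumption
# about the log layout.
ERR_TAG = re.compile(r"\b(\w+)\.ERR\b")


def errors_before_reboots(lines, reboot_marker, window=200):
    """For every line containing reboot_marker, return a tuple of
    (index of that line, list of .ERR lines in the `window` lines before it).

    reboot_marker is whatever text your log prints at reboot; it is a
    caller-supplied string here because the real wording is not known.
    """
    results = []
    for i, line in enumerate(lines):
        if reboot_marker in line:
            start = max(0, i - window)
            errs = [l for l in lines[start:i] if ERR_TAG.search(l)]
            results.append((i, errs))
    return results


def count_by_daemon(err_lines):
    """Count .ERR lines per daemon name (the part before '.ERR')."""
    counts = Counter()
    for l in err_lines:
        m = ERR_TAG.search(l)
        if m:
            counts[m.group(1)] += 1
    return counts
```

To use it, read the exported log into a list of lines, call errors_before_reboots(lines, "<your reboot text>"), and pass each returned list to count_by_daemon. Comparing the counts across several shutdowns should show whether the same daemon complains every time, which narrows down where to look.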