Unrecoverable CI BIOS problem!

I have a Mellanox MSB7800 36-port 4x EDR IB managed switch that periodically reboots itself for no apparent reason.

I reset the switch over a serial console connection and have since configured it to work in my environment.

In checking the logs, however, I see this:

BIOS Version: 4.6.5

BIOS Release Date: 05/21/2015

BIOS SubVersion: 0ABZS017_01.01.012

Feb 16 00:59:12 Error: System is booted from KGI SPI flash!

Feb 16 00:59:12 MAX 3 BIOS recovery attempts have been already done

Feb 16 00:59:12 Error: Unrecoverable CI BIOS problem!

Feb 16 00:59:17 switch1 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Feb 16 00:59:17 switch1 kernel: Inspecting /boot/System.map

Feb 16 00:59:17 switch1 kernel: Cannot find map file.

Feb 16 00:59:17 switch1 kernel: No module symbols loaded - kernel modules not enabled.

Feb 16 00:59:17 switch1 kernel: cannot find any symbols, turning off symbol lookups

Feb 16 00:59:17 switch1 kernel: [ 0.000000] Initializing cgroup subsys cpuset

Feb 16 00:59:17 switch1 kernel: [ 0.000000] Initializing cgroup subsys cpu

Feb 16 00:59:17 switch1 kernel: [ 0.000000] Initializing cgroup subsys cpuacct

Feb 16 00:59:17 switch1 kernel: [ 0.000000] Linux version 3.10.0-54.0.1.el7MELLANOXsmp-x86_64 (@) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) X86_64 _tm_sx_mlnx_os_3_6_2000 #1 2016-11-10 03:51:19 SMP

I tried googling what “Unrecoverable CI BIOS problem” means, and I also searched here for a solution, but neither turned up anything.

Attached is the dump of the log file from the switch.

Please let me know what I should do.

Your help, guidance, and support are greatly appreciated.

Thank you.

log.txt (292 KB)

I did disconnect the serial cable from the switch and connected only via ssh.

The problem is that when the switch hits this error (which it does regularly, about once every hour or so), it reboots entirely and drops the ssh connection.
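For anyone who wants to check the reboot interval against a log like the attached one, here is a minimal Python sketch. It assumes the log uses the same syslog-style timestamps as the excerpt above (no year field), and the function names `boot_error_times` and `intervals_minutes` are made up for illustration; they are not part of any Mellanox tool.

```python
import re
from datetime import datetime

# Marker string taken from the log excerpt in this thread.
MARKER = "Unrecoverable CI BIOS problem"
# Syslog-style timestamp at the start of a line, e.g. "Feb 16 00:59:12".
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def boot_error_times(lines, year=2000):
    """Return the timestamps of lines that contain the BIOS error marker.

    The log carries no year, so `year` is an assumption supplied by the
    caller. Intervals that cross a New Year boundary will be wrong.
    """
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        match = STAMP.match(line)
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def intervals_minutes(times):
    """Return the gaps, in minutes, between consecutive timestamps."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


# Example usage (the file name is the attachment from this thread):
#   with open("log.txt", errors="replace") as fh:
#       print(intervals_minutes(boot_error_times(fh)))
```

If the gaps come out close to 60 minutes each, that would support the "about once every hour" observation; irregular gaps would suggest the reboots are not on a fixed timer.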

(When it reboots and spins all of the fans up, it is too loud for me to be near it for long without ear protection, so I only used the serial console cable long enough to set the console management IP address, which lets me do everything else remotely from another room.)

So yes, the problem does repeat itself.

Thank you, though. I’ve since sent the unit back and replaced it with a Mellanox MSB7890 unmanaged switch.

Could you disconnect the serial cable, reboot the switch, and see if the issue occurs again? Don’t connect a serial cable; use ssh to log in to the switch.