I have a Mellanox MSB7800 36-port 4x EDR IB managed switch that periodically reboots itself for no apparent reason.
I reset the switch through a serial console connection and have since configured it to work in my environment.
When checking the logs, however, I see this:
BIOS Version: 4.6.5
BIOS Release Date: 05/21/2015
BIOS SubVersion: 0ABZS017_01.01.012
Feb 16 00:59:12 Error: System is booted from KGI SPI flash!
Feb 16 00:59:12 MAX 3 BIOS recovery attempts have been already done
Feb 16 00:59:12 Error: Unrecoverable CI BIOS problem!
Feb 16 00:59:17 switch1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Feb 16 00:59:17 switch1 kernel: Inspecting /boot/System.map
Feb 16 00:59:17 switch1 kernel: Cannot find map file.
Feb 16 00:59:17 switch1 kernel: No module symbols loaded - kernel modules not enabled.
Feb 16 00:59:17 switch1 kernel: cannot find any symbols, turning off symbol lookups
Feb 16 00:59:17 switch1 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Feb 16 00:59:17 switch1 kernel: [ 0.000000] Initializing cgroup subsys cpu
Feb 16 00:59:17 switch1 kernel: [ 0.000000] Initializing cgroup subsys cpuacct
Feb 16 00:59:17 switch1 kernel: [ 0.000000] Linux version 3.10.0-54.0.1.el7MELLANOXsmp-x86_64 (@) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) X86_64 _tm_sx_mlnx_os_3_6_2000 #1 2016-11-10 03:51:19 SMP
I tried googling what “Unrecoverable CI BIOS problem” means and also looked for a solution here, but neither search turned up anything.
Attached is the dump of the log file from the switch.
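In case it helps anyone going through the attachment, here is a rough Python sketch of how the log could be scanned to count boots and BIOS-related messages. It is only a sketch under assumptions: the three message strings are copied from the excerpt above and may not be the only BIOS messages the switch writes, treating a klogd start line as the marker of one boot is my own guess, and the function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

# Message fragments copied from the log excerpt; assuming these are the
# BIOS-related lines worth counting (there may be others).
BIOS_PATTERNS = [
    "System is booted from",
    "BIOS recovery attempts",
    "Unrecoverable CI BIOS problem",
]

# Syslog-style timestamp at the start of a line, e.g. "Feb 16 00:59:17".
TIMESTAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})")


def summarize(lines):
    """Return (boot_timestamps, bios_message_counts) for an iterable of log lines."""
    boots = []
    bios_hits = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        stamp = match.group(1) if match else None
        # Assumption: klogd starting up marks one boot of the switch.
        if "klogd" in line and "started" in line:
            boots.append(stamp)
        for pattern in BIOS_PATTERNS:
            if pattern in line:
                bios_hits[pattern] += 1
    return boots, bios_hits


# Possible usage against the attached file:
#   with open("log.txt", errors="replace") as f:
#       boots, hits = summarize(f)
#   print(len(boots), "boots;", dict(hits))
```

Comparing the boot timestamps with the BIOS message counts should show whether the BIOS errors appear on every boot or only on some of them.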
Please let me know what I should do.
Your help, guidance, and support are greatly appreciated.
Thank you.
log.txt (292 KB)