We have lost a LOT of Mellanox 40Gb NICs, but only when using Cisco BiDi 3.5 W optics. When we change to 1.5 W MPO SFPs, that issue goes away (at a heavy cost to mgmt).
I have to ask WHY Mellanox isn't resolving the PANIC, as that causes NODE loss. That's the real problem; the NIC loss isn't as painful as a NODE PANIC. Surely you know about this, and I am still dealing with it. When will Mellanox address the PANIC? That's the thorn in our side.
2019-01-08T10:28:04-08:00 <0.6> syslogd: kernel boot file is /boot/kernel.amd64/kernel
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: panic @ time 1546971674.663, thread 0xfffff8052123a780: vm_fault: fault on nofault entry, addr: fffffe002c04d000
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: cpuid = 12
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: Panic occurred in module kernel loaded at 0xffffffff80200000:
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel:
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: Stack: --------------------------------------------------
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:vm_fault_hold+0x17fc
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:vm_fault+0x76
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:trap_pfault+0x2a1
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:trap+0x64c
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:show_diag_rprt+0x1b
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:sysctl_root+0x246
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:userland_sysctl+0x1d1
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:sys___sysctl+0x73
2019-01-08T10:28:04-08:00 <0.7> /boot/kernel.amd64/kernel: kernel:amd64_syscall+0x396
thx
Mark Licata