Hi,
In my recent thread we determined that our cards (MBF2M332A-AEEO_Ax) are an older model.
As I’m still trying to figure out the reason behind an mlx5_core timeout during boot, I tried to locate the latest firmware for the device. mlxfwmanager does not find any newer firmware, but the EoL notification states that there should be a more recent version, 24_30_1004. That version is not available on the website either. Where can I download it?
Thanks,
Raphael
EoL Notification: https://network.nvidia.com/pdf/eol/LCR-000770.pdf
Boot Error:
[ 14.870018] mlx5_core 0000:b1:00.0: poll_health:971:(pid 0): device's health compromised - reached miss count
[ 14.871506] mlx5_core 0000:b1:00.0: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[ 14.874398] mlx5_core 0000:b1:00.0: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[ 14.875913] mlx5_core 0000:b1:00.0: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[ 14.877420] mlx5_core 0000:b1:00.0: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[ 14.878720] mlx5_core 0000:b1:00.0: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[ 14.879377] mlx5_core 0000:b1:00.0: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[ 14.880020] mlx5_core 0000:b1:00.0: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[ 14.880660] mlx5_core 0000:b1:00.0: print_health_info:498:(pid 0): assert_exit_ptr 0x209ad498
[ 14.881296] mlx5_core 0000:b1:00.0: print_health_info:499:(pid 0): assert_callra 0x209b2b88
[ 14.881927] mlx5_core 0000:b1:00.0: print_health_info:500:(pid 0): fw_ver 24.28.1066
[ 14.882546] mlx5_core 0000:b1:00.0: print_health_info:502:(pid 0): time 0
[ 14.883162] mlx5_core 0000:b1:00.0: print_health_info:503:(pid 0): hw_id 0x00000214
[ 14.883768] mlx5_core 0000:b1:00.0: print_health_info:504:(pid 0): rfr 0
[ 14.884374] mlx5_core 0000:b1:00.0: print_health_info:505:(pid 0): severity 3 (ERROR)
[ 14.884971] mlx5_core 0000:b1:00.0: print_health_info:506:(pid 0): irisc_index 10
[ 14.885575] mlx5_core 0000:b1:00.0: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[ 14.885970] mlx5_core 0000:b1:00.0: print_health_info:509:(pid 0): ext_synd 0x8a02
[ 14.886344] mlx5_core 0000:b1:00.0: print_health_info:510:(pid 0): raw fw_ver 0x181c042a
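As a sanity check, the raw fw_ver value in the last log line seems to encode the same version that the driver prints as fw_ver. Below is a minimal sketch, assuming an 8-bit major, 8-bit minor, 16-bit sub-minor layout. I inferred that layout from this log only, and the function name is my own:

```python
def decode_raw_fw_ver(raw: int) -> str:
    """Split a 32-bit raw fw_ver into major.minor.subminor.

    Assumed layout (inferred from the log above, not from documentation):
    bits 31-24 = major, bits 23-16 = minor, bits 15-0 = sub-minor.
    """
    major = (raw >> 24) & 0xFF
    minor = (raw >> 16) & 0xFF
    subminor = raw & 0xFFFF
    # Zero-pad the sub-minor to four digits to match the fw_ver style in the log.
    return f"{major}.{minor}.{subminor:04d}"


if __name__ == "__main__":
    print(decode_raw_fw_ver(0x181C042A))
```

With the value from the log this gives 24.28.1066, so the card really is running the version that mlxfwmanager reports and not something newer.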
mlxfwmanager
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: BlueField2
Part Number: MBF2M332A-AEEO_Ax
Description: BlueField-2 SmartNIC 25GbE Dual-Port SFP56; PCIe Gen3/4 x8; Crypto; 16GB on-board DDR; 1GbE OOB management; HHHL
PSID: MT_0000000493
PCI Device Name: /dev/mst/mt41686_pciconf0
Base GUID: 000000002cb11ccc
Base MAC: 000000b11ccc
Versions:         Current        Available
   FW             24.28.1066     N/A
Status: No matching image found