MELLANOX MCX653105A-HDAT Firmware ERR

Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: poll_health:1087:(pid 1786): device's health compromised - reached miss count
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:513:(pid 1786): Health issue observed, firmware internal error, severity(3) ERROR:
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[0] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[1] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[2] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[3] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[4] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[5] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:519:(pid 1786): assert_exit_ptr 0x209f1660
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:520:(pid 1786): assert_callra 0x209f8520
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:522:(pid 1786): fw_ver 20.39.3560
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:523:(pid 1786): time 0
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:524:(pid 1786): hw_id 0x0000020f
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:525:(pid 1786): rfr 0
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:526:(pid 1786): severity 3 (ERROR)
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:527:(pid 1786): irisc_index 5
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:529:(pid 1786): synd 0x1: firmware internal error
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:530:(pid 1786): ext_synd 0x8a02
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:531:(pid 1786): raw fw_ver 0x14270de8
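
For anyone cross-checking the log: the `raw fw_ver 0x14270de8` value on the last line appears to encode the same version as the `fw_ver 20.39.3560` line. Below is a minimal Python sketch of that decoding. It assumes the 32-bit value is laid out as major (top 8 bits), minor (next 8 bits), sub-minor (low 16 bits). That layout is inferred from these two log lines, not taken from vendor documentation, and `decode_fw_ver` is a hypothetical helper name.

```python
def decode_fw_ver(raw: int) -> str:
    """Split a 32-bit raw fw_ver into 'major.minor.subminor'.

    Assumed layout (inferred from the log, not from vendor docs):
    bits 31-24 = major, bits 23-16 = minor, bits 15-0 = sub-minor.
    """
    major = (raw >> 24) & 0xFF
    minor = (raw >> 16) & 0xFF
    subminor = raw & 0xFFFF
    return f"{major}.{minor}.{subminor}"


# Value copied from the "raw fw_ver" log line above.
print(decode_fw_ver(0x14270DE8))  # prints 20.39.3560
```

If the two values ever disagree in a log, that would suggest the assumed layout is wrong for that device or driver version.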

Updating the firmware does not resolve this issue. How can I fix it?

Please check my reply on this post:

Regards,
Yaniv
