Is there any way to fix this?
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: poll_health:1087:(pid 1786): device’s health compromised - reached miss count
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:513:(pid 1786): Health issue observed, firmware internal error, severity(3) ERROR:
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[0] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[1] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[2] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[3] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[4] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:517:(pid 1786): assert_var[5] 0x00000000
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:519:(pid 1786): assert_exit_ptr 0x209f1660
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:520:(pid 1786): assert_callra 0x209f8520
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:522:(pid 1786): fw_ver 20.39.3560
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:523:(pid 1786): time 0
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:524:(pid 1786): hw_id 0x0000020f
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:525:(pid 1786): rfr 0
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:526:(pid 1786): severity 3 (ERROR)
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:527:(pid 1786): irisc_index 5
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:529:(pid 1786): synd 0x1: firmware internal error
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:530:(pid 1786): ext_synd 0x8a02
Jul 05 15:07:34 centosc9 kernel: mlx5_core 0000:01:00.0: print_health_info:531:(pid 1786): raw fw_ver 0x14270de8
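For anyone comparing the two version lines in the log above: the raw fw_ver word 0x14270de8 appears to be the same version as the printed fw_ver 20.39.3560, packed into 32 bits. The sketch below decodes it under the assumption (inferred from these two log lines, not taken from a specification) that the top byte is the major version, the next byte the minor version, and the low 16 bits the sub-minor version. The function name decode_fw_ver is made up for illustration.

```shell
#!/usr/bin/env bash
# Decode a raw 32-bit fw_ver word into major.minor.subminor.
# Assumed layout (inferred from the log, not from a spec):
#   bits 31-24 = major, bits 23-16 = minor, bits 15-0 = sub-minor.
decode_fw_ver() {
    local raw=$(( $1 ))
    printf '%d.%d.%d\n' \
        $(( (raw >> 24) & 0xff )) \
        $(( (raw >> 16) & 0xff )) \
        $(( raw & 0xffff ))
}

# Value taken from the "raw fw_ver" log line.
decode_fw_ver 0x14270de8   # prints 20.39.3560
```

This is only a convenience for checking which firmware the card was running when the health error was reported.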
Dear Customer,
Thank you for reaching out to the NVIDIA Community.
To address your inquiry, please follow the steps below to update your device firmware:
Using flint to Update Mellanox Firmware
The flint utility is a command-line tool provided by Mellanox (NVIDIA) for burning (updating) firmware on ConnectX and other Mellanox network adapters. Below are the essential steps and usage examples for updating firmware with flint.
1. Preparation
- Install Mellanox Firmware Tools (MFT):
  - Download and install the MFT package from the official NVIDIA Networking support site.
- Download Firmware:
  - Obtain the correct firmware image (usually a .bin file) for your specific adapter model and PSID from the NVIDIA support site.
- Identify the Device:
  - Use mst status to list Mellanox devices and get the device name (e.g., /dev/mst/mt4119_pci_cr0).
2. Basic flint Commands
Query Current Firmware Version
flint -d <device_name> q
- Example:
flint -d /dev/mst/mt4119_pci_cr0 q
Burn (Update) Firmware
flint -d <device_name> -i <firmware_file>.bin burn
- Example:
flint -d /dev/mst/mt4119_pci_cr0 -i fw-4119-rel-28_37_1014.bin burn
- The burn command writes the new firmware to the device.
Verify Firmware Version After Update
flint -d <device_name> q
- Confirm the firmware version matches the new image.
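The version check can be scripted rather than read by eye. The sketch below assumes that the query output contains "Key: value" lines such as "FW Version:" and "PSID:"; the sample text, the placeholder PSID, and the helper name flint_field are illustrative assumptions, so check them against what your MFT release actually prints.

```shell
#!/usr/bin/env bash
# Print the value of one "Key: value" field from `flint ... q` output on stdin.
# The field names used below are assumptions about the query output format.
flint_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Illustrative sample, not captured from a real device (PSID is a placeholder).
sample_output='Image type:            FS4
FW Version:            20.39.3560
PSID:                  MT_0000000000'

expected="20.39.3560"
current=$(printf '%s\n' "$sample_output" | flint_field "FW Version")

if [ "$current" = "$expected" ]; then
    echo "firmware is $current, as expected"
else
    echo "firmware is $current, expected $expected"
fi
```

On a real system the sample text would be replaced by piping the query command into the helper, for example: flint -d <device_name> q | flint_field "FW Version".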
3. Step-by-Step Firmware Update Procedure
Step | Command/Action | Notes |
---|---|---|
Start MST service | mst start | Initializes Mellanox device support |
List devices | mst status | Find the correct device name |
Unzip firmware | unzip <firmware_file>.zip | If firmware is zipped |
Burn firmware | flint -d <device_name> -i <firmware_file>.bin burn | Main update step |
Reboot | reboot | Required for update to take effect |
Verify | flint -d <device_name> q | Check new firmware version |
4. Common Options and Flags
-d <device_name>: Specifies the Mellanox device.
-i <image_file>: Specifies the firmware binary image.
burn: Command to write the firmware.
q: Query device for current firmware and attributes.
-y: Non-interactive mode (assume “yes” to prompts).
5. Example Full Workflow
Start MST service
sudo mst start
List Mellanox devices
sudo mst status
Burn firmware (replace with your device and firmware file)
sudo flint -d /dev/mst/mt4119_pci_cr0 -i fw-4119-rel-28_37_1014.bin burn
Reboot system
sudo reboot
Verify firmware version
sudo flint -d /dev/mst/mt4119_pci_cr0 q
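If you would like to rehearse the workflow before touching the adapter, a small wrapper with a dry-run mode may help. This is a minimal sketch, not an NVIDIA-provided script: the names run, update_firmware, and DRY_RUN are hypothetical, and it only uses the mst and flint invocations already shown above. With DRY_RUN left at its default of 1 it prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Dry-run wrapper around the update steps shown above.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# update_firmware <device_name> <firmware_file>
# Refuses to continue if the image file does not exist.
update_firmware() {
    local dev=$1
    local image=$2
    if [ ! -f "$image" ]; then
        echo "firmware image not found: $image" >&2
        return 1
    fi
    run mst start &&
    run flint -d "$dev" q &&
    run flint -d "$dev" -i "$image" burn &&
    echo "Burn step finished; reboot, then run: flint -d $dev q"
}

# Example (prints the commands only):
#   update_firmware /dev/mst/mt4119_pci_cr0 fw-4119-rel-28_37_1014.bin
```

The reboot is deliberately left as a manual step so that the script never restarts a machine on its own; run it as root with DRY_RUN=0 once the printed commands look right.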
Note: Please download the latest firmware from the link below:
https://network.nvidia.com/support/firmware/firmware-downloads/
If you have any questions or encounter any issues during the update process, please do not hesitate to contact us.
Best regards,
NVIDIA Support
That did not solve the problem.
Is there any other solution?
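Dear Customer,
Please follow the instructions above to upgrade the firmware. If you run into any problems during the upgrade, you are welcome to contact us at any time through the following channels:
- Enterprise Support Portal
  - You can submit a service request online, 24/7, through the NVIDIA Enterprise Support Portal to get immediate support.
- Email support
  - You can also send technical service requests to the NVIDIA technical support mailbox: EnterpriseSupport@nvidia.com
Thank you for your support of NVIDIA!
NVIDIA Technical Support Team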