ConnectX-6 MCX653106A-ECAT firmware internal error, severity(3) ERROR:

Hardware Details
The model number of the card is MCX653106A-ECAT-SP.
Here’s the output from lspci | grep -i mellanox:

ca:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
ca:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]

Firmware Details
And the firmware details obtained using mstflint:

[root@localhost ~]# mstflint -d ca:00.0 q
Image type: FS4
FW Version: 20.43.1014
FW Release Date: 7.11.2024
Product Version: 20.43.1014
Rom Info: type=UEFI version=14.36.16 cpu=AMD64,AARCH64
type=PXE version=3.7.500 cpu=AMD64
Description: UID GuidsNumber
Base GUID: a088c20300aae43d 8
Base MAC: a088c2aae43d 8
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000224
Security Attributes: N/A

Error Message
Line 7140: [Thu May 15 16:08:59.475 2025] [ 6.160172] ERST: Error Record Serialization Table (ERST) support is initialized.

Line 7881: [Thu May 15 16:09:04.297 2025] [ 36.132204] mlx5_core 0000:ca:00.0: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:

Line 7894: [Thu May 15 16:09:04.554 2025] [ 36.132337] mlx5_core 0000:ca:00.0: print_health_info:445:(pid 0): severity 3 (ERROR)

Line 7896: [Thu May 15 16:09:04.554 2025] [ 36.132359] mlx5_core 0000:ca:00.0: print_health_info:447:(pid 0): synd 0x1: firmware internal error

Line 7900: [Thu May 15 16:09:04.571 2025] [ 36.773200] mlx5_core 0000:ca:00.1: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:

Line 7913: [Thu May 15 16:09:04.586 2025] [ 36.773302] mlx5_core 0000:ca:00.1: print_health_info:445:(pid 0): severity 3 (ERROR)

Line 7915: [Thu May 15 16:09:04.586 2025] [ 36.773320] mlx5_core 0000:ca:00.1: print_health_info:447:(pid 0): synd 0x1: firmware internal error

Line 10810: [Thu May 15 16:11:12.147 2025] [ 6.290428] ERST: Error Record Serialization Table (ERST) support is initialized.

Is there a workaround for this error message?

Functionally, there is no issue: the card works as expected.

Please help confirm.

Hello,

Thanks for reaching out about your ConnectX-6 adapter issue!

Let’s Try These Quick Fixes:

Based on your error logs, this looks like a flash configuration issue that we can hopefully resolve with a few commands:

1. Flash Mode Configuration Fix

mlxconfig -d ca:00.0 s FLASH_TYPE=0
mlxconfig -d ca:00.1 s FLASH_TYPE=0

Give your system a reboot after this change.

2. Firmware Reset

mlxfwreset -d ca:00.0 reset
mlxfwreset -d ca:00.1 reset
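
To check whether the health report comes back after a reset or reboot, it can help to filter the kernel log for the same `print_health_info` lines quoted in the question. The sketch below is only an illustration: the function names are hypothetical, and the pattern assumes the log lines keep the format shown in the error message above.

```shell
# Hypothetical helpers: filter mlx5_core health reports from a kernel
# log supplied on stdin. The pattern assumes lines of the form
#   mlx5_core 0000:ca:00.0: print_health_info:431:(pid 0): ...
mlx5_health_lines() {
    grep -E 'mlx5_core [0-9a-f:.]+: print_health_info'
}

# Count health-report lines per PCI function, printed as
# "<pci address> <count>", one function per line.
mlx5_health_count_by_device() {
    mlx5_health_lines \
        | sed -E 's/.*mlx5_core ([0-9a-f:.]+): print_health_info.*/\1/' \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

For example, `dmesg | mlx5_health_count_by_device` run after the reboot should print nothing if the error no longer appears.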

3. Firmware Update
Consider updating to firmware version 20.43.2026 or later using the MFT (Mellanox Firmware Tools) package. Newer firmware sometimes resolves these internal errors.
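
Before updating, it may be worth confirming that the installed firmware really is older than the suggested version. The following is a minimal sketch, assuming the `mstflint ... q` output contains a `FW Version:` line as in the question and that versions are three dot-separated numbers. Both function names are hypothetical.

```shell
# Hypothetical helper: extract the "FW Version" field from
# `mstflint -d <device> q` output read on stdin.
fw_version_from_query() {
    awk -F': *' '/^FW Version:/ {print $2}'
}

# Hypothetical helper: succeed (exit 0) only when version $1 is strictly
# older than version $2. Assumes "major.minor.patch" numeric versions.
fw_is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" \
            | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
}
```

A possible use: `cur=$(mstflint -d ca:00.0 q | fw_version_from_query); fw_is_older "$cur" 20.43.2026 && echo "update recommended"`.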

If You’re Still Having Trouble

Don’t worry if these steps don’t solve it! Please open an official support case with NVIDIA Enterprise Customer Support.