Hi,
I installed two MT26448 adapters, one in each of two servers.
Both are working fine except for the “kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2” error that floods the system log.
I updated the NIC drivers and tried to flash the firmware with no luck.
Can you help me with this, please?
All the troubleshooting steps I have tried are shown below:
# tail /var/log/messages
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:19 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:19 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:19 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:19 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2

# uname -a
Linux CDN2 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

# lspci | grep Mellanox
04:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev a0)

# ethtool -i eth2
driver: mlx4_en
version: 2.2-1 (Feb 2014)
firmware-version: 2.7.0
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

# lspci -vv -s 04:00.0
04:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev a0)
    Subsystem: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at df500000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at de800000 (64-bit, prefetchable) [size=8M]
    Expansion ROM at df400000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
        Product Name: Hawk Dual Port
        Read-only fields:
            [PN] Part number: 59Y1905
            [EC] Engineering changes: A1
            [SN] Serial number: YK502000004T
            [V0] Vendor specific: PCIe Gen2 x8
            [RV] Reserved: checksum good, 0 byte(s) reserved
        Read/write fields:
            [V1] Vendor specific: N/A
            [YA] Asset tag: N/A
            [RW] Read-write area: 106 byte(s) free
        End
    Capabilities: [9c] MSI-X: Enable+ Count=256 Masked-
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #8, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core

# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt26448_pciconf0 - PCI configuration cycles access.
                            domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92
                            Chip revision is: A0
/dev/mst/mt26448_pci_cr0  - PCI direct access.
                            domain:bus:dev.fn=0000:04:00.0 bar=0xdf500000 size=0x100000
                            Chip revision is: A0

# flint --device /dev/mst/mt26448_pci_cr0 q
-E- Cannot open Device: /dev/mst/mt26448_pci_cr0. No such file or directory. MFE_UNSUPPORTED_DEVICE

# yum install mlnx-en-eth-only
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Package mlnx-en-eth-only-3.4-2.0.0.0.noarch already installed and latest version
Nothing to do

# yum install mlnx-fw-updater
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : mlnx-fw-updater-3.4-2.0.0.0.x86_64 1/1
Attempting to perform Firmware update...
Querying Mellanox devices firmware ...

Device #1:
----------
  Device Type: N/A
  Part Number: --
  Description:
  PSID:
  PCI Device Name: 04:00.0
  Port1 MAC: N/A
  Port1 GUID: N/A
  Port2 MAC: N/A
  Port2 GUID: N/A
  Versions: Current Available
    FW --
  Status: Failed to open device
---------
-E- Failed to query 04:00.0 device, error : No such file or directory. MFE_UNSUPPORTED_DEVICE

Log File: /tmp/mlnx_fw_update.log
Failed to update Firmware.
See /tmp/mlnx_fw_update.log
  Verifying : mlnx-fw-updater-3.4-2.0.0.0.x86_64 1/1

Installed:
  mlnx-fw-updater.x86_64 0:3.4-2.0.0.0

Complete!

# mlxfwmanager_pci | grep PSID
---------
-E- Failed to query 0000:04:00.0 device, error : No such file or directory. MFE_UNSUPPORTED_DEVICE
  PSID:
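
In case it helps to quantify how fast the error floods the log, here is a minimal sketch that counts the failures per second. It assumes the syslog line format shown in the tail output above (month, day and time in the first three fields); count_mlx4_failures is a hypothetical helper name, not part of any Mellanox tool.

```shell
# Hypothetical helper: count "command 0x54 failed" messages per timestamp.
# Assumes syslog lines start with "Mon DD HH:MM:SS", as in the tail output above.
count_mlx4_failures() {
    # $1: path to a syslog file, e.g. /var/log/messages
    grep 'mlx4_core .*command 0x54 failed' "$1" \
        | awk '{ print $1, $2, $3 }' \
        | sort \
        | uniq -c
}
```

Running count_mlx4_failures /var/log/messages should print one line per second in which the error appeared, prefixed by the number of occurrences in that second.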