command 0x54 failed: fw status = 0x2

Hi,

I installed two MT26448 on different servers.

Both are working fine except for the “kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2” error that floods the system log.

I updated the NIC drivers and tried to flash the firmware with no luck.

Can you help me with this, please?

You can see below all the troubleshooting steps:

# tail /var/log/messages
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:19 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:19 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:19 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2
Mar 29 12:21:19 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2

# uname -a
Linux CDN2 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

# lspci | grep Mellanox
04:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev a0)

# ethtool -i eth2
driver: mlx4_en
version: 2.2-1 (Feb 2014)
firmware-version: 2.7.0
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

# lspci -vv -s 04:00.0
04:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev a0)
    Subsystem: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at df500000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at de800000 (64-bit, prefetchable) [size=8M]
    Expansion ROM at df400000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
        Product Name: Hawk Dual Port
        Read-only fields:
            [PN] Part number: 59Y1905
            [EC] Engineering changes: A1
            [SN] Serial number: YK502000004T
            [V0] Vendor specific: PCIe Gen2 x8
            [RV] Reserved: checksum good, 0 byte(s) reserved
        Read/write fields:
            [V1] Vendor specific: N/A
            [YA] Asset tag: N/A
            [RW] Read-write area: 106 byte(s) free
        End
    Capabilities: [9c] MSI-X: Enable+ Count=256 Masked-
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #8, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core

# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt26448_pciconf0 - PCI configuration cycles access.
    domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92
    Chip revision is: A0
/dev/mst/mt26448_pci_cr0 - PCI direct access.
    domain:bus:dev.fn=0000:04:00.0 bar=0xdf500000 size=0x100000
    Chip revision is: A0

# flint --device /dev/mst/mt26448_pci_cr0 q
-E- Cannot open Device: /dev/mst/mt26448_pci_cr0. No such file or directory. MFE_UNSUPPORTED_DEVICE

# yum install mlnx-en-eth-only
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Package mlnx-en-eth-only-3.4-2.0.0.0.noarch already installed and latest version
Nothing to do

# yum install mlnx-fw-updater
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : mlnx-fw-updater-3.4-2.0.0.0.x86_64 1/1
Attempting to perform Firmware update...
Querying Mellanox devices firmware ...
Device #1:
----------
  Device Type: N/A
  Part Number: --
  Description:
  PSID:
  PCI Device Name: 04:00.0
  Port1 MAC: N/A
  Port1 GUID: N/A
  Port2 MAC: N/A
  Port2 GUID: N/A
  Versions: Current Available
    FW --
  Status: Failed to open device
---------
-E- Failed to query 04:00.0 device, error : No such file or directory. MFE_UNSUPPORTED_DEVICE
Log File: /tmp/mlnx_fw_update.log
Failed to update Firmware.
See /tmp/mlnx_fw_update.log
  Verifying : mlnx-fw-updater-3.4-2.0.0.0.x86_64 1/1
Installed:
  mlnx-fw-updater.x86_64 0:3.4-2.0.0.0
Complete!

# mlxfwmanager_pci | grep PSID
---------
-E- Failed to query 0000:04:00.0 device, error : No such file or directory. MFE_UNSUPPORTED_DEVICE
PSID:

Hi Agustin,

Did you try the latest version of the MFT tools from the Mellanox web site?

mlxup - Mellanox Update and Query Utility

Mellanox Firmware Tools (MFT)

I wasn’t able to trace the serial number from the ‘lspci’ output (“[SN] Serial number: YK502000004T”), but the device appears to be a ConnectX-2, and it needs firmware 2.9.XXXX from here: Firmware for ConnectX® EN/ENt (Ethernet). You can use the ‘ibv_devinfo’ command to get the board_id and then use it to download the matching firmware image.
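As a rough sketch of that last step (not an official Mellanox procedure), the board_id line can be pulled out of the ‘ibv_devinfo’ output with awk. The helper name extract_board_id is made up for illustration, and it assumes the output contains a line of the form "board_id: <value>":

```shell
#!/bin/sh
# Hypothetical helper: print the board_id value(s) found in
# ibv_devinfo-style text read from stdin.
# Assumption: the field appears as "board_id: <value>" on its own line.
extract_board_id() {
    awk '$1 == "board_id:" { print $2 }'
}

# Usage (assumes ibv_devinfo is installed and the mlx4 driver is loaded):
#   ibv_devinfo | extract_board_id
```

The printed value is what you would look up in the firmware download table.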

I would suggest trying an older version of the Mellanox Firmware Tools, 3.8.X or maybe earlier, from here: Mellanox Firmware Tools (MFT), in order to upgrade the firmware.

Once you have the image downloaded, you can use the ‘flint’ command for the upgrade:

# flint -d 04:00.0 -i <firmware image file> b

Thanks alkx, I tried the following:

# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices

# mlxfwmanager
Querying Mellanox devices firmware ...
Device #1:
----------
  Device Type: N/A
  Part Number: --
  Description:
  PSID:
  PCI Device Name: /dev/mst/mt26448_pci_cr0
  Port1 MAC: N/A
  Port1 GUID: N/A
  Port2 MAC: N/A
  Port2 GUID: N/A
  Versions: Current Available
    FW --
  Status: Failed to open device
---------
-E- Failed to query /dev/mst/mt26448_pci_cr0 device, error : No such file or directory. MFE_UNSUPPORTED_DEVICE

What am I missing?

Regards.

Thanks yairi,

Then, it is not a firmware problem.

Any ideas why this kernel message is flooding the system log?

Mar 29 12:21:18 CDN2 kernel: mlx4_core 0000:04:00.0: command 0x54 failed: fw status = 0x2

Adding a filter to /etc/rsyslog.conf is just a temporary fix; the message could be a symptom of a bigger problem that I am not aware of.

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
:msg, !contains, "mlx4_core" *.info;mail.none;authpriv.none;cron.none /var/log/messages
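To get a feel for how fast the log is being flooded before hiding the message, a sketch like the following counts the failure lines per second. The function name count_cmd_failures is hypothetical, and it assumes the traditional syslog timestamp layout seen in the log lines above (month, day and time as the first three fields):

```shell
#!/bin/sh
# Hypothetical helper: count "command 0x54 failed" lines from mlx4_core,
# grouped by syslog timestamp (one-second resolution).
# Reads the files given as arguments, or stdin if none are given.
count_cmd_failures() {
    awk '/mlx4_core .*command 0x54 failed/ { n[$1 " " $2 " " $3]++ }
         END { for (k in n) print n[k], k }' "$@" |
        sort -k2   # lexical sort on the timestamp text, not a true date sort
}

# Usage:
#   count_cmd_failures /var/log/messages
```

Each output line is a count followed by the timestamp it belongs to, which makes it easy to see whether the rate changes after a driver or firmware update.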

Using an older version gave better results but I could not find the PSID in the firmware list:

# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt26448_pciconf0 - PCI configuration cycles access.
    domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92
    Chip revision is: A0
/dev/mst/mt26448_pci_cr0 - PCI direct access.
    domain:bus:dev.fn=0000:04:00.0 bar=0xdf500000 size=0x100000
    Chip revision is: A0

# flint -d /dev/mst/mt26448_pci_cr0 query
Image type: FS2
FW Version: 2.7.0
Rom Info: type=PXE version=1.5.5 devid=26448 proto=ETH
Device ID: 26448
Description: Port1 Port2
MACs: 0002c907766c 0002c907766d
VSD:
PSID: IBM0050000010

# ibv_devinfo
hca_id: mlx4_0
    transport: InfiniBand (0)
    fw_ver: 2.7.000
    node_guid: ffff:ffff:ffff:ffff
    sys_image_guid: ffff:ffff:ffff:ffff
    vendor_id: 0x02c9
    vendor_part_id: 26448
    hw_ver: 0xA0
    board_id: IBM0050000010
    phys_port_cnt: 2
    Device ports:
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 1024 (3)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: Ethernet
        port: 2
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 1024 (3)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: Ethernet

Hi,

For this PSID, FW 2.7.000 is the latest revision available. See: Archive of Firmware and Software for IBM InfiniBand Adapter Cards

This is an IBM card.

Thanks

Actually, on the same page, there is a 2.9.1000 link as well: http://www.mellanox.com/downloads/firmware/fw-25408-rel-2_9_1000-59Y1905.bin.zip

You might give the other firmware version a try (don’t forget to reboot afterwards). If you’d like to save the current version, you can use ‘flint’ to back it up before the update.

For example,

#flint -d 04:00.0 ri /tmp/backup_fw.img
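Putting the backup and the update together, a cautious sketch could look like the following. The function names, the DRY_RUN switch and the argument order are made up for illustration; only the two flint invocations (‘ri’ to read the current image, ‘-i … b’ to burn a new one) come from this thread. With DRY_RUN=1 it only prints the commands instead of touching the card:

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: back up the current firmware, then burn the new image.
#   $1 = device (PCI address or MST device), $2 = new image, $3 = backup file
backup_and_burn() {
    dev="$1"; image="$2"; backup="$3"
    # Stop here if the backup fails, so the old firmware is never overwritten
    # without a saved copy.
    run_or_print flint -d "$dev" ri "$backup" || return 1
    run_or_print flint -d "$dev" -i "$image" b
    # A reboot is still needed afterwards for the new firmware to load.
}

# Usage (dry run first):
#   DRY_RUN=1; export DRY_RUN
#   backup_and_burn 04:00.0 fw-25408-rel-2_9_1000-59Y1905.bin /tmp/backup_fw.img
```

Running it dry first lets you check the device address and file names before anything is written to flash.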

Updating to 2.9.1000 firmware fixed the problem.

# flint -d 05:00.0 -i fw-25408-rel-2_9_1000-59Y1905.bin b
Current FW version on flash: 2.7.0
New FW version: 2.9.1000
Burning FS2 FW image without signatures - OK
Restoring signature - OK

Thanks!