BlueField-3 cannot be reset to run firmware 32.42.1000

After manually upgrading the firmware from 32.40.1000 to 32.42.1000 with flint, I cannot reset the device so that it runs the new firmware version.

[root@H3C_R5300 mlnx-fw-updater]# mlxfwreset -d /dev/mst/mt41692_pciconf0 reset

-E- Synchronization by driver is not supported in the current state of this device.

[root@H3C_R5300 mlnx-fw-updater]# flint -d /dev/mst/mt41692_pciconf0 q
Image type: FS4
FW Version: 32.42.1000
FW Version(Running): 32.40.1000
FW Release Date: 8.8.2024
Product Version: 32.40.1000
Rom Info: type=UEFI Virtio net version=21.4.13 cpu=AMD64,AARCH64
type=UEFI Virtio blk version=22.4.12 cpu=AMD64,AARCH64
type=UEFI version=14.33.10 cpu=AMD64,AARCH64
type=PXE version=3.7.300 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 9c63c00300aeaabc 22
Base MAC: 9c63c0aeaabc 22
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000001069
Security Attributes: secure-fw
[root@H3C_R5300 download]# mlxfwreset -d /dev/mst/mt41692_pciconf0 -l 4 -t 0 reset

The reset level for device, /dev/mst/mt41692_pciconf0 is:

4: Warm Reboot
Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset.
The ARM side will be restarted, and it will be unavailable for a while.
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw -Done
Waiting for mlxfwreset to run on all other hosts, press 'ctrl+c' to abort
Failed
-E- fsm sync timed out.
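
The flint query above shows the symptom directly: "FW Version" (the burned image) is 32.42.1000 while "FW Version(Running)" is still 32.40.1000. A minimal sketch of how that comparison could be scripted is shown below. The helper name fw_pending is hypothetical and not part of the MFT tools, and treating a missing "FW Version(Running)" line as "already active" is an assumption.

```shell
#!/usr/bin/env bash
# fw_pending (hypothetical helper): reads `flint ... q` output on stdin and
# compares the burned firmware version with the running one.
# Exit status: 0 = burned image is running, 1 = activation pending, 2 = no version found.
fw_pending() {
    awk -F': *' '
        # Strip trailing blanks or a stray carriage return from each line.
        { sub(/[ \t\r]+$/, "") }
        $1 == "FW Version"          { burned  = $2 }
        $1 == "FW Version(Running)" { running = $2 }
        END {
            if (burned == "") { print "unknown"; exit 2 }
            # Assumption: no Running line means the burned image is the active one.
            if (running == "" || running == burned) { print "active " burned; exit 0 }
            print "pending burned=" burned " running=" running
            exit 1
        }'
}

# Usage with the device path from this thread:
#   flint -d /dev/mst/mt41692_pciconf0 q | fw_pending
```

Run against the output above, this reports the image as pending. The same check can be repeated after each reset attempt to see whether the new firmware has actually been activated.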

Hi,
From the command in your case description, it looks like you are using an NVIDIA BlueField-3 B3140H, but I am not sure whether it is currently in NIC mode or DPU mode.

I see you ran mlxfwreset with the "-l 4" and "-t 0" options.
I suggest trying this command instead:

mlxfwreset -d /dev/mst/mt41692_pciconf0 -y -l 4 --sync 0 r

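Yes, the device is an NVIDIA BlueField-3 B3140H, and it is in NIC mode. The modified command does not activate the new firmware either, and I have also tried rebooting the server, with no effect. Is there anything else I can try?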

[root@H3C_R5300 ~]# mlxfwreset -d /dev/mst/mt41692_pciconf0 -y -l 4 --sync 0 reset

The reset level for device, /dev/mst/mt41692_pciconf0 is:

4: Warm Reboot
Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset.
The ARM side will be restarted, and it will be unavailable for a while.
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw -Done
Waiting for mlxfwreset to run on all other hosts, press 'ctrl+c' to abort
Failed
-E- fsm sync timed out.
[root@H3C_R5300 ~]# mlxfwreset -d /dev/mst/mt41692_pciconf0 reset

-E- Synchronization by driver is not supported in the current state of this device.
[root@H3C_R5300 ~]# mlxfwreset -d /dev/mst/mt41692_pciconf0 query

Reset-levels:
0: Driver, PCI link, network link will remain up ("live-Patch") -Not Supported
1: Only ARM side will not remain up ("Immediate reset"). -Not Supported
3: Driver restart and PCI reset -Supported (default)
4: Warm Reboot -Supported

Reset-types (relevant only for reset-levels 1,3,4):
0: Full chip reset -Supported (default)
1: Phy-less reset (keep network port active during reset) -Not Supported
2: NIC only reset (for SoC devices) -Not Supported
3: ARM only reset -Not Supported
4: ARM OS shut down -Not Supported

Reset-sync (relevant only for reset-level 3):
0: Tool is the owner -Not supported
1: Driver is the owner -Not supported (default)

Reset-reason: Cold reset
Timestamp (number of clock cycles) since last cold reset: 53202443788936
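
The query output above is the quickest way to see which options the device will accept at all. In this case, reset levels 3 and 4 and reset type 0 are reported as supported, while both reset-sync modes are reported as not supported, which may be why changing --sync made no difference. A rough sketch for extracting the supported entries is shown below. The helper name supported_reset_options is hypothetical, and it assumes the output format matches what is shown in this thread.

```shell
#!/usr/bin/env bash
# supported_reset_options (hypothetical helper): reads `mlxfwreset ... query`
# output on stdin and prints "<section> <id>" for each entry marked "-Supported".
supported_reset_options() {
    awk '
        # Track which block of the query output we are in.
        /^Reset-levels/ { section = "level"; next }
        /^Reset-types/  { section = "type";  next }
        /^Reset-sync/   { section = "sync";  next }
        # "-Not Supported" and "-Not supported" do not contain "-Supported".
        /^[0-9]+:/ && /-Supported/ {
            split($0, parts, ":")
            print section, parts[1]
        }'
}

# Usage with the device path from this thread:
#   mlxfwreset -d /dev/mst/mt41692_pciconf0 query | supported_reset_options
```

If no "sync" line is printed, as with the output above, passing either --sync value is unlikely to help, and a different reset level or a full power cycle is the remaining option.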

1. Please try --sync 1, as below:
mlxfwreset -d /dev/mst/mt41692_pciconf0 -y -l 4 --sync 1 reset

2. If the command above doesn’t help, please try to use the AC cycle (over 30 sec) instead of the reboot/power-cycle. I think this should work.

mlxfwreset -d /dev/mst/mt41692_pciconf0 reset
mlxfwreset -d /dev/mst/mt41692_pciconf0 -l 4 -t 0 reset
mlxfwreset -d /dev/mst/mt41692_pciconf0 -y -l 4 --sync 0 reset
mlxfwreset -d /dev/mst/mt41692_pciconf0 -y -l 4 --sync 1 reset
init 6 / reboot

I have tried all of the commands above, and none of them worked.

“please try to use the AC cycle(over 30 sec) instead of the reboot/power-cycle. ”

I don't understand this sentence. Could you explain how to do it?

Please try an AC power cycle, which means cutting all power to the device for a while from the BMC and then booting it again.
Alternatively, ask someone to unplug the power cables from the physical server for a while and then plug them back in.

Power cycling the box seems to be the solution here, as that is what I had to do. mlxfwreset didn't work with any of the suggested syntax. Also, instead of saying "reboot" after the firmware is applied, it would be a better user experience for the tool to say "power cycle the node", since a plain reboot doesn't do anything, at least not on the dual-port BF3s I needed to upgrade.