Hello!
I’ve been trying to update a BlueField-2 DPU (PSID: MT_0000000561) and the DOCA libraries, unfortunately without success. I’ve tried upgrading to various versions, e.g. 3.1.0 LTS and 2.9.3 LTS. In all of my attempts, the running firmware version of the DPU remained unchanged at 24.32.2004.
Sometimes I managed to update the non-running (pending) firmware image, but I was never able to perform the reset needed to actually complete the upgrade. For example, in the attempt below I received a “Failed to send Register MFRL: Bad parameter (265)” error message:
sudo mlxfwmanager -i fw-BlueField-2-rel-24_35_2000-MBF2M516A-CEEO_Ax_Bx-NVME-20.4.1-UEFI-21.4.10-UEFI-22.4.10-UEFI-14.28.16-FlexBoot-3.6.805.bin -u
# Output:
# [...]
# Device Type: BlueField2
# Part Number: MBF2M516A-CEEO_Ax_Bx
# Description: BlueField-2 E-Series DPU 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL
# PSID: MT_0000000561
# PCI Device Name: /dev/mst/mt41686_pciconf0
# Base GUID: 1070fd030086622a
# Base MAC: 1070fd86622a
# Versions: Current Available
# FW 24.32.2004 24.35.2000
# [...]
# Device #1: Updating FW ...
# FSMST_INITIALIZE - OK
# Writing Boot image component - OK
# Done
# Restart needed for updates to take effect.
sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 -y r
# Output:
# -E- Synchronization by driver is not supported in the current state of this device.
sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 q
# Reset-levels:
# 0: Driver, PCI link, network link will remain up ("live-Patch") -Not Supported
# 1: Only ARM side will not remain up ("Immediate reset"). -Not Supported
# 3: Driver restart and PCI reset -Supported (default)
# 4: Warm Reboot -Supported
# Reset-types (relevant only for reset-levels 1,3,4):
# 0: Full chip reset -Supported
# 1: Phy-less reset (keep network port active during reset) -Not Supported
# 2: NIC only reset (for SoC devices) -Supported (default)
# 3: ARM only reset -Not Supported
# 4: ARM OS shut down -Not Supported
# Reset-sync (relevant only for reset-level 3):
# 0: Tool is the owner -Not supported
# 1: Driver is the owner -Not supported (default)
sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 --level 4 r
# The reset level for device, /dev/mst/mt41686_pciconf0 is:
# 4: Warm Reboot
# Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset.
# Continue with reset?[y/N] y
# -I- Sending Reset Command To Fw -Failed
# -E- Failed to send Register MFRL: Bad parameter (265).
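To summarise the `mlxfwreset q` output at a glance, here is a minimal shell sketch. The `supported_in` function is my own hypothetical helper, not part of MFT, and it assumes the query output keeps the layout shown above: section headers starting with `Reset-levels`, `Reset-types` or `Reset-sync`, each followed by `<n>: <description> -Supported` or `-Not Supported` lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of MFT).
# supported_in SECTION: read `mlxfwreset -d <dev> q` output on stdin and print
# the option numbers marked "-Supported" in SECTION ("Reset-levels",
# "Reset-types" or "Reset-sync"), separated by spaces.
supported_in() {
  awk -v section="$1" '
    # A section header switches tracking on only if it is the requested one.
    /^[ \t#]*Reset-(levels|types|sync)/ { in_section = (index($0, section) > 0); next }
    # Lines ending in "-Not Supported" or "-Not supported" do not contain "-Supported".
    in_section && /-Supported/ {
      line = $0
      sub(/^[ \t#]*/, "", line)      # drop leading blanks or "# " prefixes
      split(line, parts, ":")        # "3: Driver restart ..." gives parts[1] = "3"
      out = out (out == "" ? "" : " ") parts[1]
    }
    END { print out }
  '
}
```

With the output above, `sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 q | supported_in Reset-levels` should print `3 4`, while `supported_in Reset-sync` prints nothing, which seems consistent with the "Synchronization by driver is not supported" error I get for level 3.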
I also tried, several times, to install DOCA 2.9.3 LTS from scratch on both the host and the DPU, following the official documentation. These attempts failed as well: the firmware version remained unchanged, as shown below:
# Host: query information
lsb_release -a # Ubuntu 24.04.3 LTS
uname -r # 6.8.0-87-generic
# Host: delete old versions
for f in $( dpkg --list | grep -E 'doca|flexio|dpa-gdbserver|dpa-stats|dpa-resource-mgmt|dpaeumgmt' | awk '{print $2}' ); do echo $f ; sudo apt remove --purge $f -y ; done
sudo /usr/sbin/ofed_uninstall.sh --force
sudo apt-get autoremove
sudo reboot
# Host: install new DOCA version via doca-kernel-support
wget https://www.mellanox.com/downloads/DOCA/DOCA_v2.9.3/host/doca-host_2.9.3-021000-24.10-ubuntu2404_amd64.deb
sudo dpkg -i doca-host_2.9.3-021000-24.10-ubuntu2404_amd64.deb
sudo apt-get update
sudo apt-get install -y doca-extra
sudo /opt/mellanox/doca/tools/doca-kernel-support # [...] You can install any of the other userspace packages (doca-all-userspace, doca-all-networking)
sudo dpkg --install /tmp/DOCA.bLTQs4xJA5/doca-kernel-repo-24.10-3.2.5.0-6.8.0.87.generic_24.10.3.2.5.0_amd64.deb
sudo apt-get update
sudo apt-get install -y doca-all-userspace doca-kernel-6.8.0.87.generic
# Host: initialize drivers, MST
sudo /etc/init.d/openibd restart
# Output:
# Unloading HCA driver: [ OK ]
# Loading HCA driver and Access Layer: [ OK ]
sudo mst restart
# Output:
# Stopping MST (Mellanox Software Tools) driver set
# Starting MST (Mellanox Software Tools) driver set
# Loading MST PCI module - Success
# Loading MST PCI configuration module - Success
# Create devices
# Unloading MST PCI module (unused) - Success
# Host: query information
sudo /opt/mellanox/doca/tools/doca-info
# Output:
# Versions:
# - MFT 4.30.1-1210
# - DOCA Base (OFED) MLNX_OFED_LINUX-24.10-3.2.5.0
# - DOCA <none>
#
# UEFI\ATF versions:
# - mst_device: mt41692_pciconf[0-9]
# UEFI Version: N\A
# ATF Version: N\A
#
# Firmware (Current):
# - BlueField-2
#
# SNAP3:
# - mlnx-libsnap NA (package not found)
# - mlnx-snap NA (package not found)
# - spdk NA (package not found)
#
# DOCA:
# - doca-all-userspace 2.9.3-0.2.2
# - doca-bench 2.9.3008-1
# - doca-caps 2.9.3008-1
# - doca-comm-channel-admin 2.9.3008-1
# - doca-devel 2.9.3-0.2.2
# - doca-extra 0.1.7-1
# - doca-host 2.9.3-021000-24.10-ubuntu2404
# [...]
sudo mst status -v
# Output:
# MST modules:
# MST PCI module is not loaded
# MST PCI configuration module loaded
# PCI devices:
# DEVICE_TYPE MST PCI RDMA NET NUMA
# BlueField2(rev:1) /dev/mst/mt41686_pciconf0.1 01:00.1 -1
# BlueField2(rev:1) /dev/mst/mt41686_pciconf0 01:00.0 -1
sudo flint -d /dev/mst/mt41686_pciconf0 query
# Output:
# Image type: FS4
# FW Version: 24.35.2000
# FW Version(Running): 24.32.2004
# FW Release Date: 24.11.2022
# Product Version: 24.32.2004
# Rom Info: type=UEFI Virtio net version=21.2.10 cpu=AMD64
# type=UEFI Virtio blk version=22.2.10 cpu=AMD64
# type=UEFI version=14.25.18 cpu=AMD64,AARCH64
# type=PXE version=3.6.502 cpu=AMD64
# Description: UID GuidsNumber
# Base GUID: 1070fd030086622a 12
# Base MAC: 1070fd86622a 12
# Image VSD: N/A
# Device VSD: N/A
# PSID: MT_0000000561
# Security Attributes: N/A
sudo mlxfwmanager --query
# Output:
# Device Type: BlueField2
# Part Number: MBF2M516A-CEEO_Ax_Bx
# Description: BlueField-2 E-Series DPU 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL
# PSID: MT_0000000561
# PCI Device Name: /dev/mst/mt41686_pciconf0
# Base GUID: 1070fd030086622a
# Base MAC: 1070fd86622a
# Versions: Current Available
# FW 24.35.2000 N/A
# FW (Running) 24.32.2004 N/A
# PXE 3.6.0502 N/A
# UEFI 14.25.0018 N/A
# UEFI Virtio blk 22.2.0010 N/A
# UEFI Virtio net 21.2.0010 N/A
# Status: No matching image found
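Since the key symptom is that `flint` reports a newer flashed image than the running one, a small sketch like the following can check this automatically. `fw_versions` is a hypothetical helper of mine, and it assumes, based on the two `flint query` outputs in this post, that flint only prints a separate `FW Version(Running)` line when the running firmware differs from the flashed image.

```shell
#!/bin/sh
# Hypothetical helper (not part of MFT).
# fw_versions: read `flint -d <dev> query` output on stdin and print
# "<flashed> <running> <status>". Assumes flint adds a "FW Version(Running)"
# line only when the running firmware differs from the flashed image; if that
# line is absent, the running version is taken to be the flashed one.
fw_versions() {
  awk -F': *' '
    /^[ \t#]*FW Version:/            { flashed = $2 }
    /^[ \t#]*FW Version\(Running\):/ { running = $2 }
    END {
      if (running == "") running = flashed
      status = (flashed == running) ? "in-sync" : "pending-reset"
      print flashed, running, status
    }
  '
}
```

For the query above, `sudo flint -d /dev/mst/mt41686_pciconf0 query | fw_versions` should print `24.35.2000 24.32.2004 pending-reset`.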
# Host: update DPU
ls -la /dev/ | grep rshim # No output
systemctl restart rshim
ls -la /dev/ | grep rshim # rshim0
sudo bfb-install --rshim rshim0 --bfb bf-bundle-2.9.3-32_25.06_ubuntu-22.04_prod.bfb
# Output: (notice "Reactivating previous firmware image" and "update done: 24.43.3608")
# Checking if local host has root access...
# Checking if rshim driver is running locally...
# Pushing bfb
# Collecting BlueField booting status. Press Ctrl+C to stop…
# INFO[BL2]: start
# INFO[BL2]: boot mode (rshim)
# INFO[BL2]: DDR POST passed
# INFO[BL2]: UEFI loaded
# INFO[BL31]: start
# INFO[BL31]: lifecycle GA Non-Secured
# INFO[BL31]: runtime
# INFO[UEFI]: UPVS valid
# INFO[UEFI]: eMMC init
# INFO[UEFI]: eMMC probed
# INFO[UEFI]: PMI: updates started
# INFO[UEFI]: PMI: total updates: 1
# INFO[UEFI]: PMI: updates completed, status 0
# INFO[UEFI]: PCIe enum start
# INFO[UEFI]: PCIe enum end
# INFO[UEFI]: UEFI Secure Boot (disabled)
# INFO[UEFI]: Redfish enabled
# INFO[UEFI]: exit Boot Service
# INFO[MISC]: Erasing eMMC drive: /dev/mmcblk0
# INFO[MISC]: Ubuntu installation started
# INFO[MISC]: Installing OS image
# INFO[MISC]: Ubuntu installation completed
# INFO[MISC]: Resetting BMC Rshim log
# INFO[MISC]: Rshim log cleared
# WARN[MISC]: Skipping BMC components upgrade.
# INFO[MISC]: Reactivating previous firmware image on the NIC
# INFO[MISC]: Updating NIC firmware...
# INFO[MISC]: NIC firmware update done: 24.43.3608
# INFO[MISC]: Installation finished
sudo cat /dev/rshim0/misc
# Output:
# DISPLAY_LEVEL 2 (0:basic, 1:advanced, 2:log)
# BOOT_MODE 1 (0:rshim, 1:emmc, 2:emmc-boot-swap)
# BOOT_TIMEOUT 300 (seconds)
# USB_TIMEOUT 40 (seconds)
# DROP_MODE 0 (0:normal, 1:drop)
# SW_RESET 0 (1: reset)
# DEV_NAME pcie-0000:01:00.2
# DEV_INFO BlueField-2(Rev 1)
# OPN_STR MBF2M516A-CEEO
# FORCE_CMD 0 (1: send Force command)
# Log Messages
# !!! THE FOLLOWING APPEARED FOR A FEW SECONDS !!!
# Something similar to: NIC (?) reset not supported, power cycle required
# Then the log messages got cleared and booting-related messages appeared
# !!! Then the log messages got cleared again, and the following final messages stayed in the logs:
# INFO[BL2]: start
# INFO[BL2]: boot mode (emmc)
# INFO[BL2]: DDR POST passed
# INFO[BL2]: UEFI loaded
# INFO[BL31]: start
# INFO[BL31]: lifecycle GA Non-Secured
# INFO[BL31]: runtime
# INFO[UEFI]: UPVS valid
# INFO[UEFI]: eMMC init
# INFO[UEFI]: eMMC probed
# INFO[UEFI]: PCIe enum start
# INFO[UEFI]: PCIe enum end
# INFO[UEFI]: UEFI Secure Boot (disabled)
# INFO[UEFI]: Redfish enabled
# INFO[UEFI]: DPU-BMC RF credentials not found
# WARN[UEFI]: UPVS reclaim start
# WARN[UEFI]: UPVS reclaim done
# INFO[UEFI]: exit Boot Service
# INFO[MISC]: Linux up
# INFO[MISC]: DPU is ready
# Host: query information
sudo /opt/mellanox/doca/tools/doca-info
# Output:
# Versions:
# - MFT 4.30.1-1210
# - DOCA Base (OFED) MLNX_OFED_LINUX-24.10-3.2.5.0
# - DOCA <none>
#
# UEFI\ATF versions:
# - mst_device: mt41692_pciconf[0-9]
# UEFI Version: N\A
# ATF Version: N\A
#
# Firmware (Current):
# - BlueField-2
#
# SNAP3:
# - mlnx-libsnap NA (package not found)
# - mlnx-snap NA (package not found)
# - spdk NA (package not found)
#
# DOCA:
# - doca-all-userspace 2.9.3-0.2.2
# [...]
sudo mst status -v
# Output:
# MST modules:
# MST PCI module is not loaded
# MST PCI configuration module loaded
# PCI devices:
# DEVICE_TYPE MST PCI RDMA NET NUMA
# BlueField2(rev:1) /dev/mst/mt41686_pciconf0.1 01:00.1 -1
# BlueField2(rev:1) /dev/mst/mt41686_pciconf0 01:00.0 -1
sudo flint -d /dev/mst/mt41686_pciconf0 query
# Output:
# Image type: FS4
# FW Version: 24.32.2004
# FW Release Date: 13.1.2022
# Product Version: 24.32.2004
# Rom Info: type=UEFI Virtio net version=21.2.10 cpu=AMD64
# type=UEFI Virtio blk version=22.2.10 cpu=AMD64
# type=UEFI version=14.25.18 cpu=AMD64,AARCH64
# type=PXE version=3.6.502 cpu=AMD64
# Description: UID GuidsNumber
# Base GUID: 1070fd030086622a 12
# Base MAC: 1070fd86622a 12
# Image VSD: N/A
# Device VSD: N/A
# PSID: MT_0000000561
# Security Attributes: N/A
sudo mlxfwmanager --query
# Output:
# Device Type: BlueField2
# Part Number: MBF2M516A-CEEO_Ax_Bx
# Description: BlueField-2 E-Series DPU 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL
# PSID: MT_0000000561
# PCI Device Name: /dev/mst/mt41686_pciconf0
# Base GUID: 1070fd030086622a
# Base MAC: 1070fd86622a
# Versions: Current Available
# FW 24.32.2004 N/A
# PXE 3.6.0502 N/A
# UEFI 14.25.0018 N/A
# UEFI Virtio blk 22.2.0010 N/A
# UEFI Virtio net 21.2.0010 N/A
# Status: No matching image found
# DPU: query information
sudo bf-info
# Output:
# dpkg-query: no packages found matching mlx-regex
# Versions:
# - ATF: v2.2(release):4.9.3-27-g82a4cdce5
# - UEFI: 4.9.3-22-g5c9f881c3f
# - BSP: 4.9.3.13692
# - NIC Firmware: 24.43.3608
# - Kernel: 5.15.0-1070-bluefield
# - DOCA Base (OFED): 24.10-3.2.5
# - MFT: 4.30.1-1210
# - mstflint: 4.29.0-1
# - mlnx-dpdk: 'MLNX_DPDK 22.11.2410.4.2'
# - mlx-regex:
# - collectx-clxapi: collectx-clxapi 1.19.1
# - libvma: libvma 9.8.60-1
# dpkg-query: no packages found matching libxlio
# -
# - dpcp 1.1.50-1.2410068
#
# Storage:
# - mlnx-libsnap 1.6.0-2
# - mlnx-snap 3.8.0-7
# - spdk 23.01.5-26
# - virtio-net-controller 24.10.45-1
#
# DOCA:
# - doca-caps 2.9.3008-1
# - doca-comm-channel-admin 2.9.3008-1
# - doca-runtime 1-2.9.3008-1.24.10.3.2.5.0.bf.4.9.3.13692
# [...]
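The most confusing part is that `bf-info` on the DPU reports NIC firmware 24.43.3608, while `flint` on the host still reports 24.32.2004. The sketch below uses two hypothetical helpers of mine (not part of bf-info or MFT) and assumes the `- NIC Firmware: <version>` line format shown above; it extracts the DPU-side version and compares it with the host-side one.

```shell
#!/bin/sh
# Hypothetical helpers (not part of bf-info or MFT).

# bf_info_nic_fw: read `bf-info` output on stdin and print the value of the
# "NIC Firmware" line (e.g. "- NIC Firmware: 24.43.3608" gives "24.43.3608").
bf_info_nic_fw() {
  awk -F': *' '/^[ \t#-]*NIC Firmware:/ { print $2; exit }'
}

# compare_fw DPU_VERSION HOST_VERSION: report whether the firmware version seen
# from the DPU (bf-info) matches the one seen from the host (flint query).
compare_fw() {
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "mismatch: DPU reports $1, host flint reports $2"
  fi
}
```

For example, running `sudo bf-info | bf_info_nic_fw` on the DPU and comparing the result with the `FW Version` line of `flint query` on the host, `compare_fw 24.43.3608 24.32.2004` reports a mismatch for the outputs in this post.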
I’m afraid I have run out of ideas as to what else I could try. In a different forum post, it was suggested to upgrade the firmware in small increments rather than jumping to the latest version in a single hop. Unfortunately, that didn’t work either: I got the same result as at the top of this post, i.e. the new firmware was flashed, but the old firmware remained the running (active) one, and I couldn’t perform the reset required to complete the upgrade.
I would greatly appreciate any assistance in this matter.