BlueField-2: firmware upgrade fails at restart, "Bad Parameter (265)"

Hello!

I’ve been trying to update a BlueField-2 DPU (PSID: MT_0000000561) and the DOCA libraries, unfortunately with no success. I’ve tried upgrading to various versions, e.g. the 3.1.0 LTS or 2.9.3 LTS. In all my attempts the firmware version of the DPU remained unchanged: it always stays on 24.32.2004.

Sometimes I was able to update the non-running firmware, but I was never able to execute the restart needed to finish the upgrade. For example, in the attempt below I received a “Failed to send Register MFRL: Bad parameter (265)” error message:

sudo mlxfwmanager -i fw-BlueField-2-rel-24_35_2000-MBF2M516A-CEEO_Ax_Bx-NVME-20.4.1-UEFI-21.4.10-UEFI-22.4.10-UEFI-14.28.16-FlexBoot-3.6.805.bin -u
# Output:
# [...]
#  Device Type:      BlueField2
#  Part Number:      MBF2M516A-CEEO_Ax_Bx
#  Description:      BlueField-2 E-Series DPU 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL
#  PSID:             MT_0000000561
#  PCI Device Name:  /dev/mst/mt41686_pciconf0
#  Base GUID:        1070fd030086622a
#  Base MAC:         1070fd86622a
#  Versions:         Current        Available
#     FW             24.32.2004     24.35.2000
# [...]
# Device #1: Updating FW ...
# FSMST_INITIALIZE -   OK
# Writing Boot image component -   OK
# Done
# Restart needed for updates to take effect.

sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 -y r
# Output:
# -E- Synchronization by driver is not supported in the current state of this device.

sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 q
# Reset-levels:
# 0: Driver, PCI link, network link will remain up ("live-Patch")  -Not Supported
# 1: Only ARM side will not remain up ("Immediate reset").         -Not Supported
# 3: Driver restart and PCI reset                                  -Supported     (default)
# 4: Warm Reboot                                                   -Supported
# Reset-types (relevant only for reset-levels 1,3,4):
# 0: Full chip reset                                               -Supported
# 1: Phy-less reset (keep network port active during reset)        -Not Supported
# 2: NIC only reset (for SoC devices)                              -Supported     (default)
# 3: ARM only reset                                                -Not Supported
# 4: ARM OS shut down                                              -Not Supported
# Reset-sync (relevant only for reset-level 3):
# 0: Tool is the owner                                             -Not supported
# 1: Driver is the owner                                           -Not supported (default)

sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 --level 4 r
# The reset level for device, /dev/mst/mt41686_pciconf0 is:
# 4: Warm Reboot
# Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset.
# Continue with reset?[y/N] y
# -I- Sending Reset Command To Fw             -Failed
# -E- Failed to send Register MFRL: Bad parameter (265).
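
For anyone repeating the query step above, here is a minimal sketch that pulls out only the reset levels marked as supported. It assumes the output layout is exactly as pasted in this post (a "Reset-levels:" header followed by "N: description -Supported" lines); the function name supported_reset_levels is hypothetical and not part of MFT.

```shell
# Print the reset levels that `mlxfwreset ... q` marks as supported.
# Reads the tool's output on stdin. A leading "# " (as used for the
# pasted output in this post) is tolerated.
supported_reset_levels() {
  awk '
    { sub(/^#[ \t]*/, "") }                    # drop the "# " paste prefix
    /^Reset-levels:/ { in_levels = 1; next }   # start of the levels section
    /^Reset-/        { in_levels = 0 }         # any other "Reset-..." header ends it
    in_levels && /-Supported/ { split($0, part, ":"); print part[1] }
  '
}

# Usage (device path taken from this post):
#   sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 q | supported_reset_levels
```

Fed the query output shown above, this should list only levels 3 and 4, which is why level 4 was the next thing to try.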

I also tried (several times) to install DOCA 2.9.3 LTS from scratch on both the host and the DPU, following the official documentation. These attempts yielded no results either: the firmware version remained unchanged, as can be seen below:

# Host: query information
lsb_release -a  # Ubuntu 24.04.3 LTS
uname -r  # 6.8.0-87-generic

# Host: delete old versions
for f in $( dpkg --list | grep -E 'doca|flexio|dpa-gdbserver|dpa-stats|dpa-resource-mgmt|dpaeumgmt' | awk '{print $2}' ); do echo $f ; sudo apt remove --purge $f -y ; done
sudo /usr/sbin/ofed_uninstall.sh --force
sudo apt-get autoremove
sudo reboot

# Host: install new DOCA version via doca-kernel-support
wget https://www.mellanox.com/downloads/DOCA/DOCA_v2.9.3/host/doca-host_2.9.3-021000-24.10-ubuntu2404_amd64.deb
sudo dpkg -i doca-host_2.9.3-021000-24.10-ubuntu2404_amd64.deb
sudo apt-get update
sudo apt-get install -y doca-extra
sudo /opt/mellanox/doca/tools/doca-kernel-support # [...] You can install any of the other userspace packages (doca-all-userspace, doca-all-networking)
sudo dpkg --install /tmp/DOCA.bLTQs4xJA5/doca-kernel-repo-24.10-3.2.5.0-6.8.0.87.generic_24.10.3.2.5.0_amd64.deb
sudo apt-get update
sudo apt-get install -y doca-all-userspace doca-kernel-6.8.0.87.generic

# Host: initialize drivers, MST
sudo /etc/init.d/openibd restart
# Output: 
# Unloading HCA driver:                                      [  OK  ]
# Loading HCA driver and Access Layer:                       [  OK  ]
sudo mst restart
# Output:
# Stopping MST (Mellanox Software Tools) driver set
# Starting MST (Mellanox Software Tools) driver set
# Loading MST PCI module - Success
# Loading MST PCI configuration module - Success
# Create devices
# Unloading MST PCI module (unused) - Success

# Host: query information
sudo /opt/mellanox/doca/tools/doca-info
# Output:
# Versions:
# - MFT 4.30.1-1210
# - DOCA Base (OFED) MLNX_OFED_LINUX-24.10-3.2.5.0
# - DOCA <none>
# 
# UEFI\ATF versions:
# - mst_device: mt41692_pciconf[0-9]
#      UEFI Version: N\A
#      ATF Version: N\A
# 
# Firmware (Current):
# - BlueField-2
# 
# SNAP3:
# - mlnx-libsnap NA (package not found)
# - mlnx-snap NA (package not found)
# - spdk NA (package not found)
# 
# DOCA:
# - doca-all-userspace 2.9.3-0.2.2
# - doca-bench 2.9.3008-1
# - doca-caps 2.9.3008-1
# - doca-comm-channel-admin 2.9.3008-1
# - doca-devel 2.9.3-0.2.2
# - doca-extra 0.1.7-1
# - doca-host 2.9.3-021000-24.10-ubuntu2404
# [...]
sudo mst status -v
# Output:
# MST modules:
#     MST PCI module is not loaded
#     MST PCI configuration module loaded
# PCI devices:
# DEVICE_TYPE             MST                           PCI       RDMA            NET                                     NUMA
# BlueField2(rev:1)       /dev/mst/mt41686_pciconf0.1   01:00.1                                           -1
# BlueField2(rev:1)       /dev/mst/mt41686_pciconf0     01:00.0                                           -1
sudo flint -d /dev/mst/mt41686_pciconf0 query
# Output:
# Image type:            FS4
# FW Version:            24.35.2000
# FW Version(Running):   24.32.2004
# FW Release Date:       24.11.2022
# Product Version:       24.32.2004
# Rom Info:              type=UEFI Virtio net version=21.2.10 cpu=AMD64
#                        type=UEFI Virtio blk version=22.2.10 cpu=AMD64
#                        type=UEFI version=14.25.18 cpu=AMD64,AARCH64
#                        type=PXE version=3.6.502 cpu=AMD64
# Description:           UID                GuidsNumber
# Base GUID:             1070fd030086622a        12
# Base MAC:              1070fd86622a            12
# Image VSD:             N/A
# Device VSD:            N/A
# PSID:                  MT_0000000561
# Security Attributes:   N/A
sudo mlxfwmanager --query
# Output:
# Device Type:      BlueField2
# Part Number:      MBF2M516A-CEEO_Ax_Bx
# Description:      BlueField-2 E-Series DPU 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL
# PSID:             MT_0000000561
# PCI Device Name:  /dev/mst/mt41686_pciconf0
# Base GUID:        1070fd030086622a
# Base MAC:         1070fd86622a
# Versions:         Current        Available
#    FW             24.35.2000     N/A
#    FW (Running)   24.32.2004     N/A
#    PXE            3.6.0502       N/A
#    UEFI           14.25.0018     N/A
#    UEFI Virtio blk   22.2.0010      N/A
#    UEFI Virtio net   21.2.0010      N/A
# Status:           No matching image found
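
The flint output above shows a flashed image (24.35.2000) that differs from the running one (24.32.2004). A minimal sketch for detecting that state from a script follows. It assumes, based only on the two flint outputs in this post, that the "FW Version(Running)" line appears only when the running image differs from the flashed one; fw_state is a hypothetical helper name.

```shell
# Classify `flint ... query` output read from stdin as either
#   active <version>                      (flashed image is the running one)
#   pending <flashed> (running <running>) (a restart is still needed)
fw_state() {
  awk -F': *' '
    {
      sub(/^#[ \t]*/, "")                 # tolerate the "# " paste prefix
      value = $2
      sub(/[ \t\r]+$/, "", value)         # strip trailing whitespace
    }
    $1 == "FW Version"          { flashed = value }
    $1 == "FW Version(Running)" { running = value }
    END {
      if (flashed == "") { print "unknown"; exit 1 }
      if (running == "" || running == flashed) {
        print "active " flashed
      } else {
        print "pending " flashed " (running " running ")"
      }
    }
  '
}

# Usage (device path taken from this post):
#   sudo flint -d /dev/mst/mt41686_pciconf0 query | fw_state
```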

# Host: update DPU
ls -la /dev/ | grep rshim  # No output
systemctl restart rshim
ls -la /dev/ | grep rshim  # rshim0
sudo bfb-install --rshim rshim0 --bfb bf-bundle-2.9.3-32_25.06_ubuntu-22.04_prod.bfb
# Output: (notice "Reactivating previous firmware image" and "update done: 24.43.3608")
# Checking if local host has root access...
# Checking if rshim driver is running locally...
# Pushing bfb
# Collecting BlueField booting status. Press Ctrl+C to stop…
#  INFO[BL2]: start
#  INFO[BL2]: boot mode (rshim)
#  INFO[BL2]: DDR POST passed
#  INFO[BL2]: UEFI loaded
#  INFO[BL31]: start
#  INFO[BL31]: lifecycle GA Non-Secured
#  INFO[BL31]: runtime
#  INFO[UEFI]: UPVS valid
#  INFO[UEFI]: eMMC init
#  INFO[UEFI]: eMMC probed
#  INFO[UEFI]: PMI: updates started
#  INFO[UEFI]: PMI: total updates: 1
#  INFO[UEFI]: PMI: updates completed, status 0
#  INFO[UEFI]: PCIe enum start
#  INFO[UEFI]: PCIe enum end
#  INFO[UEFI]: UEFI Secure Boot (disabled)
#  INFO[UEFI]: Redfish enabled
#  INFO[UEFI]: exit Boot Service
#  INFO[MISC]: Erasing eMMC drive: /dev/mmcblk0
#  INFO[MISC]: Ubuntu installation started
#  INFO[MISC]: Installing OS image
#  INFO[MISC]: Ubuntu installation completed
#  INFO[MISC]: Resetting BMC Rshim log
#  INFO[MISC]: Rshim log cleared
#  WARN[MISC]: Skipping BMC components upgrade.
#  INFO[MISC]: Reactivating previous firmware image on the NIC
#  INFO[MISC]: Updating NIC firmware...
#  INFO[MISC]: NIC firmware update done: 24.43.3608
#  INFO[MISC]: Installation finished
sudo cat /dev/rshim0/misc
# Output:
# DISPLAY_LEVEL   2 (0:basic, 1:advanced, 2:log)
# BOOT_MODE       1 (0:rshim, 1:emmc, 2:emmc-boot-swap)
# BOOT_TIMEOUT    300 (seconds)
# USB_TIMEOUT     40 (seconds)
# DROP_MODE       0 (0:normal, 1:drop)
# SW_RESET        0 (1: reset)
# DEV_NAME        pcie-0000:01:00.2
# DEV_INFO        BlueField-2(Rev 1)
# OPN_STR         MBF2M516A-CEEO
# FORCE_CMD       0 (1: send Force command)
#              Log Messages
# !!! THE FOLLOWING APPEARED FOR A FEW SECONDS !!!
# Something similar to: NIC (?) reset not supported, power cycle required
# Then the log messages got cleared and booting-related messages appeared
# !!! Then the log messages got cleared again, and the following final messages stayed in the logs:
# INFO[BL2]: start
# INFO[BL2]: boot mode (emmc)
# INFO[BL2]: DDR POST passed
# INFO[BL2]: UEFI loaded
# INFO[BL31]: start
# INFO[BL31]: lifecycle GA Non-Secured
# INFO[BL31]: runtime
# INFO[UEFI]: UPVS valid
# INFO[UEFI]: eMMC init
# INFO[UEFI]: eMMC probed
# INFO[UEFI]: PCIe enum start
# INFO[UEFI]: PCIe enum end
# INFO[UEFI]: UEFI Secure Boot (disabled)
# INFO[UEFI]: Redfish enabled
# INFO[UEFI]: DPU-BMC RF credentials not found
# WARN[UEFI]: UPVS reclaim start
# WARN[UEFI]: UPVS reclaim done
# INFO[UEFI]: exit Boot Service
# INFO[MISC]: Linux up
# INFO[MISC]: DPU is read

# Host: query information
sudo /opt/mellanox/doca/tools/doca-info
# Output:
# Versions:
# - MFT 4.30.1-1210
# - DOCA Base (OFED) MLNX_OFED_LINUX-24.10-3.2.5.0
# - DOCA <none>
# 
# UEFI\ATF versions:
# - mst_device: mt41692_pciconf[0-9]
#      UEFI Version: N\A
#      ATF Version: N\A
# 
# Firmware (Current):
# - BlueField-2
# 
# SNAP3:
# - mlnx-libsnap NA (package not found)
# - mlnx-snap NA (package not found)
# - spdk NA (package not found)
# 
# DOCA:
# - doca-all-userspace 2.9.3-0.2.2
# [...]
sudo mst status -v
# Output:
# MST modules:
#     MST PCI module is not loaded
#     MST PCI configuration module loaded
# PCI devices:
# DEVICE_TYPE             MST                           PCI       RDMA            NET                                     NUMA
# BlueField2(rev:1)       /dev/mst/mt41686_pciconf0.1   01:00.1                                           -1
# BlueField2(rev:1)       /dev/mst/mt41686_pciconf0     01:00.0                                           -1
sudo flint -d /dev/mst/mt41686_pciconf0 query
# Output:
# Image type:            FS4
# FW Version:            24.32.2004
# FW Release Date:       13.1.2022
# Product Version:       24.32.2004
# Rom Info:              type=UEFI Virtio net version=21.2.10 cpu=AMD64
#                        type=UEFI Virtio blk version=22.2.10 cpu=AMD64
#                        type=UEFI version=14.25.18 cpu=AMD64,AARCH64
#                        type=PXE version=3.6.502 cpu=AMD64
# Description:           UID                GuidsNumber
# Base GUID:             1070fd030086622a        12
# Base MAC:              1070fd86622a            12
# Image VSD:             N/A
# Device VSD:            N/A
# PSID:                  MT_0000000561
# Security Attributes:   N/A
sudo mlxfwmanager --query
# Output:
# Device Type:      BlueField2
# Part Number:      MBF2M516A-CEEO_Ax_Bx
# Description:      BlueField-2 E-Series DPU 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL
# PSID:             MT_0000000561
# PCI Device Name:  /dev/mst/mt41686_pciconf0
# Base GUID:        1070fd030086622a
# Base MAC:         1070fd86622a
# Versions:         Current        Available
#    FW             24.32.2004     N/A
#    PXE            3.6.0502       N/A
#    UEFI           14.25.0018     N/A
#    UEFI Virtio blk   22.2.0010      N/A
#    UEFI Virtio net   21.2.0010      N/A
# Status:           No matching image found

# DPU: query information
sudo bf-info
# Output:
# dpkg-query: no packages found matching mlx-regex
# Versions:
# - ATF: v2.2(release):4.9.3-27-g82a4cdce5
# - UEFI: 4.9.3-22-g5c9f881c3f
# - BSP: 4.9.3.13692
# - NIC Firmware: 24.43.3608
# - Kernel: 5.15.0-1070-bluefield
# - DOCA Base (OFED): 24.10-3.2.5
# - MFT: 4.30.1-1210
# - mstflint: 4.29.0-1
# - mlnx-dpdk:  'MLNX_DPDK 22.11.2410.4.2'
# - mlx-regex:
# - collectx-clxapi: collectx-clxapi 1.19.1
# - libvma: libvma 9.8.60-1
# dpkg-query: no packages found matching libxlio
# -
# - dpcp 1.1.50-1.2410068
# 
# Storage:
# - mlnx-libsnap 1.6.0-2
# - mlnx-snap 3.8.0-7
# - spdk 23.01.5-26
# - virtio-net-controller 24.10.45-1
# 
# DOCA:
# - doca-caps 2.9.3008-1
# - doca-comm-channel-admin 2.9.3008-1
# - doca-runtime 1-2.9.3008-1.24.10.3.2.5.0.bf.4.9.3.13692
# [...]

I’m afraid I have run out of ideas as to what else I could try. A different forum post suggested upgrading the firmware in small increments rather than jumping to the latest version in a single hop. Unfortunately, I wasn’t able to do that either: I got the same results as at the top of this post. The new firmware was uploaded, but the old firmware remained the running/active variant, and I wasn’t able to execute the restart required to finish the upgrade.

I would greatly appreciate any assistance in this matter.

Hi @_Trigary

Download NVIDIA DOCA 3.1.0 Downloads | NVIDIA Developer

Installation Instructions:
  1. Make sure the host drivers (DOCA-Host) are installed.

  2. Run the following command:

    bfb-install --bfb bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb --rshim rshim0
    
  3. To update the BF-Bundle with the latest BF-FW-Bundle, follow these steps

Remember that your BF needs to be in DPU mode.

Thanks

Jose

Hi Jose! Thank you very much for your reply.

Regarding the BF being in DPU mode: I am unable to query/set its mode, because the INTERNAL_CPU_OFFLOAD_ENGINE option doesn’t seem to exist:

sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -e q | grep INTERNAL
# Output:
# INTERNAL_CPU_MODEL        EMBEDDED_CPU(1)      EMBEDDED_CPU(1)      EMBEDDED_CPU(1)

I also tried installing the “BF-FW-BUNDLE” rather than the “BF-BUNDLE” as you suggested. I chose the 2.9.3 version, as that’s the version currently installed on the host. Here are the logs from my attempt:

sudo bfb-install --bfb bf-fwbundle-2.9.3-32_25.06-prod.bfb --rshim rshim0
# Output:
# Checking if local host has root access...
# Checking if rshim driver is running locally...
# Pushing bfb
# Collecting BlueField booting status. Press Ctrl+C to stop…
# INFO[BL2]: start
# INFO[BL2]: boot mode (rshim)
# INFO[BL2]: DDR POST passed
# INFO[BL2]: UEFI loaded
# INFO[BL31]: start
# INFO[BL31]: lifecycle GA Non-Secured
# INFO[BL31]: runtime
# INFO[UEFI]: UPVS valid
# INFO[UEFI]: eMMC init
# INFO[UEFI]: eMMC probed
# INFO[UEFI]: PMI: updates started
# INFO[UEFI]: PMI: total updates: 1
# INFO[UEFI]: PMI: updates completed, status 0
# INFO[UEFI]: PCIe enum start
# INFO[UEFI]: PCIe enum end
# INFO[UEFI]: UEFI Secure Boot (disabled)
# INFO[UEFI]: Redfish enabled
# INFO[UEFI]: exit Boot Service
# INFO[MISC]: Resetting BMC Rshim log
# INFO[MISC]: Rshim log cleared
# WARN[MISC]: Skipping BMC components upgrade.
# INFO[MISC]: Updating NIC firmware...
# INFO[MISC]: NIC firmware update done: 24.43.3608
# INFO[MISC]: Installation finished

# I had to wait for ~10 minutes before the BF was ready: /dev/rshim0/misc was empty for a long time
# I suspect the BF booted multiple times, as once again I briefly saw a message about
# the firmware version being unsupported and a power cycle being required.
# After the BF booted and became accessible again:

sudo flint -d /dev/mst/mt41686_pciconf0 q
# Output:
# Image type:            FS4
# FW Version:            24.32.2004
# FW Release Date:       13.1.2022
# Product Version:       24.32.2004
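
The roughly 10 minute wait described above could be scripted instead of watched by hand. This is only a sketch: it polls a file for a fixed string until a timeout expires, and it assumes that the "Linux up" line seen in the /dev/rshim0/misc log earlier in this thread is a reasonable readiness marker. The helper name wait_for_line and its default timeout are hypothetical.

```shell
# wait_for_line FILE TEXT [TIMEOUT_S] [INTERVAL_S]
# Returns 0 as soon as FILE contains the literal TEXT, or 1 once
# TIMEOUT_S seconds have passed without a match.
wait_for_line() {
  file=$1
  text=$2
  timeout=${3:-900}
  interval=${4:-5}
  waited=0
  while [ "$waited" -le "$timeout" ]; do
    # -F: match TEXT literally, so brackets in log tags are not a regex
    if grep -qF -- "$text" "$file" 2>/dev/null; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}

# Usage (run as root, since reading /dev/rshim0/misc needed sudo above):
#   wait_for_line /dev/rshim0/misc 'Linux up' && echo "BF is up"
```

Because the rshim log is cleared several times during boot, a match only says the marker was present at the moment of the check.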

As before, while the output of the bfb-install command suggests that the update was successful, the firmware didn’t actually get updated on the BF. I suspect that when the BF boots after an update, its firmware gets reverted to the factory default version due to some issue.

It also worries me that I can’t query whether or not DPU mode is enabled.

Please run this command:

systemctl status rshim

Then run:

screen /dev/rshim0/console 115200

and share the output.

Here you go!

The output of sudo systemctl status rshim (note the “rshim0 failed to enable INTx” line):

rshim.service - rshim driver for BlueField SoC
     Loaded: loaded (/usr/lib/systemd/system/rshim.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-11-05 13:17:58 CET; 11h ago
       Docs: man:rshim(8)
    Process: 2060 ExecStart=/usr/sbin/rshim $OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 2071 (rshim)
      Tasks: 8 (limit: 134645)
     Memory: 1.7M (peak: 3.0M)
        CPU: 11min 1.995s
     CGroup: /system.slice/rshim.service
             └─2071 /usr/sbin/rshim

Nov 05 13:17:58 epyc4 (rshim)[2060]: rshim.service: Referenced but unset environment variable evaluates to an empty string: OPTIONS
Nov 05 13:17:58 epyc4 rshim[2071]: Created PID file: /var/run/rshim.pid
Nov 05 13:17:58 epyc4 systemd[1]: Started rshim.service - rshim driver for BlueField SoC.
Nov 05 13:17:58 epyc4 rshim[2071]: Probing pcie-0000:01:00.2(vfio)
Nov 05 13:17:58 epyc4 rshim[2071]: Create rshim pcie-0000:01:00.2
Nov 05 13:17:58 epyc4 rshim[2071]: rshim0 failed to enable INTx
Nov 05 13:17:58 epyc4 rshim[2071]: pcie-0000:01:00.2 enable
Nov 05 13:17:59 epyc4 rshim[2071]: rshim0 attached
Nov 05 17:14:57 epyc4 rshim[2071]: rshim0 boot open
Nov 05 17:16:18 epyc4 rshim[2071]: rshim0 boot close

When I execute sudo screen /dev/rshim0/console 115200, I am greeted by an empty screen. Pressing Enter brings up the following prompt, which I can use to log into the BF successfully, just as I can via SSH.

Ubuntu 22.04.5 LTS localhost.localdomain hvc0
localhost login:

username: ubuntu

password: ubuntu

You have upgraded your BF :)

Thank you very much, you are right! Running bf-info on the BF indeed displays an updated firmware version. On the other hand, quite a few details still worry me, and I fear the installation is not fully functional:

  • Other commands (mlxfwmanager, flint) still show the old firmware version, whether I run them on the BF or on the host. bf-info is the only command that shows the new firmware version.
  • The INTERNAL_CPU_OFFLOAD_ENGINE option still doesn’t seem to exist, which suggests a serious issue, as I can’t set whether DPU or NIC mode should be active. (Error: -E- The Device doesn't support INTERNAL_CPU_OFFLOAD_ENGINE parameter)
  • The dmesg logs on both the host and the BF contain entries suggesting issues, see below.
  • The dmesg logs on the BF claim that the old firmware version is being loaded.
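
To check the last point on either side, a minimal sketch that lists the firmware version mlx5_core logged for each PCI function. It assumes the "mlx5_core <device>: firmware version: <version>" line format seen in the dmesg output in this post; driver_fw_versions is a hypothetical helper name.

```shell
# Print "<pci-device> <firmware-version>" once per device, based on the
# "firmware version" lines that mlx5_core writes to the kernel log.
driver_fw_versions() {
  awk '/mlx5_core .*: firmware version: / {
         dev = $(NF - 3)          # e.g. "0000:03:00.0:"
         sub(/:$/, "", dev)       # drop the trailing colon
         print dev, $NF           # last field is the version
       }' | sort -u
}

# Usage:
#   sudo dmesg | driver_fw_versions
```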

Host dmesg logs:

host$ sudo dmesg | grep -iE "mlx|mlnx|nvidia|bluefield"
[    2.683276] mlx5_core 0000:01:00.0: firmware version: 24.32.2004
[    2.683992] mlx5_core 0000:01:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
[   22.688141] mlx5_core 0000:01:00.0: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 100s (0x87010000)
[   42.694142] mlx5_core 0000:01:00.0: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 79s (0x87010000)
[   62.699141] mlx5_core 0000:01:00.0: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 59s (0x87010000)
[   82.701140] mlx5_core 0000:01:00.0: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 39s (0x87010000)
[  102.706139] mlx5_core 0000:01:00.0: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 19s (0x87010000)
[  122.687037] mlx5_core 0000:01:00.0: mlx5_function_enable:1156:(pid 497): Firmware over 120000 MS in pre-initializing state, aborting
[  122.689498] mlx5_core 0000:01:00.0: probe_one:1962:(pid 497): mlx5_init_one failed with error code -16
[  122.710173] mlx5_core: probe of 0000:01:00.0 failed with error -16
[  122.724043] mlx5_core 0000:01:00.1: firmware version: 24.32.2004
[  122.724805] mlx5_core 0000:01:00.1: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
[  142.727137] mlx5_core 0000:01:00.1: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 100s (0x87010000)
[  162.733135] mlx5_core 0000:01:00.1: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 79s (0x87010000)
[  182.738135] mlx5_core 0000:01:00.1: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 59s (0x87010000)
[  202.743134] mlx5_core 0000:01:00.1: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 39s (0x87010000)
[  222.748132] mlx5_core 0000:01:00.1: wait_fw_init:207:(pid 497): Waiting for FW initialization, timeout abort in 19s (0x87010000)
[  242.729132] mlx5_core 0000:01:00.1: mlx5_function_enable:1156:(pid 497): Firmware over 120000 MS in pre-initializing state, aborting
[  242.730564] mlx5_core 0000:01:00.1: probe_one:1962:(pid 497): mlx5_init_one failed with error code -16
[  242.751545] mlx5_core: probe of 0000:01:00.1 failed with error -16
[  245.337286] mlx_compat: loading out-of-tree module taints kernel.
[  245.337298] mlx_compat: module verification failed: signature and/or required key missing - tainting kernel
[  245.338084] Compat-mlnx-ofed backport release: 8ad7fe3
[  245.338087] Backport based on https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git 8ad7fe3
[  245.338088] compat.git: https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git
[  427.402371] Compat-mlnx-ofed backport release: 8ad7fe3
[  427.402378] Backport based on https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git 8ad7fe3
[  427.402380] compat.git: https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git
[  427.810689] mlx5_core 0000:01:00.0: firmware version: 24.32.2004
[  427.810728] mlx5_core 0000:01:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
[  447.811491] mlx5_core 0000:01:00.0: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 100s (0x87010000)
[  467.814953] mlx5_core 0000:01:00.0: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 79s (0x87010000)
[  487.818483] mlx5_core 0000:01:00.0: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 59s (0x87010000)
[  507.820418] mlx5_core 0000:01:00.0: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 39s (0x87010000)
[  527.822570] mlx5_core 0000:01:00.0: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 19s (0x87010000)
[  547.805054] mlx5_core 0000:01:00.0: wait_fw_init:370:(pid 4946): Firmware over 120000 MS in pre-initializing state, aborting
[  547.805134] mlx5_core 0000:01:00.0: probe_one:2494:(pid 4946): mlx5_init_one failed with error code -110
[  547.816538] mlx5_core: probe of 0000:01:00.0 failed with error -110
[  547.826014] mlx5_core 0000:01:00.1: firmware version: 24.32.2004
[  547.826053] mlx5_core 0000:01:00.1: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
[  567.826777] mlx5_core 0000:01:00.1: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 100s (0x87010000)
[  587.829678] mlx5_core 0000:01:00.1: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 79s (0x87010000)
[  607.832707] mlx5_core 0000:01:00.1: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 59s (0x87010000)
[  627.835830] mlx5_core 0000:01:00.1: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 39s (0x87010000)
[  647.839021] mlx5_core 0000:01:00.1: wait_fw_init:380:(pid 4946): Waiting for FW pre-initializing, timeout abort in 19s (0x87010000)
[  667.821264] mlx5_core 0000:01:00.1: wait_fw_init:370:(pid 4946): Firmware over 120000 MS in pre-initializing state, aborting
[  667.821342] mlx5_core 0000:01:00.1: probe_one:2494:(pid 4946): mlx5_init_one failed with error code -110
[  667.833234] mlx5_core: probe of 0000:01:00.1 failed with error -110
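
The host log above is long, so here is a sketch that reduces it to one line per failed probe. It assumes the "mlx5_core: probe of <device> failed with error <code>" format shown above; probe_failures is a hypothetical helper name. The two error codes seen here are standard Linux errno values: 16 is EBUSY and 110 is ETIMEDOUT.

```shell
# Summarize mlx5_core probe failures from kernel log text on stdin as
#   <pci-device> <error-code> (<errno name>)
probe_failures() {
  awk '/mlx5_core: probe of .* failed with error / {
         dev = $(NF - 4)          # device sits four fields before the code
         err = $NF
         if (err == "-16")       name = "EBUSY"
         else if (err == "-110") name = "ETIMEDOUT"
         else                    name = "see errno(3)"
         print dev, err, "(" name ")"
       }'
}

# Usage:
#   sudo dmesg | probe_failures
```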

BF dmesg logs:

bf$ sudo dmesg | grep -iE "mlx|mlnx|nvidia|bluefield"
[    0.000000] Linux version 5.15.0-1070-bluefield (buildd@bos03-arm64-094) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #72-Ubuntu SMP Fri Jun 20 10:43:28 UTC 2025 (Ubuntu 5.15.0-1070.72-bluefield 5.15.180)
[    0.000000] ACPI: RSDP 0x00000000FFFF0018 000024 (v02 MLNXT.)
[...]
[    0.000000] ACPI: SSDT 0x00000000FFFFE318 000437 (v01 MLNXT. MLX-BF22 20170213 INTL 20170303)
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1070-bluefield root=UUID=19bac42f-ad27-4133-90e9-1dac1251f3a1 ro console=hvc0 console=ttyAMA0 earlycon=pl011,0x01000000 fixrtc net.ifnames=0 biosdevname=0 iommu.passthrough=1
[    0.000000] Unknown kernel command line parameters "fixrtc BOOT_IMAGE=/boot/vmlinuz-5.15.0-1070-bluefield biosdevname=0", will be passed to user space.
[    0.395138] DMI: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.9.3.13692 Jun 27 2025
[    2.053433] integrity: Loaded X.509 cert 'NVIDIA BlueField Secure Boot UEFI db Signing 2021: f3ebdcf5e3ef589c8fcc8883f8864c29e2884306'
[    2.071201] integrity: Loaded X.509 cert 'NVIDIA BlueField Secure Boot EFI Signing 2022-A: 7eaf3adfa51d32346583b219684c21b3cdbff7b4'
[    2.354624]     BOOT_IMAGE=/boot/vmlinuz-5.15.0-1070-bluefield
[    3.564807] mlxbf2_gpio MLNXBF22:01: IRQ index 0 not found
[    3.570836] mlxbf2_gpio MLNXBF22:02: IRQ index 0 not found
[    3.590885] dwmmc_bluefield PRP0001:00: IDMAC supports 64-bit address mode.
[    3.590903] dwmmc_bluefield PRP0001:00: Using internal DMA controller.
[    3.590909] dwmmc_bluefield PRP0001:00: Version ID is 270a
[    3.590946] dwmmc_bluefield PRP0001:00: DW MMC controller at irq 22,32 bit host data width,256 deep fifo
[    3.670516] Micrel KSZ9031 Gigabit PHY MLNXBF17:00:03: attached PHY driver (mii_bus:phy_addr=MLNXBF17:00:03, irq=77)
[    3.678010] mlx_compat: loading out-of-tree module taints kernel.
[    3.678588] Compat-mlnx-ofed backport release: 8ad7fe3
[    3.678594] Backport based on https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git 8ad7fe3
[    3.678596] compat.git: https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git
[    3.780539] mlx5_core 0000:03:00.0: firmware version: 24.32.2004
[    3.780601] mlx5_core 0000:03:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
[    4.133086] mlx5_core 0000:03:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    4.134401] mlx5_core 0000:03:00.0: E-Switch: Total vports 83, per vport: max uc(128) max mc(2048)
[    4.143449] mlx5_core 0000:03:00.0: Port module event: module 0, Cable plugged
[    4.143632] mlx5_core 0000:03:00.0: mlx5_pcie_event:295:(pid 235): PCIe slot power capability was not advertised.
[    4.155680] mlx5_core 0000:03:00.0: mlx5e: IPSec ESP acceleration enabled
[    4.155958] mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[    4.325420] mlx5_core 0000:03:00.0: mlx5e_accel_ipsec_fs_init:2342:(pid 335): IPsec was initialized without RoCE support
[    4.341180] mlx5_core 0000:03:00.1: firmware version: 24.32.2004
[    4.341263] mlx5_core 0000:03:00.1: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
[    4.734050] mlx5_core 0000:03:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    4.735603] mlx5_core 0000:03:00.1: E-Switch: Total vports 83, per vport: max uc(128) max mc(2048)
[    4.745576] mlx5_core 0000:03:00.1: Port module event: module 1, Cable plugged
[    4.745966] mlx5_core 0000:03:00.1: mlx5_pcie_event:295:(pid 131): PCIe slot power capability was not advertised.
[    4.758756] mlx5_core 0000:03:00.1: mlx5e: IPSec ESP acceleration enabled
[    4.759041] mlx5_core 0000:03:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[    4.937728] mlx5_core 0000:03:00.1: mlx5e_accel_ipsec_fs_init:2342:(pid 335): IPsec was initialized without RoCE support
[    9.900523] EDAC MC0: Giving out device to module bluefield-edac controller BlueField_Memory_Controller: DEV MLNXBF08:00 (POLLED)
[    9.908232] mlx-trio MLNXBF06:00: v0.4 probed
[    9.908673] mlx-trio MLNXBF06:01: Device 02:02.0 not found
[    9.908681] mlx-trio MLNXBF06:01: v0.4 probed
[    9.968732] mlxbf_gige MLNXBF17:00 oob_net0: renamed from eth0
[   10.141585] PKA_DRIVER: device MLNXBF20:00 probed
[   10.347157] PKA_DRIVER: device MLNXBF20:01 probed
[   10.549854] PKA_DRIVER: device MLNXBF20:02 probed
[   10.752929] PKA_DRIVER: device MLNXBF20:03 probed
[   10.955939] PKA_DRIVER: device MLNXBF20:04 probed
[   10.960190] mlx5_core 0000:03:00.0 p0: renamed from eth2
[   11.158674] PKA_DRIVER: device MLNXBF20:05 probed
[   11.361918] PKA_DRIVER: device MLNXBF20:06 probed
[   11.371728] mlx5_core 0000:03:00.1 p1: renamed from eth3
[   11.490118] audit: type=1400 audit(1762446453.348:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=811 comm="apparmor_parser"
[   11.490133] audit: type=1400 audit(1762446453.348:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=811 comm="apparmor_parser"
[   11.564612] PKA_DRIVER: device MLNXBF20:07 probed
[   11.872573] mlx5_core 0000:03:00.0 p0: Link up
[   11.960159] mlx5_core 0000:03:00.1 p1: Link up
[   14.075420] mlx5_core 0000:03:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[   15.701963] mlx5_core 0000:03:00.0: mlx5_cmd_out_err:835:(pid 1129): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x201c1c), err(-22)
[   15.717242] mlx5_core 0000:03:00.0: mlx5_rdma_enable_roce_steering:71:(pid 1129): Failed to create RDMA RX flow group err(-22)
[   15.731683] mlx5_core 0000:03:00.0: mlx5_rdma_enable_roce:164:(pid 1129): Failed to enable RoCE steering: -22
[   15.793920] mlx5_core 0000:03:00.0: esw_compat_write:385:(pid 1129): mlx5_core: Failed setting eswitch to offloads
[   16.811272] mlx5_core 0000:03:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[   18.132188] mlx5_core 0000:03:00.0: mlx5_cmd_out_err:835:(pid 1129): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x201c1c), err(-22)
[   18.147462] mlx5_core 0000:03:00.0: mlx5_rdma_enable_roce_steering:71:(pid 1129): Failed to create RDMA RX flow group err(-22)
[   18.161754] mlx5_core 0000:03:00.0: mlx5_rdma_enable_roce:164:(pid 1129): Failed to enable RoCE steering: -22
[   18.215191] mlx5_core 0000:03:00.0: esw_compat_write:385:(pid 1129): mlx5_core: Failed setting eswitch to offloads
[   19.231271] mlx5_core 0000:03:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[   20.551693] mlx5_core 0000:03:00.0: mlx5_cmd_out_err:835:(pid 1129): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x201c1c), err(-22)
[   20.566961] mlx5_core 0000:03:00.0: mlx5_rdma_enable_roce_steering:71:(pid 1129): Failed to create RDMA RX flow group err(-22)
[   20.581227] mlx5_core 0000:03:00.0: mlx5_rdma_enable_roce:164:(pid 1129): Failed to enable RoCE steering: -22
[   20.630487] mlx5_core 0000:03:00.0: esw_compat_write:385:(pid 1129): mlx5_core: Failed setting eswitch to offloads

[...]

[  191.535273] mlx5_core 0000:03:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[  192.855409] mlx5_core 0000:03:00.1: mlx5_cmd_out_err:835:(pid 1129): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x201c1c), err(-22)
[  192.870681] mlx5_core 0000:03:00.1: mlx5_rdma_enable_roce_steering:71:(pid 1129): Failed to create RDMA RX flow group err(-22)
[  192.884953] mlx5_core 0000:03:00.1: mlx5_rdma_enable_roce:164:(pid 1129): Failed to enable RoCE steering: -22
[  192.930457] mlx5_core 0000:03:00.1: esw_compat_write:385:(pid 1129): mlx5_core: Failed setting eswitch to offloads
[  196.737977] mlxbf_gige MLNXBF17:00: open: start state tx_ci=0x0 tx_pi=0x0 rx_ci=0x0 rx_pi=0x0 int_mask=0x1
[  196.737993] mlxbf_gige MLNXBF17:00:   din_drop=0x0 rx_dma=0x4000000 rx_fifo=0x0 rx_polarity=0
[  196.740882] mlxbf_gige MLNXBF17:00: open: after phy_start tx_ci=0x0 tx_pi=0x0 rx_ci=0x0 rx_pi=0x0 int_mask=0x1
[  196.740895] mlxbf_gige MLNXBF17:00:   din_drop=0x0 rx_dma=0x4000000 rx_fifo=0x0 rx_polarity=0
[  196.740912] mlxbf_gige MLNXBF17:00: open: after tx_init tx_ci=0x0 tx_pi=0x0 rx_ci=0x0 rx_pi=0x0 int_mask=0x1
[  196.740916] mlxbf_gige MLNXBF17:00:   din_drop=0x0 rx_dma=0x4000000 rx_fifo=0x0 rx_polarity=0
[  196.741273] mlxbf_gige MLNXBF17:00 oob_net0: Link is Down
[  196.741375] mlxbf_gige MLNXBF17:00: open: after rx_init tx_ci=0x0 tx_pi=0x0 rx_ci=0x0 rx_pi=0x80 int_mask=0x0
[  196.741383] mlxbf_gige MLNXBF17:00:   din_drop=0x0 rx_dma=0x4000001 rx_fifo=0x300000000 rx_polarity=0
[  196.741391] mlxbf_gige MLNXBF17:00: open: after napi tx_ci=0x0 tx_pi=0x0 rx_ci=0x0 rx_pi=0x80 int_mask=0x0
[  196.741395] mlxbf_gige MLNXBF17:00:   din_drop=0x0 rx_dma=0x4000001 rx_fifo=0x300000000 rx_polarity=0
[  196.741465] mlxbf_gige MLNXBF17:00: open: end state tx_ci=0x0 tx_pi=0x0 rx_ci=0x0 rx_pi=0x80 int_mask=0x0
[  196.741471] mlxbf_gige MLNXBF17:00:   din_drop=0x0 rx_dma=0x4000001 rx_fifo=0x300000000 rx_polarity=0
[  197.751305] mlxbf_gige MLNXBF17:00: phy_task: autoneg pending, timeout=15
[  198.775299] mlxbf_gige MLNXBF17:00: phy_task: autoneg pending, timeout=14
[  199.803329] mlxbf_gige MLNXBF17:00: phy_task: autoneg pending, timeout=13
[...]
[  210.039301] mlxbf_gige MLNXBF17:00: phy_task: autoneg pending, timeout=3
[  211.063303] mlxbf_gige MLNXBF17:00: phy_task: autoneg pending, timeout=2
[  212.087308] mlxbf_gige MLNXBF17:00: phy_task: autoneg pending, timeout=1
[  213.111337] mlxbf_gige MLNXBF17:00: phy_task: autoneg pending, timeout=0
[  213.111411] mlxbf_gige MLNXBF17:00: phy_task: restarting autoneg, status=0x0
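
For anyone reading along, the mlx5_core lines above repeat the same sequence every couple of seconds: the E-Switch is disabled, CREATE_FLOW_GROUP fails, and the switch to offloads mode is reported as failed. A minimal sketch for counting those failures per PCI function is below. It assumes dmesg lines in exactly the format shown above, and `count_eswitch_failures` is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Count "Failed setting eswitch to offloads" messages per PCI function.
# Reads dmesg-style text on stdin; prints "<pci-address> <count>" lines.
# Assumes the PCI address is the field right after "mlx5_core", as in the log above.
count_eswitch_failures() {
    awk '/Failed setting eswitch to offloads/ {
             for (i = 1; i <= NF; i++)
                 if ($i == "mlx5_core") {
                     dev = $(i + 1)
                     sub(/:$/, "", dev)   # drop the trailing colon
                     n[dev]++
                 }
         }
         END { for (d in n) print d, n[d] }' | sort
}
```

It could be run as `sudo dmesg | count_eswitch_failures`, on the DPU or on the host, to see whether both ports keep retrying.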

Is this running on the host or on the DPU?

I’m sorry, which command do you mean? For most commands I marked whether it was executed on the host or on the BF (DPU).

bf-info can only be executed on the DPU and it showed the updated firmware version. Both mlxfwmanager and flint showed the old firmware version when executed on the DPU, and also when executed on the host. I got the same error regarding INTERNAL_CPU_OFFLOAD_ENGINE on both the DPU and the host.

You have DOCA 3.1.0 with the MLNX drivers compiled for your kernel, right?

I think your system is fine; it’s running in DPU mode. As for the firmware on the host, you need to update it and configure your Linux system with the correct parameters.

This case was for upgrading the BlueField (BF), and that has already been done.

I’ve followed the official DOCA Host 2.9.3 installation steps, including installing doca-all through doca-kernel-support. The logs can be seen in the first post of this thread.

The host should already be updated, as I have executed the necessary commands. But I don’t think the DPU was upgraded, as the following logs taken from the DPU suggest:

ubuntu@localhost:~$ sudo bf-info | grep "Firmware"
- NIC Firmware: 24.43.3608
ubuntu@localhost:~$ sudo dmesg | grep "firmware version"
[    3.780539] mlx5_core 0000:03:00.0: firmware version: 24.32.2004
[    4.341180] mlx5_core 0000:03:00.1: firmware version: 24.32.2004
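
The mismatch above (bf-info reporting one version, the driver logging another) can be checked with a small script. This is only a sketch that assumes the two output formats shown above. The function names `running_fw_versions`, `reported_fw_version` and `compare_fw` are hypothetical helpers, not part of DOCA or MFT.

```shell
# Extract the distinct firmware versions that mlx5_core logged at probe time.
# Reads dmesg-style text on stdin.
running_fw_versions() {
    sed -n 's/.*mlx5_core [^ ]*: firmware version: \([0-9.]*\).*/\1/p' | sort -u
}

# Extract the version from a "- NIC Firmware: X" line as printed by bf-info.
# Reads bf-info output on stdin.
reported_fw_version() {
    sed -n 's/.*NIC Firmware: *\([0-9.]*\).*/\1/p' | head -n 1
}

# Compare the two version strings and print the result.
compare_fw() {
    running="$1"
    reported="$2"
    if [ "$running" = "$reported" ]; then
        echo "match: $running"
    else
        echo "mismatch: running=$running reported=$reported"
    fi
}
```

On the DPU this could be used as `compare_fw "$(sudo dmesg | running_fw_versions)" "$(sudo bf-info | reported_fw_version)"`. A mismatch would be consistent with the new image having been written but not yet loaded by the device.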

This is version 3.1, and your DPU is updated.

BlueField SW bundle supporting BlueField-3 & BlueField-2, including DOCA 3.1.0, DPU-OS Ubuntu 22.04, BSP 4.12.0.13720, NIC-FW BF2 24.46.1006, BF3 32.46.1006, BF3 BMC-FW 25.07, BF3 BMC-eROT 00.02.0195.0000, BF2 BMC-FW 25.07, BF2 BMC-eROT 04.0f

As for your host, you need to upgrade it as well.

When you run the BFB from the host to upgrade the BF, it only updates the card, not your host.

If you need to upgrade your host:

Installation Instructions:
export DOCA_URL="https://linux.mellanox.com/public/repo/doca/3.1.0/ubuntu24.04/x86_64/"
BASE_URL=$([ "${DOCA_PREPUBLISH:-false}" = "true" ] && echo https://doca-repo-prod.nvidia.com/public/repo/doca || echo https://linux.mellanox.com/public/repo/doca)
DOCA_SUFFIX=${DOCA_URL#*public/repo/doca/}; DOCA_URL="$BASE_URL/$DOCA_SUFFIX"
curl $BASE_URL/GPG-KEY-Mellanox.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub
echo "deb [signed-by=/etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub] $DOCA_URL ./" > /etc/apt/sources.list.d/doca.list
sudo apt-get update
sudo apt-get -y install doca-all

Thank you very much for your persistent replies! I have upgraded the host as per your instructions (first I uninstalled the DOCA packages and then I executed the commands you sent). Unfortunately, that changed nothing:

  • All commands (except bf-info) still show the old firmware version
  • The INTERNAL_CPU_OFFLOAD_ENGINE option still doesn’t exist
  • dmesg logs on both the host and the BlueField contain records that indicate issues (the logs can be found in my previous replies)