device: NVIDIA BlueField-3 B3210 P-Series FHHL DPU, 100GbE (default mode)
The BlueField has been inaccessible since I rebooted the DPU (only the DPU).
The following messages appear during boot:
[275604.216789] mlx5_core 0000:55:00.1: 63.008 Gb/s available PCIe bandwidth, limited by 8 GT/s x8 link at 0000:ae:00.0 (capable of 126.024 Gb/s with 16 GT/s x8 link)
[275624.187596] mlx5_core 0000:55:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 100s
[275644.152994] mlx5_core 0000:55:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 79s
[275664.118404] mlx5_core 0000:55:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 59s
[275684.083806] mlx5_core 0000:55:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 39s
[275704.049211] mlx5_core 0000:55:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 19s
[275723.954752] mlx5_core 0000:55:00.1: mlx5_function_setup:1237:(pid 943): Firmware over 120000 MS in pre-initializing state, aborting
[275723.968261] mlx5_core 0000:55:00.1: init_one:1813:(pid 943): mlx5_load_one failed with error code -16
[275723.978578] mlx5_core: probe of 0000:55:00.1 failed with error -16
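For anyone comparing against their own logs, here is a minimal sketch of how the failure above can be picked out of dmesg text. The function name `find_probe_failures` and the regular expression are my own (hypothetical), and the sample lines are copied from the output above. As far as I understand, the kernel reports a negated errno value, so -16 should correspond to EBUSY on Linux.

```python
import errno
import re

# Sample lines copied from the dmesg output in this post.
LOG = """\
[275704.049211] mlx5_core 0000:55:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 19s
[275723.954752] mlx5_core 0000:55:00.1: mlx5_function_setup:1237:(pid 943): Firmware over 120000 MS in pre-initializing state, aborting
[275723.968261] mlx5_core 0000:55:00.1: init_one:1813:(pid 943): mlx5_load_one failed with error code -16
[275723.978578] mlx5_core: probe of 0000:55:00.1 failed with error -16
"""

# Hypothetical pattern matching the final "probe ... failed" line.
PROBE_FAIL = re.compile(r"mlx5_core: probe of (\S+) failed with error (-?\d+)")


def find_probe_failures(log_text):
    """Return (pci_address, errno_name) for each failed mlx5_core probe."""
    failures = []
    for match in PROBE_FAIL.finditer(log_text):
        address = match.group(1)
        code = int(match.group(2))
        # Kernel error codes are negated errno values; look up the symbolic name.
        name = errno.errorcode.get(abs(code), "UNKNOWN")
        failures.append((address, name))
    return failures


if __name__ == "__main__":
    for address, name in find_probe_failures(LOG):
        print(f"{address}: probe failed with {name}")
```

In practice the text would come from `dmesg` output rather than the embedded sample.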
I tried all the commands in the troubleshooting guide, but none of them worked:
- sudo mlxconfig -d /dev/mst/ -y reset
- sudo mlxconfig -d s LINK_TYPE_P1=2 LINK_TYPE_P2=2
When I checked the hardware with lshw, both network functions showed up as UNCLAIMED:
*-network:0 UNCLAIMED
description: Ethernet controller
product: MT43244 BlueField-3 integrated ConnectX-7 network controller
vendor: Mellanox Technologies
physical id: 0
bus info: pci@0000:55:00.0
version: 01
width: 64 bits
clock: 33MHz
capabilities: pciexpress vpd msix pm cap_list
configuration: latency=0
*-network:1 UNCLAIMED
description: Ethernet controller
product: MT43244 BlueField-3 integrated ConnectX-7 network controller
vendor: Mellanox Technologies
physical id: 0.1
bus info: pci@0000:55:00.1
version: 01
width: 64 bits
clock: 33MHz
capabilities: pciexpress vpd
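A small sketch of how the same check could be scripted, assuming the lshw text layout shown above (a `*-name` header line followed by `key: value` lines). The helper name `unclaimed_pci_addresses` is hypothetical, and the sample is a shortened copy of my output.

```python
# Shortened copy of the lshw output from this post.
LSHW = """\
*-network:0 UNCLAIMED
description: Ethernet controller
bus info: pci@0000:55:00.0
*-network:1 UNCLAIMED
description: Ethernet controller
bus info: pci@0000:55:00.1
"""


def unclaimed_pci_addresses(lshw_text):
    """Return PCI addresses of lshw entries marked UNCLAIMED (no driver bound)."""
    addresses = []
    in_unclaimed = False
    for raw_line in lshw_text.splitlines():
        line = raw_line.strip()
        if line.startswith("*-"):
            # A new device entry starts; remember whether it is unclaimed.
            in_unclaimed = line.endswith("UNCLAIMED")
        elif in_unclaimed and line.startswith("bus info:") and "pci@" in line:
            addresses.append(line.split("pci@", 1)[1])
    return addresses


if __name__ == "__main__":
    print(unclaimed_pci_addresses(LSHW))
```

Both functions of the card are listed, which matches the mlx5_core probe failure in the boot messages.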
I ran mlxfwreset after reading a BF2 post about the unclaimed state, but the reset did not complete:
(Ref. BF2 DPU shows "unclaimed")
host> sudo mlxfwreset -d /dev/mst/mt41692_pciconf0 -l 3 reset
Requested reset level for device, /dev/mst/mt41692_pciconf0:
3: Driver restart and PCI reset
Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw -Done
Arm OS shut down in progress, the completion of the process may take several minutes.
-E- The PCI link is still up even after the expected time (360.0) seconds has passed. Exiting the process..
Additional information:
- Secure boot is disabled.
- Ubuntu 22.04
- DOCA version is 2.5.0 (host and DPU)
How can I resolve the issue?