Hi everyone,
I’m encountering an issue while trying to install/upgrade DOCA on a BlueField-2 DPU using the bfb-install
tool, and I’m hoping someone might have some insights.
Background:
- I’m working with a BlueField-2 card that was previously used by someone else.
- When I started, the host machine did not have DOCA installed.
- The DPU itself had DOCA 1.1 installed.
- My goal is to learn DOCA, so I decided to update the DOCA version on the DPU using bfb-install.
- My system information:
~$ lspci | grep "nox"
03:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
03:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
83:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
83:00.1 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface (rev 01)
~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
~$ uname -a
Linux ds07 4.15.0-112-generic #113~16.04.1-Ubuntu SMP Fri Jul 10 04:37:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Before running bfb-install:
  - The /dev/rshim0/misc directory on the host did contain log files (although I didn’t see a specific “DPU is ready” message).
  - The ConnectX-6 network interfaces associated with the BlueField-2 card were visible in the output of ifconfig on the host.
Problem:
- I ran the bfb-install --rshim rshim0 --bfb DOCA_1.5.4_BSP_3.9.9_Ubuntu_20.04-14.24-06-LTS.prod.bfb command to install a newer DOCA version onto the DPU.
- The process failed with the error message: cat: write error: connection timeout.
- After the failed bfb-install attempt:
  - The /dev/rshim0/misc directory on the host is now completely empty. There are no log files inside.
  - The ConnectX-6 network interfaces associated with the BlueField-2 card are no longer visible in the output of ifconfig on the host.
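Since the interfaces vanished from ifconfig, one thing I can still check is whether the card is enumerated on the PCI bus at all. This is a rough sketch, not a definitive test: count_bluefield_functions is a helper name I made up, and it only counts lines mentioning BlueField in lspci-style text.

```shell
#!/bin/sh
# Hypothetical helper (my own name, not part of any NVIDIA tool):
# reads lspci-style text on stdin and prints how many lines mention BlueField.
count_bluefield_functions() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 even when nothing matches and grep prints 0.
    grep -c -i 'bluefield' || true
}

# Intended use on the host:
#   lspci | count_bluefield_functions
```

Fed the lspci output from my Background section, this would print 2 (the 83:00.0 and 83:00.1 functions). A result of 0 now would suggest the card is no longer on the bus at all, not just missing its network driver.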
Troubleshooting Attempted:
I’ve found some troubleshooting guides related to the cat: write error: connection timeout error during bfb-install. However, most of these guides rely on checking the logs within /dev/rshim0/misc to diagnose the communication issue with the DPU. Since this directory is now empty on my host, I’m stuck on how to proceed with the diagnosis.
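In case it helps anyone reproduce my state, here is a minimal sketch of the host-side check I can still run without any log output from the DPU. The rshim_nodes function name is my own invention. The node names (boot, console, misc, rshim) are what I believe the rshim driver normally creates under /dev/rshim0, so please treat that list as an assumption.

```shell
#!/bin/sh
# Hypothetical helper (my own name): report which of the expected rshim
# device nodes exist under a directory. The node list is an assumption
# about what the rshim driver normally exposes.
rshim_nodes() {
    rn_dir="${1:-/dev/rshim0}"   # default to the first rshim device
    for rn_node in boot console misc rshim; do
        if [ -e "$rn_dir/$rn_node" ]; then
            echo "$rn_node: present"
        else
            echo "$rn_node: missing"
        fi
    done
}

# Intended use on the host:
#   rshim_nodes /dev/rshim0
# Other checks that do not need the misc log (the service name is an
# assumption and may differ on an old driver install):
#   systemctl status rshim
#   dmesg | grep -i rshim
```

If the nodes are missing entirely, I assume the problem is on the host driver side and not only on the DPU, but I would appreciate confirmation.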
Please let me know if I have left out any information, and I will add it right away. Thanks in advance for your time!