BF2 DOCA Upgrade via bfb-install Fails: "connection timeout", /dev/rshim0/misc empty, CX6 interfaces disappear

Hi everyone,

I’m encountering an issue while trying to install/upgrade DOCA on a BlueField-2 DPU using the bfb-install tool, and I’m hoping someone might have some insights.

Background:

  1. I’m working with a BlueField-2 card that was previously used by someone else.

  2. When I started, the host machine did not have DOCA installed.

  3. The DPU itself had DOCA 1.1 installed.

  4. My goal is to learn DOCA, so I decided to update the DOCA version on the DPU using bfb-install.

  5. My system information:

~$ lspci | grep "nox"
03:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
03:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
83:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller (rev 01)
83:00.1 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface (rev 01)

~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.3 LTS
Release:	16.04
Codename:	xenial

~$ uname -a
Linux ds07 4.15.0-112-generic #113~16.04.1-Ubuntu SMP Fri Jul 10 04:37:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  6. Before running bfb-install:
    • The /dev/rshim0/misc directory on the host did contain log files (although I don’t see a specific “DPU is ready” message).
    • The ConnectX-6 network interfaces associated with the BlueField-2 card were visible in the output of ifconfig on the host.

Problem:

  1. I ran the bfb-install --rshim rshim0 --bfb DOCA_1.5.4_BSP_3.9.9_Ubuntu_20.04-14.24-06-LTS.prod.bfb command to install a newer DOCA version onto the DPU.

  2. The process failed with the error message: cat: write error: connection timeout.

  3. After the failed bfb-install attempt:

    • The /dev/rshim0/misc directory on the host is now completely empty. There are no log files inside.
    • The ConnectX-6 network interfaces associated with the BlueField-2 card are no longer visible in the output of ifconfig on the host.

Troubleshooting Attempted:

I’ve found some troubleshooting guides related to the cat: write error: connection timeout error during bfb-install. However, most of these guides rely on checking the logs within /dev/rshim0/misc to diagnose the communication issue with the DPU. Since this directory is now empty on my host, I’m stuck on how to proceed with the diagnosis.

Please forgive me if I missed any information; I will add it right away. Thanks in advance for your time!

Hello,

Please try running the following command from the host where the BF2 device is installed:
echo "SW_RESET 1" > /dev/rshim0/misc
This will send a reset command to the rshim interface.
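
As a rough sketch of that step (not an official procedure), the reset can be wrapped in a small shell helper that first checks that the misc file exists. The function name rshim_reset and the idea of passing the rshim directory as an argument are illustrative assumptions; only the SW_RESET 1 write itself comes from the suggestion above.

```shell
# Hypothetical helper: send a soft reset through a given rshim directory.
# Usage: rshim_reset /dev/rshim0   (the device may enumerate under another index)
rshim_reset() {
    misc="$1/misc"
    if [ ! -e "$misc" ]; then
        # Without the misc file there is nothing to write the command to.
        echo "no $misc found: is the rshim driver/service running on the host?" >&2
        return 1
    fi
    # Use plain ASCII quotes; typographic quotes would be written out literally.
    echo "SW_RESET 1" > "$misc"
}
```

For example, as root: rshim_reset /dev/rshim0 && cat /dev/rshim0/misc. If the helper reports that misc is missing, the problem may be on the host side (rshim driver or service) rather than on the DPU itself.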
If the DPU boots, I suggest running the bfb-install command again, while making sure to follow the instructions here:

If the issue still persists, please open a case with enterprisesupport@nvidia.com, and it will be handled based on entitlement.

Thanks,
Jonathan.