mlnx_bf_configure failures

I’ve got a couple of BlueField-2 devices. One of them (it seems to come from Dell) unfortunately fails to configure properly on startup. It looks like it has trouble setting up one of the interfaces, the last one in this listing:

ubuntu@localhost:~$ lspci -Dnn
0000:00:00.0 PCI bridge [0604]: Mellanox Technologies MT42822 BlueField-2 SoC Crypto enabled [15b3:a2d4] (rev 01)
0000:01:00.0 PCI bridge [0604]: Mellanox Technologies MT42822 Family [BlueField-2 SoC PCIe Bridge] [15b3:1978] (rev 01)
0000:02:00.0 PCI bridge [0604]: Mellanox Technologies MT42822 Family [BlueField-2 SoC PCIe Bridge] [15b3:1978] (rev 01)
0000:03:00.0 Ethernet controller [0200]: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller [15b3:a2d6] (rev 01)
0000:03:00.1 Ethernet controller [0200]: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller [15b3:a2d6] (rev 01)
0000:03:00.2 Ethernet controller [0200]: Mellanox Technologies BlueField DPU Family Auxiliary Communication Channel [BlueField Family] [15b3:c2d1] (rev 01)
0000:03:00.3 Ethernet controller [0200]: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller [15b3:a2d6] (rev 01)
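
A quick way to compare cards is to count how many functions carry the ConnectX network controller ID. This is only a minimal sketch: `count_pci_functions` is a made-up helper name, and it just counts lines in `lspci -Dnn` style output that contain the given vendor:device ID.

```shell
# Hypothetical helper: count PCI functions with a given vendor:device ID.
# Reads `lspci -Dnn` style output on stdin. Defaults to 15b3:a2d6, the ID
# shown for the ConnectX-6 Dx network functions in the listing above.
count_pci_functions() {
    id="${1:-15b3:a2d6}"
    # -F matches the bracketed ID literally; -c prints the number of matching lines.
    grep -cF "[$id]"
}

# Example usage on the DPU:
#   lspci -Dnn | count_pci_functions
```

For the listing above this gives 3 (functions .0, .1 and .3), one more than the number of physical ports on a dual-port card.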

The last function, 0000:03:00.3, also has ID 15b3:a2d6. I’m not sure what it is; my other device (AENO version) doesn’t have it. mlnx_bf_configure tries to switch it from “legacy” to switchdev mode, which fails:

ubuntu@localhost:~$ sudo /sbin/mlnx_bf_configure
Switch mode for 0000:03:00.3 is legacy, setting it to switchdev
/sbin/mlnx_bf_configure: line 146: echo: write error: Invalid argument
/sbin/mlnx_bf_configure: line 146: echo: write error: Invalid argument
/sbin/mlnx_bf_configure: line 146: echo: write error: Invalid argument
/sbin/mlnx_bf_configure: line 146: echo: write error: Invalid argument

This keeps retrying and eventually times out.
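
The eswitch mode can also be inspected directly with `devlink` from iproute2, which helps tell a script problem apart from the driver rejecting the change. A minimal sketch: `eswitch_mode` is a hypothetical helper name, and it assumes the usual one-line `devlink dev eswitch show` output of the form `pci/<addr>: mode <mode> ...`.

```shell
# Hypothetical helper: print the eswitch mode from
# `devlink dev eswitch show pci/<addr>` output read on stdin.
# Assumed input shape: pci/0000:03:00.3: mode legacy inline-mode none ...
eswitch_mode() {
    # Find the standalone "mode" token and print the word that follows it.
    awk '{ for (i = 1; i < NF; i++) if ($i == "mode") { print $(i + 1); exit } }'
}

# Example usage on the DPU (needs root):
#   devlink dev eswitch show pci/0000:03:00.3 | eswitch_mode
#   devlink dev eswitch set pci/0000:03:00.3 mode switchdev
```

If the driver is the one refusing the change, the `set` command would be expected to fail with the same "Invalid argument" error as the script.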

I’ve tried pushing different BFB versions (2.9.1, 2.8.0, 2.5.3), but the issue is still there.
Any ideas?

At the same time, I see this in the boot logs:

Jan 21 21:37:26 localhost mlnx_interface_mgr[2578]: Setting up Mellanox network interface: pf1hpf
Jan 21 21:37:26 localhost mlnx_interface_mgr[2582]: Got ETH interface pf1hpf and OS is booting, skipping.
Jan 21 21:37:26 localhost systemd[1]: mlnx_interface_mgr@pf1hpf.service: Deactivated successfully.
Jan 21 21:37:27 localhost systemd-networkd[1975]: pf1hpf: Gained carrier
Jan 21 21:37:27 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): pf1hpf: link becomes ready
Jan 21 21:37:27 localhost mlnx_bf_configure[2584]: INFO: Configured switchdev mode for 0000:03:00.1 on try: 1
Jan 21 21:37:27 localhost mlnx_bf_configure[2588]: INFO: Set ct_max_offloaded_conns parameter to 1000000 value for 0000:03:00.1
Jan 21 21:37:28 localhost mlnx_bf_configure[2620]: INFO: Device 0000:03:00.3 is in SmartNIC mode
Jan 21 21:37:28 localhost mlnx_bf_configure[2631]: INFO: Shared RQ: Set esw_pet_insert parameter to true value for 0000:03:00.3
Jan 21 21:37:28 localhost mlnx_bf_configure[2634]: INFO: Stopped strongswan.service successfully
Jan 21 21:37:28 localhost kernel: mlx5_core 0000:03:00.3: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Jan 21 21:37:29 localhost systemd-networkd[1975]: pf1hpf: Gained IPv6LL
Jan 21 21:37:29 localhost kernel: mlx5_core 0000:03:00.3: mlx5_cmd_out_err:832:(pid 1255): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x75d551), err(-22)
Jan 21 21:37:29 localhost kernel: mlx5_core 0000:03:00.3: E-Switch: vport[65535] create ingress ACL table, err(-22)
Jan 21 21:37:29 localhost kernel: mlx5_core 0000:03:00.3: esw_compat_write:359:(pid 1255): mlx5_core: Failed setting eswitch to offloads
Jan 21 21:37:29 localhost systemd-udevd[670]: uverbs2: /usr/lib/udev/rules.d/90-ib.rules:4 Only network interfaces can be renamed, ignoring NAME="infiniband/%k".


Jan 21 21:39:52 localhost mlnx_bf_configure[3306]: ERR: Failed to configure switchdev mode for 0000:03:00.3 after 61 retries
Jan 21 21:39:52 localhost mlnx_bf_configure[3312]: INFO: Set ct_max_offloaded_conns parameter to 1000000 value for 0000:03:00.3
Jan 21 21:39:52 localhost mlnx_bf_configure[3318]: ERR: Exiting due to failures. RC=1
Jan 21 21:39:52 localhost openibd[932]: [50B blob data]
Jan 21 21:39:52 localhost root[3319]: openibd: ERROR: Failed loading kernel module ib_umad.
Jan 21 21:39:52 localhost openibd[932]: [49B blob data]
Jan 21 21:39:52 localhost openibd[932]: Please run /usr/sbin/sysinfo-snapshot.py to collect the debug information
Jan 21 21:39:52 localhost openibd[932]: and open an issue in the http://support.mellanox.com/SupportWeb/service_center/SelfService
Jan 21 21:39:52 localhost systemd[1]: openibd.service: Main process exited, code=exited, status=1/FAILURE
Jan 21 21:39:52 localhost systemd[1]: openibd.service: Failed with result 'exit-code'.
Jan 21 21:39:52 localhost systemd[1]: Failed to start openibd - configure Mellanox devices.
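
To pull only the kernel-side failures for one function out of a long boot log, something like the following can help. A minimal sketch: `mlx5_failures` is a hypothetical helper, and matching on "fail" or `err(-N)` is just a heuristic based on the messages above.

```shell
# Hypothetical helper: print mlx5_core kernel lines for one PCI function
# that look like failures. Reads log text on stdin.
mlx5_failures() {
    dev="$1"
    # First keep only mlx5_core lines for this device, then keep lines
    # that mention a failure or carry a negative error code.
    grep -F "mlx5_core $dev:" | grep -Ei 'fail|err\(-[0-9]+\)'
}

# Example usage on the DPU:
#   journalctl -b -k | mlx5_failures 0000:03:00.3
```

On the log above this leaves the CREATE_FLOW_TABLE, ingress ACL and "Failed setting eswitch to offloads" lines. The `err(-22)` in them is EINVAL, which matches the "Invalid argument" write error printed by mlnx_bf_configure.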

Info:
ubuntu@localhost:~$ sudo flint -d /dev/mst/mt41686_pciconf0 q full
Image type: FS4
FW Version: 24.42.1000
FW Release Date: 8.8.2024
Part Number: 0JNDCM_Dx
Description: NVIDIA Bluefield-2 Dual Port 25 GbE SFP Crypto DPU
Product Version: 24.42.1000
Rom Info: type=UEFI Virtio net version=21.4.13 cpu=AMD64,AARCH64
type=UEFI Virtio blk version=22.4.13 cpu=AMD64,AARCH64
type=UEFI version=14.35.15 cpu=AMD64,AARCH64
type=PXE version=3.7.500 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 946dae03002f27b6 16
Base MAC: 946dae2f27b6 16
Image VSD: N/A
Device VSD: N/A
PSID: DEL0000000033
Security Attributes: secure-fw
Default Update Method: fw_ctrl
Life cycle: GA SECURED
Secure Boot Capable: Enabled
EFUSE Security Ver: 0
Image Security Ver: 0
Security Ver Program: Manually ; Disabled

OK, so for whatever reason the default NUM_OF_PF seems to be 3, even though this is a 2-port card.
Reconfiguring it to 2 solves this:

sudo mlxconfig -d 0000:03:00.1 s PER_PF_NUM_SF=1 PF_TOTAL_SF=252 PF_SF_BAR_SIZE=12 NUM_OF_PF=2
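
The value can be read back with mlxconfig's query command before and after the change. A minimal sketch: `mlxconfig_value` is a hypothetical helper, and it assumes the query output lists each parameter as a name followed by its value on one line. Note that mlxconfig changes normally take effect only after a reboot or power cycle.

```shell
# Hypothetical helper: print the value of one parameter from
# `mlxconfig -d <dev> q` output read on stdin.
# Assumed line shape: "         NUM_OF_PF                           3"
mlxconfig_value() {
    # Match the parameter name in the first column and print the second.
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example usage on the DPU:
#   sudo mlxconfig -d 0000:03:00.1 q | mlxconfig_value NUM_OF_PF
```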
