I’ve got a couple of BlueField-2 devices. One of them (it seems to originate from Dell) unfortunately fails to configure properly on start. It looks like it has issues setting up one of the interfaces, the last one:
That last one also has the ID 15b3:a2d6. I’m not sure what it is; my other device (AENO version) doesn’t have it. mlnx_bf_configure tries to switch it from “legacy” to switchdev mode, which fails:
ubuntu@localhost:~$ sudo /sbin/mlnx_bf_configure
Switch mode for 0000:03:00.3 is legacy, setting it to switchdev
/sbin/mlnx_bf_configure: line 146: echo: write error: Invalid argument
/sbin/mlnx_bf_configure: line 146: echo: write error: Invalid argument
/sbin/mlnx_bf_configure: line 146: echo: write error: Invalid argument
/sbin/mlnx_bf_configure: line 146: echo: write error: Invalid argument
…
The script keeps retrying and eventually times out.
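For reference, the mode can also be read by hand with devlink instead of going through the script. This is only a minimal sketch: it assumes devlink (from iproute2) is available on the DPU and that its output contains the usual `mode <value>` pair. The helper names parse_eswitch_mode and eswitch_mode are made up for illustration.

```shell
# Sketch only: read the eswitch mode by hand, outside mlnx_bf_configure.
# Assumes devlink from iproute2 is installed on the DPU.

# Extract the value following the "mode" keyword from
# `devlink dev eswitch show` output read on stdin.
parse_eswitch_mode() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mode") { print $(i + 1); exit } }'
}

# Hypothetical wrapper: print the current mode for a PCI address
# such as 0000:03:00.3 (run as root).
eswitch_mode() {
    devlink dev eswitch show "pci/$1" | parse_eswitch_mode
}

# Example, on the DPU:
#   eswitch_mode 0000:03:00.3
#   devlink dev eswitch set pci/0000:03:00.3 mode switchdev
```

Setting the mode through devlink directly should at least show whether the failure is specific to the script or comes from the driver itself.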
I’ve tried pushing different BFB versions (2.9.1, 2.8.0, 2.5.3), but the issue persists.
Any ideas?
Jan 21 21:37:26 localhost mlnx_interface_mgr[2578]: Setting up Mellanox network interface: pf1hpf
Jan 21 21:37:26 localhost mlnx_interface_mgr[2582]: Got ETH interface pf1hpf and OS is booting, skipping.
Jan 21 21:37:26 localhost systemd[1]: mlnx_interface_mgr@pf1hpf.service: Deactivated successfully.
Jan 21 21:37:27 localhost systemd-networkd[1975]: pf1hpf: Gained carrier
Jan 21 21:37:27 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): pf1hpf: link becomes ready
Jan 21 21:37:27 localhost mlnx_bf_configure[2584]: INFO: Configured switchdev mode for 0000:03:00.1 on try: 1
Jan 21 21:37:27 localhost mlnx_bf_configure[2588]: INFO: Set ct_max_offloaded_conns parameter to 1000000 value for 0000:03:00.1
Jan 21 21:37:28 localhost mlnx_bf_configure[2620]: INFO: Device 0000:03:00.3 is in SmartNIC mode
Jan 21 21:37:28 localhost mlnx_bf_configure[2631]: INFO: Shared RQ: Set esw_pet_insert parameter to true value for 0000:03:00.3
Jan 21 21:37:28 localhost mlnx_bf_configure[2634]: INFO: Stopped strongswan.service successfully
Jan 21 21:37:28 localhost kernel: mlx5_core 0000:03:00.3: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Jan 21 21:37:29 localhost systemd-networkd[1975]: pf1hpf: Gained IPv6LL
Jan 21 21:37:29 localhost kernel: mlx5_core 0000:03:00.3: mlx5_cmd_out_err:832:(pid 1255): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x75d551), err(-22)
Jan 21 21:37:29 localhost kernel: mlx5_core 0000:03:00.3: E-Switch: vport[65535] create ingress ACL table, err(-22)
Jan 21 21:37:29 localhost kernel: mlx5_core 0000:03:00.3: esw_compat_write:359:(pid 1255): mlx5_core: Failed setting eswitch to offloads
Jan 21 21:37:29 localhost systemd-udevd[670]: uverbs2: /usr/lib/udev/rules.d/90-ib.rules:4 Only network interfaces can be renamed, ignoring NAME="infiniband/%k".
…
Jan 21 21:39:52 localhost mlnx_bf_configure[3306]: ERR: Failed to configure switchdev mode for 0000:03:00.3 after 61 retries
Jan 21 21:39:52 localhost mlnx_bf_configure[3312]: INFO: Set ct_max_offloaded_conns parameter to 1000000 value for 0000:03:00.3
Jan 21 21:39:52 localhost mlnx_bf_configure[3318]: ERR: Exiting due to failures. RC=1
Jan 21 21:39:52 localhost openibd[932]: [50B blob data]
Jan 21 21:39:52 localhost root[3319]: openibd: ERROR: Failed loading kernel module ib_umad.
Jan 21 21:39:52 localhost openibd[932]: [49B blob data]
Jan 21 21:39:52 localhost openibd[932]: Please run /usr/sbin/sysinfo-snapshot.py to collect the debug information
Jan 21 21:39:52 localhost openibd[932]: and open an issue in the http://support.mellanox.com/SupportWeb/service_center/SelfService
Jan 21 21:39:52 localhost systemd[1]: openibd.service: Main process exited, code=exited, status=1/FAILURE
Jan 21 21:39:52 localhost systemd[1]: openibd.service: Failed with result 'exit-code'.
Jan 21 21:39:52 localhost systemd[1]: Failed to start openibd - configure Mellanox devices.
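Most of the log above is noise. The lines that matter are the mlx5_core errors and the ERR messages from mlnx_bf_configure. A small filter to pull only those out of the journal might look like this. It is a rough sketch: the patterns are taken from the log above and the function name switchdev_failure_lines is made up.

```shell
# Sketch: keep only the lines relevant to the switchdev failure.
# Reads journal text on stdin; patterns are based on the log above.
switchdev_failure_lines() {
    grep -E 'mlx5_core .*(failed|Failed|err\()|mlnx_bf_configure\[[0-9]+\]: ERR:'
}

# Usage on the DPU:
#   journalctl -b | switchdev_failure_lines
```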