OVS not up in Embedded Mode of Bluefield-2

Hi all, I recently tried to switch a BlueField-2 InfiniBand DPU from Separated Host Mode to ECPF (Embedded) Mode and ran into problems with OVS startup. My steps were as follows.

  1. I changed INTERNAL_CPU_MODEL to 1 according to the DPU OS 3.9.0 documentation (Modes of Operation - BlueField DPU OS 3.9.0 - NVIDIA Networking Docs), and checked the /etc/mellanox/mlnx-ovs.conf configuration:

    CREATE_OVS_BRIDGES="yes"
    OVS_BRIDGE1="ovsbr1"
    OVS_BRIDGE1_PORTS="p0 pf0hpf en3f0pf0sf0"
    OVS_BRIDGE2="ovsbr2"
    OVS_BRIDGE2_PORTS="p1 pf1hpf en3f1pf1sf0"
    OVS_HW_OFFLOAD="yes"
    OVS_START_TIMEOUT=30
    

    Everything looked fine, so I power cycled the server. Afterwards, I logged in to the DPU and ran sudo ovs-vsctl show; the result was as follows.

    0ffc4fa4-fb7a-4e27-afef-4b6d80cd808f
        ovs_version: "2.15.1-d246dab"
    

    It was empty: no bridges had been created.

  2. Then I tried reinstalling the DPU OS from a BFB image. My package is DOCA_1.3.0_BSP_3.9.0_Ubuntu_20.04-6.signed. To match the version on the DPU, I also reinstalled OFED (5.6.1.0.3) and the BlueField driver in the host environment. Before power cycling the server, I checked /etc/mellanox/mlnx-ovs.conf again and it had the same contents as above.
    Nothing changed with OVS after the power cycle: still no bridges.

  3. I went through the "Default Ports and OVS Configuration" section of the documentation (BlueField DPU OS 3.9.0 - Deploying DPU OS Using BFB from Host) and checked the contents of /etc/modprobe.d/mlnx-bf.conf:
    install ib_umad /sbin/modprobe --ignore-install ib_umad $CMDLINE_OPTS && (if [ -x /sbin/mlnx_bf_configure ]; then /sbin/mlnx_bf_configure; fi)

    This seems inconsistent with the statement in the documentation that "The /sbin/mlnx_bf_configure script runs automatically with mlx5_ib kernel module loaded": here the script is hooked to ib_umad instead. I’m not sure whether this is the cause of the OVS failure.

    I also tried running /sbin/mlnx_bf_configure directly, but nothing happened. mlnx-sf -a show printed nothing but an empty line, and ovs-vsctl show printed the same result as before.

  4. In addition to the above, I checked the en3f1pf1sf0 port with the command ifconfig en3f1pf1sf0 and found the error:
    en3f1pf1sf0: error fetching interface information: Device not found
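
To make the check in step 4 repeatable for every port rather than just one, here is a minimal sketch that reads the port lists from the config shown in step 1 and reports which interfaces actually exist. It is not taken from the NVIDIA documentation: the function name check_ovs_ports and its two optional arguments (config path and sysfs directory) are my own, chosen for illustration, and it assumes the config only uses the OVS_BRIDGE1_PORTS and OVS_BRIDGE2_PORTS variables.

```shell
#!/bin/sh
# Hypothetical helper (not part of the BlueField tooling): report whether each
# port listed in mlnx-ovs.conf exists as a network interface.
#   $1 - config file (default: /etc/mellanox/mlnx-ovs.conf)
#   $2 - directory with one entry per netdev (default: /sys/class/net)
check_ovs_ports() {
    conf="${1:-/etc/mellanox/mlnx-ovs.conf}"
    sysfs="${2:-/sys/class/net}"
    # Source the config inside a subshell so its variables do not leak
    # into the calling shell.
    (
        . "$conf"
        for port in $OVS_BRIDGE1_PORTS $OVS_BRIDGE2_PORTS; do
            if [ -e "$sysfs/$port" ]; then
                echo "$port present"
            else
                echo "$port MISSING"
            fi
        done
    )
}
```

Calling check_ovs_ports with no arguments on the DPU uses the default paths. Any port reported as MISSING would be consistent with the bridges not being populated, although that alone does not explain why the interface was never created.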

Thank you very much for reading. This is everything I have tried so far, and I have run out of ideas. Can somebody give me a hand with this? I would be sincerely grateful.

P.S. Our device works fine in Separated Host Mode, so I don’t think it’s a connection or hardware failure, but I welcome any corrections if I have made a mistake somewhere.