I have a question regarding compatibility issues between the BlueField-3 DPU and a switch.
I am trying to use a BlueField-3 DPU connected to an InfiniBand switch (model unknown). However, when connected to the switch, only one of the two network interfaces, either the DPU's or the host's, becomes active.
When directly connected to another server, both the DPU and the host are in a Link Up and Active state and can use the network without issues. However, when connected through the switch, depending on the DPU’s mode, either only the DPU becomes Active or only the host becomes Active.
In this case, the inactive side reports a physical state of Link Up, but its logical state is Down.
I suspect this might be an OpenSM-related issue, but could this problem be caused by the switch itself? For example, could it be due to the switch being too old, or some system-level limitation where it does not support configurations like a 2-host server?
We encountered a similar issue on our machine when using DPU mode, and fixed it by running OpenSM on the DPU:
Launch OpenSM on the DPU to make InfiniBand usable on the host side. Before this step, running ibstat on the host shows State: Down and Physical state: LinkUp. After this step, ibstat on the host shows State: Active.
# Get the `Node GUID` from the corresponding CA
ibstat
# Run OpenSM with the Node GUID to recognize virtual ports on the host.
sudo opensm -g <DPU_IB_NODE_GUID> -p 10
# If there's another OpenSM running on other hosts, make sure to set the priority higher than those.
# In our case, we have another OpenSM with priority 0 in the subnet, so we set our priority to 10.
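
To compare the port state before and after starting OpenSM, the relevant `ibstat` lines can be condensed into one line per port. This is only a minimal sketch: `ib_port_states` is a hypothetical helper name (not an existing tool), and it assumes the usual `ibstat` layout of `CA '<name>'`, `Port <n>:`, `State:` and `Physical state:` lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise `ibstat` output read from stdin.
# Prints one line per port, e.g. "<ca> port <n> state=<State> phys=<Physical state>".
ib_port_states() {
    awk -v q="'" '
        # "CA <quoted name>" starts a new adapter; skip the "CA type:" line.
        $1 == "CA" && $2 != "type:" { ca = $2; gsub(q, "", ca) }
        # "Port 1:" starts a port section; "Port GUID:" does not match.
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:$/, "", port) }
        # Logical state (Down / Initializing / Armed / Active).
        $1 == "State:" { state = $2 }
        # Physical state follows the logical state, so print here.
        $1 == "Physical" && $2 == "state:" {
            print ca, "port", port, "state=" state, "phys=" $3
        }
    '
}

# Usage (on the host or the DPU):
#   ibstat | ib_port_states
```

Running it on the host before and after launching OpenSM on the DPU should show the `state=` field change while `phys=LinkUp` stays the same. If the `infiniband-diags` tools are installed, `sminfo` reports the current master SM and its priority, which can help when choosing the value for `-p`.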