Issue with Allocating InfiniBand Interface to Pod Using IPoIBNetwork CRD

Hi,

I’m encountering an issue when trying to allocate an InfiniBand interface to a Pod using the IPoIBNetwork CRD.


Host Environment
The server has two InfiniBand adapter ports configured in Ethernet mode and bonded into a single bond0 interface.


YAML for Creating the IPoIBNetwork Resource

apiVersion: mellanox.com/v1alpha1
kind: IPoIBNetwork
metadata:
  name: ipoib-bond0
spec:
  networkNamespace: jsh
  master: bond0
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "info",
      "range": "192.168.10.0/26"
    }

YAML for Creating the Pod

apiVersion: v1
kind: Pod
metadata:
  name: test-pod-4
  namespace: jsh
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        { "name": "ipoib-bond0", "interface": "bond0" }
      ]
spec:
  containers:
  - name: app
    ...
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]

Error Encountered During Pod Creation (verified using kubectl describe)
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1e962b13452bbf1d03502624baccfada687d00c06416eb79bae9f46a4948ad3b": plugin type="multus" name="multus-cni-network" failed (add): [jsh/test-pod-4:ipoib-bond0]: error adding container to network "ipoib-bond0": failed to create interface: no such device


Additional Information
InfiniBand interfaces not in Ethernet mode are allocated correctly using the same method.


Could anyone provide guidance on how to resolve this issue?
Any help would be greatly appreciated!
Thank you in advance.

Hi,

Please note that this is a DGX system.
Are you trying to create a bond for Ethernet or InfiniBand?
For IB, use the dedicated IB interfaces (the single-port, IB-only ones, not the VPI ones).
For Ethernet, use the dedicated Ethernet ports.

https://docs.nvidia.com/dgx/dgxa100-user-guide/introduction-to-dgxa100.html#network-connections-cables-and-adaptors

4 port 0 (top) e1:00.0 enp225s0f0 (see note) mlx5_8 mlx5_10
4 port 1 (bottom) e1:00.1 enp225s0f1 (see note) mlx5_9 mlx5_11
5 port 0 (left) 61:00.0 enp97s0f0 (see note) mlx5_4
5 port 1 (right) 61:00.1 enp97s0f1 (see note) mlx5_5
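To check which mode each ConnectX port is currently running in, you can read the port link layer from sysfs. The sketch below is a hypothetical helper (the name `rdma_link_layers` and the `SYSFS_IB` override are illustrative assumptions, not part of any NVIDIA tool). It relies on the standard `/sys/class/infiniband/<dev>/ports/1/link_layer` file, which contains `InfiniBand` or `Ethernet`, and lists the netdevs found under each device's PCI `net/` directory. If MLNX_OFED is installed, `ibdev2netdev` also shows the RDMA device to netdev mapping.

```shell
# Hypothetical helper: for each RDMA device, print its name, the link
# layer of port 1 ("InfiniBand" or "Ethernet"), and its netdev name(s).
# SYSFS_IB defaults to /sys/class/infiniband and can be overridden for testing.
rdma_link_layers() {
  local root="${SYSFS_IB:-/sys/class/infiniband}"
  local dev name layer netdevs n
  for dev in "$root"/*; do
    [ -d "$dev" ] || continue
    name=$(basename "$dev")
    # Only port 1 is read; on mlx5 hardware each physical port is
    # typically exposed as its own RDMA device (e.g. mlx5_8, mlx5_9).
    layer=$(cat "$dev/ports/1/link_layer" 2>/dev/null || echo unknown)
    netdevs=""
    for n in "$dev"/device/net/*; do
      [ -e "$n" ] && netdevs="$netdevs $(basename "$n")"
    done
    echo "$name $layer$netdevs"
  done
}

# Usage: rdma_link_layers
```

A port listed as `Ethernet` here cannot serve as the parent of an IPoIB interface.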

The error you are getting means that you are trying to create an IPoIB interface (supported only on IB-mode interfaces) on top of an Ethernet interface.
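One way to confirm this on the host is to check the link type of the interface used as `master` in the IPoIBNetwork resource. The kernel exposes it as an ARPHRD number in `/sys/class/net/<iface>/type`: `1` for Ethernet and `32` for InfiniBand. The helper below is a hypothetical sketch (the function name and the `SYSFS_NET` override are illustrative assumptions):

```shell
# Hypothetical helper: classify a network interface by the ARPHRD type the
# kernel reports in sysfs (1 = ARPHRD_ETHER, 32 = ARPHRD_INFINIBAND).
# SYSFS_NET defaults to /sys/class/net and can be overridden for testing.
netdev_link_type() {
  local iface="$1"
  local type_file="${SYSFS_NET:-/sys/class/net}/$iface/type"
  if [ ! -r "$type_file" ]; then
    echo "unknown"
    return 1
  fi
  case "$(cat "$type_file")" in
    1)  echo "ethernet" ;;
    32) echo "infiniband" ;;
    *)  echo "other" ;;
  esac
}

# Usage: netdev_link_type bond0
```

On the host described in the question, `netdev_link_type bond0` would be expected to print `ethernet`, consistent with the diagnosis above; a usable IPoIB master should report `infiniband`.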

To create a bond in DGX OS between two ConnectX-6 cards, you have to use Netplan and disable the NetworkManager service.

These are the steps:

  1. Stop and disable Network Manager

$ sudo systemctl stop NetworkManager.service

$ sudo systemctl disable NetworkManager.service

  2. Load the bonding kernel module.

$ sudo modprobe bonding

  3. Verify that it is loaded.

$ sudo lsmod | grep bonding

bonding 167936 0

  4. If it is not listed, install the ifenslave package:

$ sudo apt install ifenslave

  5. Add the module to /etc/modules so that it is loaded automatically at boot.

$ echo 'bonding' | sudo tee -a /etc/modules

  6. To configure permanent network bonding on Ubuntu, create a Netplan YAML file under /etc/netplan/ as shown below.
  7. Remember that the YAML file has to be formatted in a specific way (indentation matters).
You can use this site to check that the format is valid: http://www.yamllint.com/
Just copy and paste the code into the window and press "Go".
It will tell you whether the format is OK and, if not, what you need to fix.

root@idgx:/etc/netplan# cat /etc/netplan/03-bond.yaml

network:
  version: 2
  renderer: networkd
  ethernets:
    <Adapter1>:
      dhcp4: no
    <Adapter2>:
      dhcp4: no
  bonds:
    bond0:
      interfaces: [<Adapter1>, <Adapter2>]
      addresses: [X.X.X.X/X]
      gateway4: X.X.X.X
      parameters:
        mode: 802.3ad
        mii-monitor-interval: 100
      nameservers:
        addresses:
          - "X.X.X.X" ### DNS is optional ###
  ### If you want to trunk VLANs into the bond, use this option ###
  vlans:
    bond0.X:
      dhcp4: no
      addresses: [X.X.X.X/X]
      id: X
      link: bond0

  8. Save the file and bring the two interfaces down.

$ sudo ifconfig <Adapter1> down

$ sudo ifconfig <Adapter2> down

  9. Apply the Netplan configuration.

$ sudo netplan apply

  10. Bring the bond up.

$ sudo ifconfig bond0 up

  11. View the detailed bond status.

$ sudo cat /proc/net/bonding/bond0
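The contents of /proc/net/bonding/bond0 can also be checked in a script, for example right after `netplan apply`. The sketch below is a hypothetical helper (its name and checks are illustrative assumptions). It expects the `Bonding Mode: IEEE 802.3ad Dynamic link aggregation` line that current Linux bonding drivers print for mode 802.3ad, and it requires every `MII Status:` line, for the bond and for each slave, to read `up`:

```shell
# Hypothetical helper: sanity-check a bonding status file (default
# /proc/net/bonding/bond0). Prints "ok" and succeeds only if the bond is in
# 802.3ad mode, has at least one slave, and every "MII Status:" line is "up".
bond_status_ok() {
  local status_file="${1:-/proc/net/bonding/bond0}"
  if [ ! -r "$status_file" ]; then
    echo "cannot read $status_file" >&2
    return 1
  fi
  if ! grep -q '^Bonding Mode: IEEE 802.3ad' "$status_file"; then
    echo "bond is not in 802.3ad mode" >&2
    return 1
  fi
  if ! grep -q '^Slave Interface:' "$status_file"; then
    echo "bond has no slave interfaces" >&2
    return 1
  fi
  # The bond itself and each slave report their own "MII Status:" line.
  if grep '^MII Status:' "$status_file" | grep -qv '^MII Status: up$'; then
    echo "at least one link is not up" >&2
    return 1
  fi
  echo "ok"
}

# Usage: bond_status_ok /proc/net/bonding/bond0
```

Keep in mind that 802.3ad (LACP) also needs a matching LACP configuration on the switch side before the slave links come up as aggregated.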

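Note that `echo 'bonding' | sudo tee -a /etc/modules` appends another `bonding` line every time it is run. If the steps may be repeated, a guarded version avoids duplicates. This is a hypothetical sketch (the function name is illustrative); run it from a root shell, since writing /etc/modules requires root:

```shell
# Hypothetical helper: add a kernel module name to a modules file
# (default /etc/modules) only if an identical line is not already present.
# Must run as root when targeting /etc/modules.
ensure_module_listed() {
  local module="$1"
  local file="${2:-/etc/modules}"
  # -x: match the whole line; -F: treat the module name as a fixed string
  if grep -qxF "$module" "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' "$module" >> "$file"
}

# Usage (as root): ensure_module_listed bonding
```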
Thanks
Samer