I installed the network drivers. Now what?

I am trying to set up Infiniband networking on our HGX GPU cluster, and have installed the doca-all package on two of the machines. I think the installation went fine, but I am struggling to figure out what to do next. How do I actually test whether Infiniband works?

For additional context, this is what I get when I run ifconfig on one of the machines:

ceti@ceti5:/opt/mellanox/doca$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:8b:e4:da:fa  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp83s0f0np0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether a0:88:c2:39:1c:90  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp83s0f1np1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether a0:88:c2:39:1c:91  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp86s0f0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 7c:c2:55:7b:43:74  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp86s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.69.75  netmask 255.255.255.0  broadcast 192.168.69.255
        inet6 fe80::7ec2:55ff:fe7b:4375  prefixlen 64  scopeid 0x20<link>
        ether 7c:c2:55:7b:43:75  txqueuelen 1000  (Ethernet)
        RX packets 97630  bytes 7988902 (7.9 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2528  bytes 269280 (269.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 1014  bytes 279266 (279.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1014  bytes 279266 (279.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

When I asked Perplexity, it suggested a ping, but that would just go over the Ethernet port. I have no idea what these other interfaces are for, or how to test Infiniband.

ceti@ceti5:/opt/mellanox/doca$ lspci | grep -i infi
19:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
29:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
3b:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
5c:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
9b:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
aa:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
bb:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
da:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]

The Infiniband controllers are there.

ceti@ceti5:/opt/mellanox/doca$ mlxconfig q
-E- No devices found, mst might be stopped. You may need to run 'mst start' to load MST modules.
ceti@ceti5:/opt/mellanox/doca$ sudo mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4125_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:53:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4129_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:19:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4129_pciconf1         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:29:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4129_pciconf2         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:3b:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4129_pciconf3         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:5c:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4129_pciconf4         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:9b:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4129_pciconf5         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:aa:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4129_pciconf6         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:bb:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
/dev/mst/mt4129_pciconf7         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:da:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00

I already ran sudo mst start and it didn’t help. Why is mlxconfig not finding any devices?

ceti@ceti5:/opt/mellanox/doca$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp86s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 7c:c2:55:7b:43:74 brd ff:ff:ff:ff:ff:ff
3: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fe:f8:a6:6d:9f:1e brd ff:ff:ff:ff:ff:ff
4: enp86s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 7c:c2:55:7b:43:75 brd ff:ff:ff:ff:ff:ff
    inet 192.168.69.75/24 metric 100 brd 192.168.69.255 scope global dynamic enp86s0f1
       valid_lft 50079sec preferred_lft 50079sec
    inet6 fe80::7ec2:55ff:fe7b:4375/64 scope link
       valid_lft forever preferred_lft forever
15: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:8b:e4:da:fa brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
16: enp83s0f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:88:c2:39:1c:90 brd ff:ff:ff:ff:ff:ff
17: enp83s0f1np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:88:c2:39:1c:91 brd ff:ff:ff:ff:ff:ff
18: ibp25s0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 1000
    link/infiniband 00:00:0c:a1:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:ee:fe:84 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
19: ibp41s0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 1000
    link/infiniband 00:00:10:47:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:ed:42:d6 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
20: ibp59s0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 1000
    link/infiniband 00:00:10:47:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:ed:42:fe brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
21: ibp92s0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 1000
    link/infiniband 00:00:10:47:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:ee:f5:34 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
22: ibp155s0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 1000
    link/infiniband 00:00:10:47:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:ec:f6:3e brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
23: ibp170s0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 1000
    link/infiniband 00:00:10:47:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:ed:63:06 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
24: ibp187s0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 1000
    link/infiniband 00:00:10:47:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:ec:f7:7e brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
25: ibp218s0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 1000
    link/infiniband 00:00:10:47:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:ed:3d:d6 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

I think I need to configure all of the Infiniband interfaces so they have IP addresses and then bring them up, right? But wouldn’t it be awkward to have to assign an IP to each of them? Would it be possible to assign a single IP to all the interfaces?

Hi,

Let me try to answer all your questions here:

  1. After starting mst, please run mst status -v, as this shows the devices in a more organized fashion. Then, to view the configuration, run mlxconfig -d /dev/mst/mt4129_pciconfX q, where ‘X’ is the index of one of the devices (in your case, there are 8).
  2. The best way to test IB traffic is with perftest, for example ib_write_bw. The usage is simple: between 2 connected hosts, one acts as the server and the other as the client:
    Server: ib_write_bw -d mlx5_X --report_gbits --run_infinitely
    Client: ib_write_bw -d mlx5_X --report_gbits --run_infinitely <ip address of server>
    Here ‘X’ is the index of the mlx5 device (shown in the mst status -v output), and the last argument of the client command is the IP address of the server node.
  3. Setting up IP addresses on the IB interfaces is also possible, for IPoIB operation, but it is not necessary for checking basic traffic.
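
As a rough sketch of the three steps above, the shell helpers below only print the commands rather than running them, so they can be reviewed before anything touches the hardware. The function names, the device name mlx5_0 and the 10.10.<n>.<host>/24 addressing plan are illustrative assumptions; the mt4129_pciconf0..7 paths are copied from the mst status output earlier in the thread.

```shell
#!/bin/sh
# Sketch only: every function prints commands instead of executing them.
# To actually run one, copy the printed line into a root shell.

# Step 1: one mlxconfig query per MST device.
# $1 = number of devices (defaults to 8, matching mt4129_pciconf0..7 above).
mlxconfig_queries() {
    i=0
    while [ "$i" -lt "${1:-8}" ]; do
        echo "mlxconfig -d /dev/mst/mt4129_pciconf$i q"
        i=$((i + 1))
    done
}

# Step 2: perftest server/client pair.
# $1 = RDMA device (mlx5_0 is an assumed name; check mst status -v),
# $2 = IP address of the server node (client side only).
perftest_server() {
    echo "ib_write_bw -d $1 --report_gbits --run_infinitely"
}
perftest_client() {
    echo "ib_write_bw -d $1 --report_gbits --run_infinitely $2"
}

# Step 3 (optional): one IPoIB address per interface.
# $1 = host number, remaining arguments = IB interface names.
# 10.10.<n>.<host>/24 is a hypothetical plan with one subnet per interface.
ipoib_commands() {
    host=$1
    shift
    n=0
    for ifc in "$@"; do
        echo "ip addr add 10.10.$n.$host/24 dev $ifc"
        echo "ip link set $ifc up"
        n=$((n + 1))
    done
}
```

For example, perftest_client mlx5_0 192.168.69.75 prints the client command pointed at the Ethernet address of ceti5. As far as I know, perftest by default uses that address only to exchange connection details over TCP, while the measured traffic goes through the device given with -d, so an IP on the IB interfaces should not be needed for this test.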

If there are additional issues or questions, please feel free to open a support case with enterprisesupport@nvidia.com, and it will be handled based on entitlement.

Thanks,
Jonathan.