Hello,
I am trying to set up connectivity over IPoIB between 2 KVM guests.
On the host I have SR-IOV set up on a ConnectX-3 VPI card with virtual functions configured.
On the host:
root@pve:~# lspci | grep Mell
0a:00.1 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:00.2 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:00.3 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:00.4 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:00.5 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:00.6 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:00.7 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.0 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.1 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.2 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.3 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.4 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.5 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.6 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:01.7 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
0a:02.0 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
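Before digging into the guests, it can help to confirm on the host that the PF port is actually Active and has a LID from the subnet manager. A hedged sketch (it assumes the infiniband-diags package is installed and the PF device is named mlx4_0; adjust to your system):

```shell
#!/bin/sh
# Hedged host-side check (assumes infiniband-diags and device name mlx4_0).
# Prints the port state if the tooling is present, or a note if it is not.
check_ib_port() {
  dev=${1:-mlx4_0}
  if command -v ibstat >/dev/null 2>&1; then
    ibstat "$dev"    # look for "State: Active" and a non-zero "Base lid"
  else
    echo "ibstat not installed (infiniband-diags package)"
  fi
}
check_ib_port mlx4_0
```

If the port shows Active with a LID, the PF side of the fabric is healthy and the problem is more likely in how the VFs are represented to the subnet manager.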
I have configured 2 KVM guests, and the VFs are visible in each guest:
Guest 1
ubuntu@k8s-sos-master:~$ lspci | grep Mellanox
00:10.0 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Guest 2
ubuntu@k8s-sos-node-1:~$ lspci | grep Mel
00:10.0 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Both VFs are configured for IPoIB and have IP addresses assigned:
Guest 1:
ubuntu@k8s-sos-master:~$ ifconfig ib1
ib1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044
inet 10.16.150.10 netmask 255.0.0.0 broadcast 10.255.255.255
inet6 fe80::e60a:2f02:7076:465 prefixlen 64 scopeid 0x20
unspec A0-00-0B-20-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 49 bytes 3036 (3.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Guest 2:
ubuntu@k8s-sos-node-1:~$ ifconfig ib1
ib1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044
inet 10.16.150.21 netmask 255.0.0.0 broadcast 10.255.255.255
inet6 fe80::8e6c:b204:678d:ddef prefixlen 64 scopeid 0x20
unspec A0-00-0A-A0-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC)
RX packets 1463 bytes 120820 (120.8 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1487 bytes 128208 (128.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
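One thing worth double-checking from the ifconfig output above: both interfaces carry a 255.0.0.0 (/8) netmask, so 10.16.150.10 and 10.16.150.21 should land in the same subnet and be resolved as on-link neighbors; a netmask mismatch here would itself produce "Destination Host Unreachable". A small sketch of that check (pure arithmetic, no assumptions beyond the addresses shown above):

```shell
#!/bin/sh
# Compute the network address of an IPv4 address under a dotted-quad mask,
# to confirm the two guest addresses fall in the same subnet.
network_of() {  # usage: network_of A.B.C.D M.M.M.M
  ip=$1; mask=$2
  set -- $(echo "$ip"   | tr '.' ' '); i1=$1; i2=$2; i3=$3; i4=$4
  set -- $(echo "$mask" | tr '.' ' '); m1=$1; m2=$2; m3=$3; m4=$4
  echo "$((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
}
network_of 10.16.150.10 255.0.0.0   # -> 10.0.0.0
network_of 10.16.150.21 255.0.0.0   # -> 10.0.0.0, same subnet, so the peer is on-link
```

Both come out as 10.0.0.0, so plain IP addressing is not the problem; the failure has to be at the IPoIB neighbor-resolution layer or below.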
However, I can’t ping between the 2 guests over these interfaces:
Guest 1
ubuntu@k8s-sos-master:~$ ping 10.16.150.21
PING 10.16.150.21 (10.16.150.21) 56(84) bytes of data.
From 10.16.150.10 icmp_seq=1 Destination Host Unreachable
From 10.16.150.10 icmp_seq=2 Destination Host Unreachable
From 10.16.150.10 icmp_seq=3 Destination Host Unreachable
Guest 2
ubuntu@k8s-sos-node-1:~$ ping 10.16.150.10
PING 10.16.150.10 (10.16.150.10) 56(84) bytes of data.
From 10.16.150.21 icmp_seq=1 Destination Host Unreachable
From 10.16.150.21 icmp_seq=2 Destination Host Unreachable
From 10.16.150.21 icmp_seq=3 Destination Host Unreachable
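"Destination Host Unreachable" reported from the sender’s own address usually means IPoIB neighbor resolution is failing, not IP routing. Two things worth comparing between the two guests are the IPoIB partition key (P_Key) and transport mode, which standard IPoIB interfaces expose in sysfs; a mismatch would be one way a VF could reach a physical host yet fail to reach the other VF. A hedged sketch (the interface name ib1 is taken from the output above):

```shell
#!/bin/sh
# Print the IPoIB partition key and mode (datagram vs connected) for an
# interface; run on both guests and compare the values. Falls back to a
# note when the files are absent (e.g. on a machine without IPoIB).
ipoib_info() {
  ifc=${1:-ib1}
  for f in pkey mode; do
    p="/sys/class/net/$ifc/$f"
    if [ -r "$p" ]; then
      printf '%s=%s\n' "$f" "$(cat "$p")"
    else
      printf '%s=unavailable\n' "$f"
    fi
  done
}
ipoib_info ib1
```

It can also be informative to run `ip neigh show dev ib1` right after a failed ping: an entry stuck in FAILED or INCOMPLETE confirms the path-resolution layer (which goes through the subnet manager) is where things break down.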
I can, however, ping another host on the same IB network that is not configured as a VF:
Guest 1
ubuntu@k8s-sos-master:~$ ping 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=255 time=0.210 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=255 time=0.219 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=255 time=0.159 ms
64 bytes from 10.1.1.1: icmp_seq=4 ttl=255 time=0.174 ms
Guest 2
ubuntu@k8s-sos-node-1:~$ ping 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=255 time=0.179 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=255 time=0.120 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=255 time=0.157 ms
64 bytes from 10.1.1.1: icmp_seq=4 ttl=255 time=0.117 ms
I’ve confirmed that ufw is disabled on both guests.
Here’s the mlx-related dmesg output from a guest:
[Tue Nov 12 13:08:55 2019] mlx_compat: loading out-of-tree module taints kernel.
[Tue Nov 12 13:08:55 2019] mlx_compat: module verification failed: signature and/or required key missing - tainting kernel
[Tue Nov 12 13:08:55 2019] mlx4_core: Mellanox ConnectX core driver v4.7-1.0.0
[Tue Nov 12 13:08:55 2019] mlx4_core: Initializing 0000:00:10.0
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: Detected virtual function - running in slave mode
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: Sending reset
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: Sending vhcr0
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: Requested number of MACs is too much for port 1, reducing to 64
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: Requested number of VLANs is too much for port 1, reducing to 1
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: HCA minimum page size:512
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: Timestamping is not supported in slave mode
[Tue Nov 12 13:08:55 2019] mlx4_core: device is working in RoCE mode: Roce V1
[Tue Nov 12 13:08:55 2019] mlx4_core: UD QP Gid type is: V1
[Tue Nov 12 13:08:55 2019] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.7-1.0.0
[Tue Nov 12 13:08:55 2019] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
[Tue Nov 12 13:08:55 2019] <mlx4_ib> mlx4_ib_add: counter index 22 for port 1 allocated 0
[Tue Nov 12 13:08:55 2019] <mlx4_ib> mlx4_ib_add: counter index 23 for port 2 allocated 0
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: mlx4_ib: multi-function enabled
[Tue Nov 12 13:08:55 2019] mlx4_core 0000:00:10.0: mlx4_ib: operating in qp1 tunnel mode
[Tue Nov 12 13:09:02 2019] card: mlx4_0, QP: 0xa80, inline size: 120
[Tue Nov 12 13:09:02 2019] card: mlx4_0, QP: 0xaa0, inline size: 120
[Tue Nov 12 13:09:03 2019] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.7-1.0.0
I’ve got MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu18.04 installed on both guest VMs.
On my host, I’ve got both ports of the ConnectX-3 configured for IB:
root@pve:~# cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core port_type_array=1,1 num_vfs=16 probe_vf=0
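One host-side angle that may be worth checking, given that traffic between VFs on the same port is proxied through the PF (the "operating in qp1 tunnel mode" line in the guest dmesg above): with mlx4 SR-IOV on InfiniBand, each VF generally needs its own alias GUID registered with the subnet manager before the SM can set up paths between them. The sysfs path below is based on the mlx4 alias-GUID interface as documented for MLNX_OFED, and the GUID value is made up; verify both on your host before using them.

```shell
# Hedged sketch, run on the host: inspect and, if needed, assign an
# administratively-chosen port GUID for a VF. By convention index 0 is
# the PF and indices 1..num_vfs correspond to the VFs.
ls /sys/class/infiniband/mlx4_0/iov/ports/1/admin_guids/
echo 0x0002c9030000aa01 > /sys/class/infiniband/mlx4_0/iov/ports/1/admin_guids/1
```

The subnet manager’s log (e.g. opensm) is also worth a look after the VFs come up, to confirm it actually sees two distinct GUIDs for the two guests.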
I need some help from the experts on why I’m unable to ping between the 2 guest VMs. I can ping over regular GbE Ethernet all day long without any issues:
Guest 1:
ubuntu@k8s-sos-master:~$ ping 172.16.150.21
PING 172.16.150.21 (172.16.150.21) 56(84) bytes of data.
64 bytes from 172.16.150.21: icmp_seq=1 ttl=64 time=0.772 ms
64 bytes from 172.16.150.21: icmp_seq=2 ttl=64 time=0.416 ms
64 bytes from 172.16.150.21: icmp_seq=3 ttl=64 time=0.577 ms
64 bytes from 172.16.150.21: icmp_seq=4 ttl=64 time=0.515 ms
64 bytes from 172.16.150.21: icmp_seq=5 ttl=64 time=0.491 ms
64 bytes from 172.16.150.21: icmp_seq=6 ttl=64 time=0.413 ms
Guest 2:
ubuntu@k8s-sos-node-1:~$ ping 172.16.150.10
PING 172.16.150.10 (172.16.150.10) 56(84) bytes of data.
64 bytes from 172.16.150.10: icmp_seq=1 ttl=64 time=0.712 ms
64 bytes from 172.16.150.10: icmp_seq=2 ttl=64 time=0.394 ms
64 bytes from 172.16.150.10: icmp_seq=3 ttl=64 time=0.489 ms
64 bytes from 172.16.150.10: icmp_seq=4 ttl=64 time=0.342 ms
64 bytes from 172.16.150.10: icmp_seq=5 ttl=64 time=0.452 ms
Would appreciate any help to debug this.