VLAN setup on SR-IOV-enabled Mellanox ConnectX-3 (CentOS 7)

Hello,

I’m working on SR-IOV using Mellanox ConnectX-3 card (switch: Voltaire 4036) on CentOS 7.

Mellanox OFED Driver Installation and Configuration for SR-IOV https://community.mellanox.com/s/article/mellanox-ofed-driver-installation-and-configuration-for-sr-iov

Mellanox-Neutron-Icehouse-Redhat-Ethernet - OpenStack

Nova-neutron-sriov - OpenStack

Most of the documentation and packages target CentOS 6.* and Python 2.6.

I’m working on CentOS 7, so I installed the required packages from git sources and tarballs.

I verified with the lspci command that SR-IOV is enabled and the VFs are visible:

lspci -nn | grep Mell

21:00.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]

21:00.1 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:00.2 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:00.3 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:00.4 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:00.5 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:00.6 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:00.7 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:01.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:01.1 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:01.2 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:01.3 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:01.4 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:01.5 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:01.6 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:01.7 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

21:02.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

I also verified with virsh that a VF can be attached to a VM (SR-IOV, hostdev).

However, launching an OpenStack VM fails with “VLAN operation failed”, according to the following mlnx-agent and eswitchd logs:

2015-02-24 16:51:31,346 DEBUG eswitchd [-] Handling message - {u’action’: u’set_vlan’, u’vlan’: 1000, u’fabric’: u’physnet1’, u’port_mac’: u’fa:16:3e:cc:76:bd’}

2015-02-24 16:51:31,346 DEBUG eswitchd [-] Running command: sudo eswitch-rootwrap /etc/eswitchd/rootwrap.conf ip link set ens4 vf 9 vlan 1000 qos 0

2015-02-24 16:51:31,441 DEBUG eswitchd [-]

Command: [‘sudo’, ‘eswitch-rootwrap’, ‘/etc/eswitchd/rootwrap.conf’, ‘ip’, ‘link’, ‘set’, ‘ens4’, ‘vf’, ‘9’, ‘vlan’, ‘1000’, ‘qos’, ‘0’]

Exit code: 2

Stdout: ‘’

Stderr: ‘RTNETLINK answers: Operation not supported\n’

2015-02-24 16:51:31,442 ERROR eswitchd [-] Set VLAN operation failed
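
For reference, the failing step in the log comes down to one iproute2 command per VF. Below is a minimal Python sketch of that step. It is not eswitchd's actual code: the helper names `build_vf_vlan_cmd` and `set_vf_vlan` are made up for illustration, and the only thing taken from the log is the `ip link set <dev> vf <n> vlan <id> qos <q>` syntax.

```python
import subprocess


def build_vf_vlan_cmd(device, vf_index, vlan_id, qos=0):
    """Build the argv for `ip link set <dev> vf <n> vlan <id> qos <q>`.

    Hypothetical helper; mirrors the command line seen in the eswitchd log.
    """
    if not 0 <= vlan_id <= 4095:
        raise ValueError(f"VLAN id out of range: {vlan_id}")
    if not 0 <= qos <= 7:
        raise ValueError(f"QoS out of range: {qos}")
    return ["ip", "link", "set", device,
            "vf", str(vf_index),
            "vlan", str(vlan_id),
            "qos", str(qos)]


def set_vf_vlan(device, vf_index, vlan_id, qos=0):
    """Run the command (needs root) and return (exit_code, stderr)."""
    cmd = build_vf_vlan_cmd(device, vf_index, vlan_id, qos)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    # Only prints the command; call set_vf_vlan() on a real host to run it.
    print(" ".join(build_vf_vlan_cmd("ens4", 9, 1000)))
```

Returning the exit code and stderr together makes it easy to tell a driver or firmware refusal such as "Operation not supported" apart from a plain syntax error.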

I also tried it manually, with the same result: assigning the MAC works, but setting the VLAN fails. iproute2-3.19.0 is installed.

ip link set ens4 vf 9 mac fa:16:3e:cc:76:bd

ip link show ens4

11: ens4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master ovs-system state DOWN mode DEFAULT qlen 1000

link/ether 00:02:c9:fb:a4:50 brd ff:ff:ff:ff:ff:ff

vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 8 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 9 MAC fa:16:3e:cc:76:bd, vlan 4095, spoof checking off, link-state auto

vf 10 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 11 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 12 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 13 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 14 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

vf 15 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

ip link set ens4 vf 9 vlan 1000

RTNETLINK answers: Operation not supported
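
To check many VFs at once, the per-VF lines of `ip link show` can be parsed rather than read by eye. The sketch below assumes the exact line format shown in this thread ("vf N MAC ..., vlan ..."); other iproute2 versions print VF lines differently, so the regular expression would need adjusting. The function names are illustrative only.

```python
import re

# Matches lines such as:
#   vf 9 MAC fa:16:3e:cc:76:bd, vlan 4095, spoof checking off, link-state auto
VF_LINE = re.compile(
    r"vf\s+(?P<index>\d+)\s+MAC\s+(?P<mac>[0-9a-fA-F:]{17}),\s+vlan\s+(?P<vlan>\d+)"
)


def parse_vfs(ip_link_output):
    """Return {vf_index: {"mac": str, "vlan": int}} from `ip link show` text."""
    vfs = {}
    for line in ip_link_output.splitlines():
        match = VF_LINE.search(line)
        if match:
            vfs[int(match.group("index"))] = {
                "mac": match.group("mac").lower(),
                "vlan": int(match.group("vlan")),
            }
    return vfs


def vlan_applied(vfs, vf_index, expected_vlan):
    """True if the given VF reports the expected VLAN id."""
    return vf_index in vfs and vfs[vf_index]["vlan"] == expected_vlan


if __name__ == "__main__":
    sample = (
        "    vf 8 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto\n"
        "    vf 9 MAC fa:16:3e:cc:76:bd, vlan 4095, spoof checking off, link-state auto\n"
    )
    vfs = parse_vfs(sample)
    print(vfs[9]["mac"], vlan_applied(vfs, 9, 1000))
```

With the output above, VF 9 has the new MAC but still reports VLAN 4095, which matches the failed `vlan 1000` request.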

cat /boot/config-3.10.0-123.20.1.el7.x86_64 | grep NETFILTER_NETLINK

CONFIG_NETFILTER_NETLINK=m

CONFIG_NETFILTER_NETLINK_ACCT=m

CONFIG_NETFILTER_NETLINK_QUEUE=m

CONFIG_NETFILTER_NETLINK_LOG=m

CONFIG_NETFILTER_NETLINK_QUEUE_CT=y

Any suggestions are welcome.

After changing the switch to a SwitchX-2 SX6036, the Ethernet link/port is UP. Resolved.

Hi Erez,

Thanks for the suggestion. I’m now following HowTo Change Port Type in Mellanox ConnectX-3 Adapter https://community.mellanox.com/s/article/howto-change-port-type-in-mellanox-connectx-3-adapter.

Actually, /sys/bus/pci/devices/0000:21:00.0/mlx4_port1 and mlx4_port2 are already set to eth.

Once port_type_array is set, I can’t change the port configuration, as shown below:

cat /etc/modprobe.d/mlx4_core.conf

options mlx4_core port_type_array=2,2 num_vfs=16 probe_vf=0 enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1

connectx_port_config

ConnectX PCI devices :

|----------------------------|

| 1 0000:21:00.0 |

|----------------------------|

Before port change:

eth

eth

Not allowed to change port configuration, quitting…
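
The `options` line in mlx4_core.conf above can be decoded with a few lines of Python, which helps when comparing hosts. This is only a sketch: the function names are hypothetical, and the 1/2/3 mapping is taken from the connectx_port_config menu in this thread (1: Infiniband, 2: Ethernet, 3: AutoSense), on the assumption that port_type_array uses the same numbering.

```python
# Numbering as shown by the connectx_port_config menu in this thread;
# assumed (not verified here) to match mlx4_core's port_type_array values.
PORT_TYPE_NAMES = {"1": "ib", "2": "eth", "3": "auto"}


def parse_modprobe_options(line):
    """Split 'options <module> key=value ...' into (module, {key: value})."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError("not a modprobe 'options' line")
    params = {}
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        params[key] = value
    return tokens[1], params


def port_types(params):
    """Translate port_type_array (e.g. '2,2') into readable names."""
    raw = params.get("port_type_array", "")
    return [PORT_TYPE_NAMES.get(v, "unknown") for v in raw.split(",") if v]


if __name__ == "__main__":
    line = ("options mlx4_core port_type_array=2,2 num_vfs=16 probe_vf=0 "
            "enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1")
    module, params = parse_modprobe_options(line)
    print(module, port_types(params), params["num_vfs"])
```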

When I try again after commenting out the current settings in mlx4_core.conf, the Ethernet ports are still DOWN:

cat /sys/bus/pci/devices/0000:21:00.0/mlx4_port1

eth

cat /sys/bus/pci/devices/0000:21:00.0/mlx4_port2

eth

connectx_port_config -s


Port configuration for PCI device: 0000:21:00.0 is:

eth

eth


connectx_port_config

ConnectX PCI devices :

|----------------------------|

| 1 0000:21:00.0 |

|----------------------------|

Before port change:

eth

eth

|----------------------------|

| Possible port modes: |

| 1: Infiniband |

| 2: Ethernet |

| 3: AutoSense |

|----------------------------|

Select mode for port 1 (1,2,3): 1

Select mode for port 2 (1,2,3): 1

WARNING: Illegal port configuration attempted,

Please view dmesg for details.

// … [ 4135.654328] Request for unknown module key ‘Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403’ err -11 …

connectx_port_config

ConnectX PCI devices :

|----------------------------|

| 1 0000:21:00.0 |

|----------------------------|

Before port change:

eth

eth

|----------------------------|

| Possible port modes: |

| 1: Infiniband |

| 2: Ethernet |

| 3: AutoSense |

|----------------------------|

Select mode for port 1 (1,2,3): 2

Select mode for port 2 (1,2,3): 2

After port change:

eth

eth

hca_self_test.ofed

---- Performing Adapter Device Self Test ----

Number of CAs Detected … 1

PCI Device Check … PASS

Kernel Arch … x86_64

Host Driver Version … MLNX_OFED_LINUX-2.4-1.0.0 (OFED-2.4-1.0.0): modules

Host Driver RPM Check … PASS

Firmware on CA #0 VPI … v2.33.5000

Firmware Check on CA #0 (VPI) … PASS

Host Driver Initialization … PASS

Number of CA Ports Active … 0

Port State of Port #1 on CA #0 (VPI)… DOWN (Ethernet)

Port State of Port #2 on CA #0 (VPI)… DOWN (Ethernet)

Error Counter Check on CA #0 (VPI)… NA (Eth ports)

Kernel Syslog Check … PASS

Node GUID on CA #0 (VPI) … 00:02:c9:03:00:fb:a4:50

------------------ DONE ---------------------

On CentOS 7, “mlnxofedinstall” did not upgrade the ConnectX-3 firmware automatically.

So I upgraded the firmware from 2.11 to 2.33 using the “firmware/mlxfwmanager_sriov_en_x86_64 --online -u -d 21:00.0” command.

Is there anything else I should check? Any tips are welcome.

The script simply tries to query the firmware version of the VFs you’ve created. I don’t think there’s anything wrong here. You’ll see above that the real HCA is identified with 2.33.5000.

It looks like the link type was previously InfiniBand with the ports in the Init state, and now the ports are Ethernet and the port state is down. You can change the ports to work as IB with the connectx_port_config command.

This “ip link set” problem was resolved by upgrading the ConnectX-3 firmware to 2.33.5000.

However, hca_self_test.ofed shows FAIL/DOWN results. Any suggestions for resolving this?

hca_self_test.ofed

---- Performing Adapter Device Self Test ----

Number of CAs Detected … 17

PCI Device Check … PASS

Kernel Arch … x86_64

Host Driver Version … MLNX_OFED_LINUX-2.4-1.0.0 (OFED-2.4-1.0.0): modules

Host Driver RPM Check … PASS

Firmware on CA #0 VPI … v2.33.5000

Firmware Check on CA #0 (VPI) … PASS

Firmware Check on CA #1 (VPI) … FAIL

REASON: CA #1: failed to get firmware version

Firmware Check on CA #2 (VPI) … FAIL

REASON: CA #2: failed to get firmware version

Firmware Check on CA #3 (VPI) … FAIL

REASON: CA #3: failed to get firmware version

Firmware Check on CA #4 (VPI) … FAIL

REASON: CA #4: failed to get firmware version

Firmware Check on CA #5 (VPI) … FAIL

REASON: CA #5: failed to get firmware version

Firmware Check on CA #6 (VPI) … FAIL

REASON: CA #6: failed to get firmware version

Firmware Check on CA #7 (VPI) … FAIL

REASON: CA #7: failed to get firmware version

Firmware Check on CA #8 (VPI) … FAIL

REASON: CA #8: failed to get firmware version

Firmware Check on CA #9 (VPI) … FAIL

REASON: CA #9: failed to get firmware version

Firmware Check on CA #10 (VPI) … FAIL

REASON: CA #10: failed to get firmware version

Firmware Check on CA #11 (VPI) … FAIL

REASON: CA #11: failed to get firmware version

Firmware Check on CA #12 (VPI) … FAIL

REASON: CA #12: failed to get firmware version

Firmware Check on CA #13 (VPI) … FAIL

REASON: CA #13: failed to get firmware version

Firmware Check on CA #14 (VPI) … FAIL

REASON: CA #14: failed to get firmware version

Firmware Check on CA #15 (VPI) … FAIL

REASON: CA #15: failed to get firmware version

Firmware Check on CA #16 (VPI) … FAIL

REASON: CA #16: failed to get firmware version

Host Driver Initialization … PASS

Number of CA Ports Active … 0

Port State of Port #1 on CA #0 (VPI)… DOWN (Ethernet)

Port State of Port #2 on CA #0 (VPI)… DOWN (Ethernet)

Error Counter Check on CA #0 (VPI)… NA (Eth ports)

Kernel Syslog Check … PASS

Node GUID on CA #0 (VPI) … 00:02:c9:03:00:fb:a4:50

Node GUID on CA #1 (VPI) … NA

Node GUID on CA #2 (VPI) … NA

Node GUID on CA #3 (VPI) … NA

Node GUID on CA #4 (VPI) … NA

Node GUID on CA #5 (VPI) … NA

Node GUID on CA #6 (VPI) … NA

Node GUID on CA #7 (VPI) … NA

Node GUID on CA #8 (VPI) … NA

Node GUID on CA #9 (VPI) … NA

Node GUID on CA #10 (VPI) … NA

Node GUID on CA #11 (VPI) … NA

Node GUID on CA #12 (VPI) … NA

Node GUID on CA #13 (VPI) … NA

Node GUID on CA #14 (VPI) … NA

Node GUID on CA #15 (VPI) … NA

Node GUID on CA #16 (VPI) … NA

------------------ DONE ---------------------
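
With 16 VFs the self-test report gets long, so here is a rough Python sketch that separates failed checks from port-state lines. It assumes the "name … value" layout seen in the output above (the separator may be an ellipsis character or a run of dots, depending on how the text was copied); the function names are illustrative and not part of OFED.

```python
import re

# Separator between check name and result: "…" or a run of two or more dots.
SEP = re.compile(r"\s*(?:…|\.{2,})\s*")


def parse_self_test(text):
    """Return {check_name: result} for every 'name … value' line."""
    results = {}
    for line in text.splitlines():
        parts = SEP.split(line.strip(), maxsplit=1)
        if len(parts) == 2 and parts[0] and parts[1]:
            results[parts[0]] = parts[1]
    return results


def failed_checks(results):
    """Names of checks whose result starts with FAIL."""
    return [name for name, value in results.items() if value.startswith("FAIL")]


def down_ports(results):
    """Names of 'Port State' lines reporting DOWN."""
    return [name for name, value in results.items()
            if name.startswith("Port State") and value.startswith("DOWN")]


if __name__ == "__main__":
    sample = (
        "Firmware Check on CA #0 (VPI) … PASS\n"
        "Firmware Check on CA #1 (VPI) … FAIL\n"
        "REASON: CA #1: failed to get firmware version\n"
        "Port State of Port #1 on CA #0 (VPI)… DOWN (Ethernet)\n"
    )
    parsed = parse_self_test(sample)
    print(failed_checks(parsed))
    print(down_ports(parsed))
```

Keeping the firmware-check failures apart from the port-state lines makes it easier to treat them as two separate questions.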

Thanks, Erez! OK. If the failed VF firmware checks are fine, what about “Port State of Port #x on CA #0 (VPI)”? Before setting up SR-IOV, it was not “DOWN”.

Hi,

First, the physical port does not have to be UP in order to check SR-IOV functionality. That said, what is this host connected to? Which switch?

Hi, those ConnectX-3 cards are connected to a Voltaire 4036 switch.