ConnectX5 ASAP2 OVS VXLAN offload + bond not working properly

Hello there,

Maybe someone can help me.

Last time I was here with the same problem, I was told to try the latest Open vSwitch build, so I did:

ovs-vsctl (Open vSwitch) 2.11.90

It improved things a little, but it is still not perfect.

Setup:

Dual port ConnectX5 (MT27800) latest firmware.

Ubuntu 18.04 Linux kernel: 4.18.0-25-generic

(No OFED drivers, because the OFED driver fails to enable the eswitch with this error: (0000:3b:00.0): E-Switch: Failed setting eswitch to offloads)

LACP is active on both ports (port0 and port1) of the ConnectX5. systemd sets up LACP and the SRIOV subinterfaces before networking starts. Hardware offload is enabled in OVS.

SRIOV subinterfaces are created on port0 at boot time; port1 has no SRIOV subinterfaces.

When the VM’s traffic (using ASAP2) goes through port0 (the left port on the card), everything works fine and everything gets offloaded:

listening on eth3, link-type EN10MB (Ethernet), capture size 262144 bytes

15:32:36.042217 IP 10.100.140.15 > 10.100.140.3: ICMP echo request, id 1357, seq 1, length 64

15:32:36.042451 IP 10.100.140.3 > 10.100.140.15: ICMP echo reply, id 1357, seq 1, length 64

15:32:41.166466 ARP, Request who-has 10.100.140.3 tell 10.100.140.15, length 46

15:32:41.166588 ARP, Reply 10.100.140.3 is-at fa:16:3e:04:b7:60, length 46

As we can see, only the first ICMP request and reply hit the virtual port, which matches what the docs describe.

When I bring port0 down and the flow moves to port1 (the right port on the card), this happens:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth3, link-type EN10MB (Ethernet), capture size 262144 bytes

15:38:47.401961 IP 10.100.140.15 > 10.100.140.3: ICMP echo request, id 1362, seq 40, length 64

15:38:48.425970 IP 10.100.140.15 > 10.100.140.3: ICMP echo request, id 1362, seq 41, length 64

15:38:49.449992 IP 10.100.140.15 > 10.100.140.3: ICMP echo request, id 1362, seq 42, length 64

In this case only one direction is offloaded, which is not ideal.

When I re-enable port0, the traffic keeps flowing through port1 and is still only half offloaded. If I then disable port1, the traffic finally moves back to port0 and is fully offloaded again.
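To make the "one direction only" observation easier to repeat, the tcpdump text can be summarised by direction. This is a minimal sketch, not something from the thread: the helper name `count_icmp_directions` is made up for illustration, and it only counts lines in tcpdump's default text output. Requests that keep arriving on the virtual port with no matching replies are consistent with the request direction still taking the software path.

```shell
#!/bin/sh
# count_icmp_directions (hypothetical helper): reads tcpdump text output on
# stdin and prints how many ICMP echo requests and replies were captured.
count_icmp_directions() {
    awk '
        /ICMP echo request/ { req++ }
        /ICMP echo reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Example using the three lines captured after failing over to port1.
count_icmp_directions <<'EOF'
15:38:47.401961 IP 10.100.140.15 > 10.100.140.3: ICMP echo request, id 1362, seq 40, length 64
15:38:48.425970 IP 10.100.140.15 > 10.100.140.3: ICMP echo request, id 1362, seq 41, length 64
15:38:49.449992 IP 10.100.140.15 > 10.100.140.3: ICMP echo request, id 1362, seq 42, length 64
EOF
```

On a live system the same function could be fed directly, for example `tcpdump -l -n -i eth3 -c 20 icmp | count_icmp_directions`, where eth3 is the interface from the capture above.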

Is this a bug? A feature? What am I doing wrong? :(

Any help would be appreciated.

Thank you very much!

Zoltan

Hi Zoltan,

To use the complete ASAP2 solution, you must install the Mellanox OFED driver (v4.4 or later), as well as the iproute2 and openvswitch packages.

Bonding (SR-IOV VF LAG) is supported with the following combination:

OVS: v2.11.90 (and above)

Driver: MLNX_OFED 4.6

Ubuntu: 18.04

Kernel: 4.15

We need to understand why the MLNX_OFED driver fails to enable the eswitch.

Please go over the following articles to make sure all steps were taken in order to properly configure SR-IOV & ASAP2.

SR-IOV: https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm–ethernet-x

ASAP2: http://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.6.pdf

Please make sure you change the e-switch mode from legacy to switchdev on the PF device. Example:

echo switchdev > /sys/class/net/enp4s0f0/compat/devlink/mode
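For context, the mode change is usually one step in a longer sequence. The sketch below shows one commonly described order (unbind VFs, switch mode, enable offload, rebind VFs) and is not taken from this thread. The PCI addresses are hypothetical placeholders, the `run` and `switchdev_steps` helpers are invented for illustration, and the script uses the `devlink` form of the mode change rather than the compat sysfs path above. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. PF comes from the example above; the PCI addresses are
# hypothetical placeholders and must be replaced with the real ones.
PF=enp4s0f0
PF_PCI=0000:04:00.0
VF_PCIS="0000:04:00.2 0000:04:00.3"

# run: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

switchdev_steps() {
    # 1. Unbind the VFs from the host driver before changing the mode.
    for vf in $VF_PCIS; do
        run sh -c "echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind"
    done
    # 2. Move the PF e-switch from legacy to switchdev, then read it back.
    run devlink dev eswitch set "pci/$PF_PCI" mode switchdev
    run devlink dev eswitch show "pci/$PF_PCI"
    # 3. Enable TC offload on the PF and hardware offload in OVS
    #    (OVS typically needs a restart to pick this setting up).
    run ethtool -K "$PF" hw-tc-offload on
    run ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
    # 4. Bind the VFs again.
    for vf in $VF_PCIS; do
        run sh -c "echo $vf > /sys/bus/pci/drivers/mlx5_core/bind"
    done
}

switchdev_steps   # dry run by default: only prints the commands
```

Keeping the default as a dry run makes it possible to review the exact commands and addresses before anything touches the NIC.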

Best Regards,

Chen

Hi Chen,

Thanks for the reply. I’m Zoltan’s colleague. We tested with both the OFED driver and the default kernel driver. With OFED we use version 4.6, but when we try to switch the NIC to “switchdev” mode, it fails with the following error:

root@compute-10:/home/ubuntu# devlink dev eswitch set pci/0000:3b:00.0 mode switchdev

devlink answers: Invalid argument

[ 155.794991] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 156.184276] (0000:3b:00.0): E-Switch: Failed setting eswitch to offloads

[ 156.184278] (0000:3b:00.0): E-Switch: E-Switch enable SRIOV: nvfs(4) mode (1)

iproute2 version:

ii iproute2 4.15.0-2ubuntu1 amd64 networking and traffic control tools

openvswitch:

ruslanloman/openvswitch v2.11.90 f858879fc864 2 months ago 616MB

Thanks!

I only noticed your suggestion after expanding the post. We’ll double-check our configuration and get back to you.

Thanks!

This command: echo switchdev > /sys/class/net/enp59s0f0/compat/devlink/mode

Fails with this:

[ 1731.458706] (0000:3b:00.0): E-Switch: disable SRIOV: active vports(5) mode(1)

[ 1731.486864] (0000:3b:00.0): E-Switch: E-Switch destroying group TSAR but group not empty (group:0)

[ 1731.492961] (0000:3b:00.0): E-Switch: E-Switch enable SRIOV: nvfs(4) mode (2)

[ 1731.769567] bond0: Releasing backup interface enp59s0f0

[ 1732.767455] mlx5_core 0000:3b:00.0: mlx5_cmd_check:775:(pid 759): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x31ed04)

[ 1732.774734] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 1732.856055] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 1732.894535] mlx5_core 0000:3b:00.0 enp59s0f0: renamed from eth0

[ 1732.918184] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 1733.081470] mlx5_core 0000:3b:00.0 enp59s0f0: Failed to init debugfs files for enp59s0f0

[ 1733.082916] mlx5_core 0000:3b:00.0 enp59s0f0: Link up

[ 1733.089188] bond0: Enslaving enp59s0f0 as a backup interface with an up link

[ 1733.097292] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 1733.157105] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 1733.252980] mlx5_core 0000:3b:00.0: mlx5_cmd_check:775:(pid 759): ALLOC_Q_COUNTER(0x771) op_mod(0x0) failed, status limits exceeded(0x8), syndrome (0x587239)

[ 1733.258902] infiniband (null): mlx5_ib_alloc_counters:5525:(pid 759): couldn't allocate queue counter for port 128, err -12

[ 1733.893295] bond0: Releasing backup interface enp59s0f0

[ 1734.140687] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 1734.589849] (0000:3b:00.0): E-Switch: Failed setting eswitch to offloads

[ 1734.589853] (0000:3b:00.0): E-Switch: E-Switch enable SRIOV: nvfs(4) mode (1)

[ 1734.597871] mlx5_core 0000:3b:00.0 enp59s0f0: renamed from eth0

[ 1734.629834] (0000:3b:00.0): E-Switch: SRIOV enabled: active vports(5)

[ 1735.525786] mlx5_core 0000:3b:00.0 enp59s0f0: Failed to init debugfs files for enp59s0f0

[ 1735.529158] mlx5_core 0000:3b:00.0 enp59s0f0: Link up

[ 1735.536264] 8021q: adding VLAN 0 to HW filter on device enp59s0f0

[ 1735.554672] bond0: Enslaving enp59s0f0 as a backup interface with an up link

Is it a bug? Or what am I missing?
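The firmware command failures are easy to miss among the repeated MLX5E lines in a log like the one above. The sketch below is an illustration only: the function name `failed_fw_commands` is hypothetical, and the pattern assumes the `mlx5_cmd_check` message layout exactly as it appears in this log. It pulls out the command name, status and syndrome of each failed command.

```shell
#!/bin/sh
# failed_fw_commands (hypothetical helper): reads kernel log text on stdin and
# prints "COMMAND | status | syndrome" for every mlx5_cmd_check failure line.
failed_fw_commands() {
    sed -n 's/.*mlx5_cmd_check:[0-9]*:(pid [0-9]*): \([A-Z_0-9]*\)(0x[0-9a-f]*) .*failed, status \(.*\)(0x[0-9a-f]*), syndrome (\(0x[0-9a-f]*\)).*/\1 | \2 | \3/p'
}

# Example using the two failure lines from the log above.
failed_fw_commands <<'EOF'
[ 1732.767455] mlx5_core 0000:3b:00.0: mlx5_cmd_check:775:(pid 759): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x31ed04)
[ 1732.774734] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 1733.252980] mlx5_core 0000:3b:00.0: mlx5_cmd_check:775:(pid 759): ALLOC_Q_COUNTER(0x771) op_mod(0x0) failed, status limits exceeded(0x8), syndrome (0x587239)
EOF
```

On the host itself this would be used as `dmesg | failed_fw_commands`. The command name and syndrome pair is the part that is most useful to quote when reporting the problem.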

We are experiencing a strange problem related to VF LAG. The outbound traffic of the VFs goes through a single PF, while the other PF in the same bond is not used.

The system and OFED version we are using are as follows:

System: CentOS Linux release 7.4.1708 (Core) with kernel 4.18.0-80

OFED: 4.6-1.0.1

We have spent a lot of time debugging, but the problem is not resolved.

According to https://community.mellanox.com/s/article/Debugging-VF-LAG-issues-with-ASAP2, the tx queues of a VF are evenly distributed among PFs.

We tested following the instructions on that page and found that the outbound traffic is distributed among the different tx queues of a VF; however, the actual outbound traffic is restricted to a single PF.

We have disabled XPS for all network interfaces.

TX bytes of different queues of the tested VF are:

TX bytes of the two PFs are as below (eth0 is in the left part and eth1 is in the right part)

Is there any way to debug the mapping between tx queues (of a VF) and PFs?
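One way to at least correlate queue activity with PF activity is to compare counters from `ethtool -S` taken before and after a test run. The sketch below does not reveal the queue-to-PF mapping itself. It assumes the mlx5 counter names `tx<N>_bytes` (per queue) and `tx_bytes_phy` (physical port), which may differ between driver versions, and the helper names are made up for illustration.

```shell
#!/bin/sh
# per_queue_tx_bytes (hypothetical helper): reads `ethtool -S` output on stdin
# and prints one "txN_bytes VALUE" line per tx queue.
per_queue_tx_bytes() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }          # strip leading indentation
        $1 ~ /^tx[0-9]+_bytes$/ { print $1, $2 }
    '
}

# phy_tx_bytes (hypothetical helper): prints the physical-port tx byte counter.
phy_tx_bytes() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }
        $1 == "tx_bytes_phy" { print $2 }
    '
}

# Example with made-up counter values in the ethtool -S layout.
per_queue_tx_bytes <<'EOF'
NIC statistics:
     tx0_packets: 10
     tx0_bytes: 1000
     tx1_packets: 20
     tx1_bytes: 2000
     tx_bytes_phy: 3000
EOF
```

In practice the input would come from the devices themselves, for example `ethtool -S <vf-netdev> | per_queue_tx_bytes` for the VF and `ethtool -S eth0 | phy_tx_bytes` and `ethtool -S eth1 | phy_tx_bytes` for the two PFs, sampled twice while traffic is running so the deltas can be compared.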

Thank you for your response