VF LAG offload

Hi,

I am trying to offload LAG for VFs. According to an official document on the Nvidia website, Mellanox supports three bonding modes that can be offloaded: Active-Backup, Balance-Xor, and 802.3ad. The link to the document is as follows:
https://docs.nvidia.com/networking/pages/releaseview.action?pageId=25133702

I tried to reproduce this scenario on my setup using ConnectX-5 cards. I configured the bond in balance-xor mode and wanted to observe whether a hash is calculated to select the bond member interface when the traffic is offloaded. What I observed instead is that no hash appeared to be calculated: one specific interface was selected for every packet.

I used TC rules and wrote the following script to create the setup:

# Cleanup
tc qdisc del dev bond0 ingress_block 22 ingress
tc qdisc del dev enp129s0f0 ingress_block 22 ingress
tc qdisc del dev enp129s0f1 ingress_block 22 ingress

echo 0000:81:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind 
echo 0000:81:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind

echo 0000:81:01.2 > /sys/bus/pci/drivers/mlx5_core/unbind 
echo 0000:81:01.3 > /sys/bus/pci/drivers/mlx5_core/unbind 

ip link del bond0
ip link set dev enp129s0f0 nomaster
ip link set dev enp129s0f1 nomaster

# Config
echo 0 > /sys/class/net/enp129s0f0/device/sriov_numvfs
echo 0 > /sys/class/net/enp129s0f1/device/sriov_numvfs
sleep 5

make -j99
rmmod mlx5_ib
rmmod mlx5_core
insmod ./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko


sleep 5 
echo 2 > /sys/class/net/enp129s0f0/device/sriov_numvfs
echo 2 > /sys/class/net/enp129s0f1/device/sriov_numvfs


ip link set enp129s0f0 vf 0 mac e4:1d:2d:fd:8b:01
ip link set enp129s0f0 vf 1 mac e4:1d:2d:fd:8b:02
ip link set enp129s0f1 vf 0 mac e4:1d:2d:fd:8b:03
ip link set enp129s0f1 vf 1 mac e4:1d:2d:fd:8b:04


echo 0000:81:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind 
echo 0000:81:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind

echo 0000:81:01.2 > /sys/bus/pci/drivers/mlx5_core/unbind 
echo 0000:81:01.3 > /sys/bus/pci/drivers/mlx5_core/unbind 
sleep 2

echo switchdev > /sys/class/net/enp129s0f0/compat/devlink/mode 
echo switchdev > /sys/class/net/enp129s0f1/compat/devlink/mode
sleep 5

ethtool -K enp129s0f0 hw-tc-offload on
ethtool -K enp129s0f0_0 hw-tc-offload on
ethtool -K enp129s0f0_1 hw-tc-offload on
ethtool -K enp129s0f1 hw-tc-offload on
ethtool -K enp129s0f1_0 hw-tc-offload on
ethtool -K enp129s0f1_1 hw-tc-offload on

sleep 2
ip link add name bond0 type bond mode balance-xor miimon 100

ip link set dev enp129s0f0 down
ip link set dev enp129s0f1 down

ip link set dev enp129s0f0 master bond0
ip link set dev enp129s0f1 master bond0

ip link set dev bond0 up
ip link set dev enp129s0f0 up
ip link set dev enp129s0f1 up

sleep 2
tc qdisc add dev bond0 ingress_block 22 ingress
tc qdisc add dev enp129s0f0 ingress_block 22 ingress
tc qdisc add dev enp129s0f1 ingress_block 22 ingress
tc qdisc add dev enp129s0f0_0 ingress


sleep 2
echo 0000:81:00.2 > /sys/bus/pci/drivers/mlx5_core/bind 
echo 0000:81:00.3 > /sys/bus/pci/drivers/mlx5_core/bind

echo 0000:81:01.2 > /sys/bus/pci/drivers/mlx5_core/bind 
echo 0000:81:01.3 > /sys/bus/pci/drivers/mlx5_core/bind


sleep 5

ifconfig enp129s0f0 up
ifconfig enp129s0f1 up
ifconfig enp129s0f0_0 up
ifconfig enp129s0f0_1 up
ifconfig enp129s0f1_0 up
ifconfig enp129s0f1_1 up
ifconfig enp129s0f0v0 up
ifconfig enp129s0f0v1 up
ifconfig enp129s0f1v0 up
ifconfig enp129s0f1v1 up


sleep 2
tc filter add block 22 protocol ip parent ffff: prio 1 flower dst_mac e4:1d:2d:fd:8b:01 action mirred egress redirect dev enp129s0f0_0
tc filter add block 22 protocol arp parent ffff: prio 2 flower dst_mac e4:1d:2d:fd:8b:01 action mirred egress redirect dev enp129s0f0_0
tc filter add block 22 protocol arp parent ffff: prio 3 flower dst_mac ff:ff:ff:ff:ff:ff action mirred egress redirect dev enp129s0f0_0
tc filter add dev enp129s0f0_0 protocol all parent ffff: prio 2 flower action mirred egress redirect dev bond0

All of the rules were offloaded, which I checked using the "tc filter show dev ingress" command. Even the last rule, which redirects packets from the VF representor to bond0, was offloaded, even though bond0 is not a Mellanox interface. So I guess it was translated into another rule that was eligible for offload. Regardless of the rule, the egress interface was not selected based on a hash as it should be in XOR mode: one specific interface was always selected, no matter the packet. Packets kept going out of that interface until the cable was removed from it. I even brought the interface down, but packets were still transmitted through that same interface. I checked this using "ethtool -S " and monitored the physical counters of both interfaces. I should also mention that in this case I was sending packets from the VF representor so that the offloaded rules would be hit.

I also sent packets directly from the bond. In that case none of the offloaded rules were hit, the Linux bonding driver selected the interface, and the interface was chosen according to the hash policy.

I used the following command to check whether the offloaded rule on the VF representor is being hit:
watch -d -n 1 -p "tc -s filter show dev enp129s0f0_0 ingress | grep 'Sent hardware'"
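
To compare how traffic is spread across the two uplinks, the physical counters mentioned above can be extracted with a small filter. This is only a sketch: the helper name phy_tx_packets is made up for illustration, and it assumes the mlx5 driver exposes a counter named tx_packets_phy in its "ethtool -S" output.

```shell
# Hypothetical helper: read `ethtool -S` output on stdin and print the value
# of the tx_packets_phy counter (assumed mlx5 physical-port counter name).
phy_tx_packets() {
    awk -F: '/^[ \t]*tx_packets_phy:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Example usage on the setup above (needs the real interfaces):
#   ethtool -S enp129s0f0 | phy_tx_packets
#   ethtool -S enp129s0f1 | phy_tx_packets
```

Sampling both ports before and after a test run and comparing the deltas shows whether only one port is carrying the traffic.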

Can you tell me whether I am doing something wrong, or why the hash is not being calculated in the case of VF LAG offload?

By default, bond mode 2 hashes on MAC addresses. If you test from a single client, it is expected that the same port is always selected.

Bond Mode 2 – Balance XOR
In balance XOR mode, the bond evaluates the source and destination MAC addresses to determine which interface to send each packet out of. This method picks the same interface for a given pair of MAC addresses, and as a result it provides both load balancing and fault tolerance.
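
As a rough illustration of why a single client always lands on the same port, here is a minimal sketch of the default layer2 policy as the Linux bonding documentation describes it: the last byte of the source MAC XOR the last byte of the destination MAC XOR the packet type ID, modulo the number of bond members. The function name layer2_slave is made up for this example, and it models the software bonding driver only, not necessarily what the NIC does once the rule is offloaded.

```shell
# Hypothetical model of the bonding "layer2" transmit hash.
# Usage: layer2_slave SRC_MAC DST_MAC ETHERTYPE SLAVE_COUNT
# Runs in a subshell so its variables do not leak into the caller.
layer2_slave() (
    src_last=$(( 0x${1##*:} ))   # last byte of the source MAC
    dst_last=$(( 0x${2##*:} ))   # last byte of the destination MAC
    ptype=$(( $3 ))              # packet type ID, e.g. 0x0800 for IPv4
    echo $(( (src_last ^ dst_last ^ ptype) % $4 ))
)

# Example: VF MAC from the script above and a made-up peer MAC.
# Every IPv4 packet between this pair maps to the same member index.
layer2_slave e4:1d:2d:fd:8b:01 e4:1d:2d:fd:8b:05 0x0800 2
```

Since the inputs are identical for every packet of a given MAC pair and protocol, the result never changes, which matches the "one port per client" behaviour described above.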

Another possible issue is the switch.

If both ports are connected to the same switch, you need to configure static link aggregation on the switch.

A switch is a Layer 2 device. It records the mapping between MAC addresses and ports, and one MAC address can only be mapped to one port at a time. In mode 2, all NICs under bond0 share the same MAC address.

If your switch supports LACP, consider using mode 4.

Or try xmit_hash_policy=layer3+4
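
A minimal sketch of how that option could be applied to the bond0 from the script above. The run and set_xmit_hash_policy helpers are made up for illustration and only print the command unless DRY_RUN=0 is set. Whether the offloaded datapath honours this policy depends on the driver and firmware and is not confirmed in this thread.

```shell
# Set DRY_RUN=0 to actually execute; by default commands are only printed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper. Usage: set_xmit_hash_policy BOND POLICY
set_xmit_hash_policy() {
    run ip link set dev "$1" type bond xmit_hash_policy "$2"
}

set_xmit_hash_policy bond0 layer3+4

# The active policy can then be read back with:
#   cat /sys/class/net/bond0/bonding/xmit_hash_policy
```

With layer3+4 the bonding driver also hashes on IP addresses and L4 ports, so several flows from one client can be spread across members.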
