Hi,
I am trying to offload LAG for VFs (VF LAG). According to an official document on the NVIDIA website, Mellanox supports offloading three bonding modes: active-backup, balance-xor, and 802.3ad. The document is here:
https://docs.nvidia.com/networking/pages/releaseview.action?pageId=25133702
I tried to reproduce this scenario on my setup using ConnectX-5 cards. I configured the bond in balance-xor mode to check whether, once the traffic is offloaded, a hash is computed to select the egress interface of the bond. Instead, no hash appeared to be computed: the same interface was selected for every packet.
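For reference, this is how the software bonding driver is expected to pick a slave in balance-xor mode. The sketch assumes the default layer2 xmit_hash_policy and the formula given in the Linux kernel bonding documentation (hash = source MAC[5] XOR destination MAC[5] XOR packet type ID, slave = hash modulo slave count). The function name layer2_slave and the peer MAC in the example are hypothetical.

```bash
# Hypothetical helper, not part of any tool: predicts the slave index that the
# Linux bonding driver picks in balance-xor mode with xmit_hash_policy=layer2,
# assuming the formula from the kernel bonding documentation:
#   hash  = src_mac[5] XOR dst_mac[5] XOR ethertype
#   slave = hash % slave_count
layer2_slave() {
    local src_mac=$1 dst_mac=$2 ethertype=$3 slave_count=$4
    # Only the last byte of each MAC address enters the hash
    local src_last=$((16#${src_mac##*:}))
    local dst_last=$((16#${dst_mac##*:}))
    local hash=$(( src_last ^ dst_last ^ ethertype ))
    echo $(( hash % slave_count ))
}

# Example: IPv4 (EtherType 0x0800) from the VF MAC used in the script to a
# hypothetical peer MAC, on a bond with 2 slaves
layer2_slave e4:1d:2d:fd:8b:01 00:00:00:00:00:10 0x0800 2
```

Under this policy, all packets sharing the same source MAC, destination MAC and EtherType hash to the same slave.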
I used TC rules and wrote the following script to create the setup:
```bash
# Cleanup of a previous run (errors are expected if nothing is configured yet)
tc qdisc del dev bond0 ingress_block 22 ingress
tc qdisc del dev enp129s0f0 ingress_block 22 ingress
tc qdisc del dev enp129s0f1 ingress_block 22 ingress
echo 0000:81:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:81:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:81:01.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:81:01.3 > /sys/bus/pci/drivers/mlx5_core/unbind
ip link del bond0
ip link set dev enp129s0f0 nomaster
ip link set dev enp129s0f1 nomaster

# Config
# Remove the VFs, then rebuild and reload the locally built mlx5_core driver
echo 0 > /sys/class/net/enp129s0f0/device/sriov_numvfs
echo 0 > /sys/class/net/enp129s0f1/device/sriov_numvfs
sleep 5
make -j99
rmmod mlx5_ib
rmmod mlx5_core
insmod ./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
sleep 5

# Create two VFs per PF and assign their MAC addresses
echo 2 > /sys/class/net/enp129s0f0/device/sriov_numvfs
echo 2 > /sys/class/net/enp129s0f1/device/sriov_numvfs
ip link set enp129s0f0 vf 0 mac e4:1d:2d:fd:8b:01
ip link set enp129s0f0 vf 1 mac e4:1d:2d:fd:8b:02
ip link set enp129s0f1 vf 0 mac e4:1d:2d:fd:8b:03
ip link set enp129s0f1 vf 1 mac e4:1d:2d:fd:8b:04

# Unbind the VFs before moving both eswitches to switchdev mode
echo 0000:81:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:81:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:81:01.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:81:01.3 > /sys/bus/pci/drivers/mlx5_core/unbind
sleep 2
echo switchdev > /sys/class/net/enp129s0f0/compat/devlink/mode
echo switchdev > /sys/class/net/enp129s0f1/compat/devlink/mode
sleep 5

# Enable TC hardware offload on the uplinks and the VF representors
ethtool -K enp129s0f0 hw-tc-offload on
ethtool -K enp129s0f0_0 hw-tc-offload on
ethtool -K enp129s0f0_1 hw-tc-offload on
ethtool -K enp129s0f1 hw-tc-offload on
ethtool -K enp129s0f1_0 hw-tc-offload on
ethtool -K enp129s0f1_1 hw-tc-offload on
sleep 2

# Create the balance-xor bond over both uplinks
# (no xmit_hash_policy is given, so the default, layer2, applies)
ip link add name bond0 type bond mode balance-xor miimon 100
ip link set dev enp129s0f0 down
ip link set dev enp129s0f1 down
ip link set dev enp129s0f0 master bond0
ip link set dev enp129s0f1 master bond0
ip link set dev bond0 up
ip link set dev enp129s0f0 up
ip link set dev enp129s0f1 up
sleep 2

# Share ingress block 22 between bond0 and both uplinks,
# and add an ingress qdisc on the representor of VF 0 of enp129s0f0
tc qdisc add dev bond0 ingress_block 22 ingress
tc qdisc add dev enp129s0f0 ingress_block 22 ingress
tc qdisc add dev enp129s0f1 ingress_block 22 ingress
tc qdisc add dev enp129s0f0_0 ingress
sleep 2

# Rebind the VFs to mlx5_core and bring all interfaces up
echo 0000:81:00.2 > /sys/bus/pci/drivers/mlx5_core/bind
echo 0000:81:00.3 > /sys/bus/pci/drivers/mlx5_core/bind
echo 0000:81:01.2 > /sys/bus/pci/drivers/mlx5_core/bind
echo 0000:81:01.3 > /sys/bus/pci/drivers/mlx5_core/bind
sleep 5
ifconfig enp129s0f0 up
ifconfig enp129s0f1 up
ifconfig enp129s0f0_0 up
ifconfig enp129s0f0_1 up
ifconfig enp129s0f1_0 up
ifconfig enp129s0f1_1 up
ifconfig enp129s0f0v0 up
ifconfig enp129s0f0v1 up
ifconfig enp129s0f1v0 up
ifconfig enp129s0f1v1 up
sleep 2

# Traffic received on the bond for VF 0 of enp129s0f0 (IP, ARP, broadcast ARP)
# is redirected to its representor; everything from the representor goes out through bond0
tc filter add block 22 protocol ip parent ffff: prio 1 flower dst_mac e4:1d:2d:fd:8b:01 action mirred egress redirect dev enp129s0f0_0
tc filter add block 22 protocol arp parent ffff: prio 2 flower dst_mac e4:1d:2d:fd:8b:01 action mirred egress redirect dev enp129s0f0_0
tc filter add block 22 protocol arp parent ffff: prio 3 flower dst_mac ff:ff:ff:ff:ff:ff action mirred egress redirect dev enp129s0f0_0
tc filter add dev enp129s0f0_0 protocol all parent ffff: prio 2 flower action mirred egress redirect dev bond0
```
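To confirm which mode and transmit hash policy the bond actually ended up with, the bonding driver's sysfs attributes can be read back. This is a minimal sketch assuming the standard /sys/class/net/BOND/bonding/ attributes (mode, xmit_hash_policy, slaves); the function name show_bond_params is hypothetical.

```bash
# Hypothetical helper: print the bonding mode, transmit hash policy and
# slave list of a bond, as exposed by the bonding driver in sysfs.
show_bond_params() {
    local bond=${1:-bond0}
    local dir=/sys/class/net/$bond/bonding
    if [ ! -d "$dir" ]; then
        echo "$bond is not a bonding device" >&2
        return 1
    fi
    echo "mode:             $(cat "$dir/mode")"
    echo "xmit_hash_policy: $(cat "$dir/xmit_hash_policy")"
    echo "slaves:           $(cat "$dir/slaves")"
}
```

Usage after running the script: show_bond_params bond0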
All of the rules were offloaded, as I verified with "tc filter show dev <interface> ingress". Even the last rule, which redirects packets from the VF representor to bond0, was offloaded, although bond0 is not a Mellanox interface; I assume it was translated into another rule that was eligible for offload. However, I did not see the egress interface being selected by hash, as it should be in XOR mode: one specific interface was always selected, whatever the packet. That interface kept transmitting until I unplugged its cable. I even set the interface down, but packets were still transmitted through it. I checked this with "ethtool -S <interface>" by monitoring the physical counters of both interfaces. I should also mention that I was sending packets from the VF representor so that the offloaded rules would be hit.
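A minimal sketch of the per-port check described above: it samples a transmit counter from "ethtool -S" on each bond slave, waits, and prints how many packets each port sent in the interval. The counter name tx_packets_phy is an assumption based on the mlx5 physical-port statistics; adjust it to whatever "ethtool -S" reports on your system. The function names extract_counter and tx_delta are hypothetical.

```bash
# Counter to sample; tx_packets_phy is assumed to be the mlx5 physical-port
# TX packet counter, override COUNTER if "ethtool -S" names it differently.
COUNTER=${COUNTER:-tx_packets_phy}

# Hypothetical helper: read "ethtool -S" output on stdin and print the value
# of $COUNTER (0 if the counter is not present).
extract_counter() {
    awk -v name="$COUNTER" '$1 == (name ":") { v = $2 } END { print v + 0 }'
}

# Hypothetical helper: for each interface given, print how much $COUNTER
# grew during the interval (in seconds) passed as the first argument.
tx_delta() {
    local interval=$1; shift
    local -A before
    local dev
    for dev in "$@"; do
        before[$dev]=$(ethtool -S "$dev" | extract_counter)
    done
    sleep "$interval"
    for dev in "$@"; do
        echo "$dev: $(( $(ethtool -S "$dev" | extract_counter) - ${before[$dev]} ))"
    done
}
```

Usage while traffic is sent from the VF representor: tx_delta 5 enp129s0f0 enp129s0f1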
I also sent packets directly from the bond. In that case the Linux bonding driver selects the egress interface and none of the offloaded rules are hit, and the interface was selected according to the hash policy, as expected.
I used the following command to check whether the offloaded rule on the VF representor is being hit:
```bash
watch -d -n 1 -p "tc -s filter show dev enp129s0f0_0 ingress | grep 'Sent hardware'"
```
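The grep above shows the raw statistics lines. If a single number is easier to follow, the hardware packet counts can be summed with awk. This sketch assumes the "Sent hardware <bytes> bytes <packets> pkt" line format printed by recent iproute2 versions; the function name hw_pkts is hypothetical.

```bash
# Hypothetical helper: sum the packet counts of every "Sent hardware" line
# in "tc -s filter show" output (format: Sent hardware <N> bytes <M> pkt).
hw_pkts() {
    awk '$1 == "Sent" && $2 == "hardware" { pkts += $5 } END { print pkts + 0 }'
}
```

Usage: tc -s filter show dev enp129s0f0_0 ingress | hw_pkts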
Can you tell me whether I am doing something wrong, or why the hash is not being computed with VF LAG offload?