My next step was to make QoS work with SR-IOV devices attached (through PCI passthrough) to KVM guests. I assumed that configuring the physical function would be enough, so I started my guest (virtual machine) and tested the egress rate of a VLAN device (VLAN 2) that I created from a virtual function (SR-IOV device) inside the guest. Unfortunately, the egress rate did not correspond to what the host (physical function) was configured to achieve.
Observation:
Whichever VLAN index I use inside the guest, the bandwidth I measure does not correspond to that VLAN index in the host configuration; it always matches VLAN index 0.
Example:
On the host I created, configured, and verified VLAN 2 to be limited to 1 Gbit/s and VLAN 0 to 5 Gbit/s.
On the guest I created VLAN 2, but when measuring the bandwidth I got 5 Gbit/s instead of 1 Gbit/s.
Could anyone point me in the right direction on how to configure the SR-IOV devices for QoS properly, or is this even supported on my Mellanox adapter?
I would like to have multiple VLANs inside the VM, each with its own QoS. Ideally I would configure everything inside the guest without touching the host at all, but I am also okay with setting VLAN-to-QoS mappings on the host, sending VLAN-tagged traffic from inside the VM, and having the traffic throttled according to the host settings.
We are not using RoCE.
I tried both VST and VGT:
VGT: Inside the VM I created VLAN devices using vconfig and sent traffic, but the rate limit I get is always that of VLAN 0 on the host. Configuring VLANs inside the VM does not work; the VLAN 0 rate limit on the host is all I can get.
VST: As you suggested, I could create more than one VF (with each VF having its own VLAN/QoS set on the host), but that would complicate our design a bit, so we are keeping it as a last resort.
If I could use VGT with multiple VLANs inside the VM and still get rate control, that would be exactly what I'm looking for.
Note: the documentation (Mellanox_OFED_Linux_User_Manual_v2.2-1.0.1.pdf) says VGT is the default behaviour.
Thanks ophirmaor, setting the egress_map (mapping sk_prio → user_prio) inside the guest did the trick.
To add to the knowledge base (although this is partly stated in the documentation):
We don't use vconfig. What I got to work was setting the socket priority option, mapping that priority (sk_prio) to a user priority (user_prio) using tc/tc_wrap.py inside the guest, and then mapping those user priorities to hardware priorities (traffic classes) to control the rate.
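For anyone else trying this, here is a minimal sketch of the socket-priority part in C. It only shows setting SO_PRIORITY on a sending socket; the destination address, port, and the priority value 3 are placeholders, and the actual sk_prio → user_prio → traffic-class mapping still has to be set up with tc/tc_wrap.py inside the guest as described above.

    /*
     * Minimal sketch: send UDP traffic with a given socket priority (sk_prio).
     * Assumes the sk_prio -> user_prio mapping has already been configured
     * with tc/tc_wrap.py; the address, port, and priority value are examples.
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int prio = 3;  /* sk_prio; mapped to a user priority by the egress map */
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Tag all traffic sent on this socket with sk_prio = prio */
        if (setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)) < 0) {
            perror("setsockopt(SO_PRIORITY)");
            return 1;
        }

        struct sockaddr_in dst;
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(12345);               /* example port */
        inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr);  /* example address */

        const char msg[] = "rate-limited test traffic";
        if (sendto(fd, msg, sizeof(msg), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0)
            perror("sendto");

        close(fd);
        return 0;
    }

With this in place, the rate the traffic actually gets depends on which traffic class the chosen sk_prio ends up mapped to, so different socket priorities can be throttled at different rates from inside the same VM.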