Hi,
I’m trying to verify – and potentially change – the Priority Flow Control (PFC) priority used by a ConnectX-5 NIC (MCX515A-CCA_Ax_Bx, firmware 16.31.1014). My goal is to ensure that traffic is sent using priority 3, which is what my switch is advertising via DCBX. I can just as well change the switch config but I would prefer to first understand what is happening on the host side before debugging the switch further.
On the host (AlmaLinux 9.4, kernel 5.14.0-427), I can confirm that the NIC appears to enable PFC for priority 3, as negotiated via DCBX, based on the output of mlnx_qos:
[root@host ~]# mlnx_qos -i interface0 | grep -P '^PFC|\s+(priority|enabled|buffer)\s{2,}'
PFC configuration:
    priority    0   1   2   3   4   5   6   7
    enabled     0   0   0   1   0   0   0   0
    buffer      0   0   0   1   0   0   0   0
However, if I run something like ib_send_bw over that interface and capture the traffic using tcpdump -i mlx5_0 -ee -vv, I can see that the traffic is tagged with priority 0 (p 0 appears after the VLAN tag). To me this suggests that the traffic is not mapped to priority 3 as I had hoped. Additionally, the ethtool -S counters show that the traffic is indeed counted under priority 0.
Any insights on this are welcome.
–
Vesa
Hi Vesa,
In your ib_send_bw command you should use --tclass=96.
A tclass of 96 corresponds to a DSCP value of 24 (the upper 6 bits of the byte, i.e. 96 >> 2), which in turn corresponds to priority 3. Since you did not set it, the priority chosen is 0.
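To make the arithmetic concrete, here is a minimal sketch in bash. The function names are made up for illustration, and the DSCP-to-priority step assumes the default mapping of eight consecutive DSCP values per priority; if the dscp2prio table on the NIC has been changed, that step no longer holds.

```shell
#!/usr/bin/env bash
# Convert a traffic-class byte (the value given to --tclass) to DSCP.
# DSCP is the upper 6 bits of the byte; the lower 2 bits are ECN.
tclass_to_dscp() {
    echo $(( ($1 & 0xFF) >> 2 ))
}

# ASSUMPTION: default DSCP-to-priority mapping (DSCP 0-7 -> 0, 8-15 -> 1, ...).
dscp_to_prio() {
    echo $(( $1 >> 3 ))
}

dscp=$(tclass_to_dscp 96)
prio=$(dscp_to_prio "$dscp")
echo "tclass=96 dscp=$dscp prio=$prio"
```

Going the other way, a desired DSCP value is shifted left by 2 to get the tclass, so DSCP 24 gives 24 << 2 = 96.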
Also, verify you are using trust DSCP (should be part of the mlnx_qos output).
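One way to check the trust mode from a script is sketched below. It assumes the mlnx_qos output contains a line of the form "Priority trust state: <mode>"; the helper name trust_state and the canned sample line are illustrative only, so adjust the pattern if your version prints it differently.

```shell
#!/usr/bin/env bash
# Extract the trust mode from mlnx_qos output supplied on stdin.
# ASSUMPTION: the output has a line like "Priority trust state: pcp".
trust_state() {
    grep -i 'trust state' | awk '{print tolower($NF)}' | head -n 1
}

# Canned input for illustration (hypothetical sample line, not captured output).
printf 'Priority trust state: pcp\n' | trust_state

# On a live host the equivalent would be:
#   mlnx_qos -i interface0 | trust_state
```

If this reports pcp, the NIC classifies by the VLAN priority field and the DSCP value set through --tclass will not select the priority until the trust mode is switched to dscp.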
Regards,
Yaniv
Thanks for your suggestion—I appreciate you taking the time to respond.
If possible, I’d prefer to work with 802.1p priorities rather than DSCP. Would it be correct to assume that this should work as long as the “priority trust state” from “mlnx_qos” is set to “pcp”? This is how the NICs are set at the moment, and it also matches what I’ve configured on the switch.
Also, do you happen to have information on the default priority used for RoCEv2 and how to verify it? From what I can tell via tcpdump, it appears to be 0, but that seems a bit counterintuitive, as 0 typically indicates “best effort,” whereas something like 3 would usually correspond to “critical applications.” This leads me to think there may be something off in our current NIC configuration. This information would also be helpful because I’m not sure the custom applications we’re running expose a setting similar to the --tclass option of ib_send_bw.
Any insights you might have would be greatly appreciated.
–
Vesa
Hi Vesa,
Unfortunately the PCP method is more complex, which is the reason we moved to DSCP mapping. The configuration is prone to errors. On top of that, DSCP is routable because it is marked in the IP header.
In general you will have to map the user priority to skprio, … On top of that you need to set up a VLAN header for your interface. You can try to dig out the details from this post: EnterpriseSupport
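For the VLAN-header part only, a rough sketch follows. It prints, without running, a standard iproute2 command that creates a VLAN sub-interface whose egress-qos-map sends every skb priority to one PCP value. The helper name vlan_pcp_cmd, the parent interface name, VLAN ID 100 and target PCP 3 are placeholders, and the remaining mapping steps are not covered here, so treat this as a starting point rather than a complete recipe.

```shell
#!/usr/bin/env bash
# Print (do not execute) an iproute2 command that creates a VLAN
# sub-interface mapping skb priorities 0-7 to a single PCP value.
# ASSUMPTIONS: parent name, VLAN ID and PCP are placeholders.
vlan_pcp_cmd() {
    local parent=$1 vid=$2 pcp=$3 map="" p
    for p in 0 1 2 3 4 5 6 7; do
        map+=" ${p}:${pcp}"
    done
    echo "ip link add link ${parent} name ${parent}.${vid} type vlan id ${vid} egress-qos-map${map}"
}

# Dry run: review the printed command before running it as root.
vlan_pcp_cmd interface0 100 3
```

Printing the command instead of executing it keeps the sketch safe to try on any machine; the resulting map can later be inspected with ip -d link show on the VLAN interface.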
The default priority was kept at 0 for backward compatibility. In our guidance we usually use priority 3 for RoCE, although you can map any priority you want.
I can add that most customers use DSCP mode (the last one I heard of using PCP was around 5 years ago).
Regards,
Yaniv