Priority flow control using DSCP as priority not working

I have an SN2410 (running SONiC) connecting two ConnectX-4 100 Gbit cards. I’ve successfully set up PFC using the VLAN PCP tag to indicate priority. Without PFC, “ib_write_bw -a” would randomly slow from 11 GB/s to 100 MB/s! I also got PFC to work without a VLAN on the endpoints, only on the switch, which is simpler, but then the default VLAN tag gets assigned, causing all traffic to have the same priority.

Now I’m trying to use the DSCP field for priority. I think I’ve enabled it correctly on the endpoints; see mlnx_qos.txt. I’m not sure which DSCP value the traffic actually carries, so I’ve mapped both 0 and 26 to priority 3.

The random slowdowns are happening again. I have the usual symptoms:

  • rx_discards_phy increases on the receiver (from “ethtool -S enp194s0np0”)
  • /sys/class/infiniband/mlx5_0/ports/1/hw_counters/out_of_sequence increases

The good news is the pause frames are being sent.

  • tx_prio3_pause on the receiver is going up
  • “show pfc counters” on the switch shows that it’s receiving pause frames, but not retransmitting them

So it seems the switch is not configured correctly. I don’t know whether I set up TC_TO_PRIORITY_GROUP_MAP and DSCP_TO_TC_MAP in /etc/sonic/config_db.json correctly, or whether I’m missing other settings. Any ideas?

ib_write_benchmark.txt (5.4 KB)

mlnx_qos.txt (1.2 KB)

config_db.json.txt (66.3 KB)

Also, would DSCP be recommended over the other method (no VLAN), which lumps all traffic into the same priority? I’d prefer not to set up a VLAN. But with DSCP there seems to be no way to set a default DSCP for all RoCEv2 packets without changing the source code, so it appears to have no benefit over the other method, because all traffic would still end up with the same priority?

Hi Joe,

The configs don’t look quite right to me either. You have DSCP values 0 and 26 mapped to prio 3 on the host, but on the switch I see “PORT_QOS_MAP” has “pfc_enable”: “0”, where I would expect it to be prio 3. I see you have other configs mapping DSCP to TC and TC to PG, and PFC is configured for TC 3. Try changing these values one at a time, starting with “pfc_enable”, and see what the results are.
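As a rough way to check that chain, here is a small Python sketch that follows one DSCP value through DSCP to TC and TC to PG for a port, then reports whether the expected priority appears in “pfc_enable”. The table and key names come from the config discussed above, but the sample dictionary, the port name “Ethernet0”, the map name “AZURE”, and the function name are assumptions for illustration, not taken from the attached file. The reference format also varies between SONiC releases, so the helper accepts both a bare name and the bracketed form.

```python
def trace_dscp(cfg, port, dscp, expected_prio):
    """Follow DSCP -> TC -> PG for one port and check pfc_enable.

    cfg is a dict loaded from a config_db.json-style file.
    All values in these tables are strings, as in the JSON.
    """
    def map_name(ref):
        # Accept both "AZURE" and "[DSCP_TO_TC_MAP|AZURE]" style references.
        return ref.strip("[]").split("|")[-1]

    qos = cfg["PORT_QOS_MAP"][port]
    dscp_map = cfg["DSCP_TO_TC_MAP"][map_name(qos["dscp_to_tc_map"])]
    pg_map = cfg["TC_TO_PRIORITY_GROUP_MAP"][map_name(qos["tc_to_pg_map"])]

    tc = dscp_map[str(dscp)]
    pg = pg_map[tc]
    # pfc_enable is a comma-separated list of priorities, e.g. "3,4".
    enabled = {p.strip() for p in qos.get("pfc_enable", "").split(",") if p.strip()}
    return {"tc": tc, "pg": pg, "pfc_enabled": str(expected_prio) in enabled}


# Hypothetical sample, shaped like the mismatch described above.
sample = {
    "DSCP_TO_TC_MAP": {"AZURE": {"0": "3", "26": "3"}},
    "TC_TO_PRIORITY_GROUP_MAP": {"AZURE": {"3": "3"}},
    "PORT_QOS_MAP": {
        "Ethernet0": {
            "dscp_to_tc_map": "AZURE",
            "tc_to_pg_map": "AZURE",
            "pfc_enable": "0",
        }
    },
}

result = trace_dscp(sample, "Ethernet0", 26, 3)
print(result)
```

With the sample above, DSCP 26 resolves to TC 3 and PG 3, but “pfc_enabled” comes back false because only priority 0 is listed. To run it against a real file, load it with json.load and pass the actual port name.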

If the switch is receiving PFC frames, they should all be arriving for a single priority. Check that, and make sure they are being received on prio 3 as expected.

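PFC frames are buffer-to-buffer (they act on one link and are not forwarded), so we would only expect the switch to TX PFC frames if its own buffer crosses the XOFF threshold. It signals resume once the buffer drains back below the XON threshold.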
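To illustrate the threshold behaviour, here is a simplified Python model of one priority group’s buffer: a pause is sent when occupancy rises to the XOFF threshold, and a resume is sent once it drains to the XON threshold. The function name, the occupancy numbers, and the threshold values are all made up for the example; real switches add headroom and often use dynamic thresholds, so treat this only as a sketch of the hysteresis.

```python
def pfc_events(occupancy, xoff, xon):
    """Return (step, event) pairs for a sequence of buffer occupancy samples.

    "XOFF" means the device would transmit a pause for this priority;
    "XON" means it would signal the peer to resume.
    """
    paused = False
    events = []
    for step, level in enumerate(occupancy):
        if not paused and level >= xoff:
            paused = True
            events.append((step, "XOFF"))
        elif paused and level <= xon:
            paused = False
            events.append((step, "XON"))
    return events


# Hypothetical occupancy samples in arbitrary units.
samples = [10, 40, 80, 95, 70, 30, 15]
events = pfc_events(samples, xoff=90, xon=20)
print(events)  # [(3, 'XOFF'), (6, 'XON')]
```

If the buffer never reaches the XOFF level, the list stays empty, which is the situation where a switch receives pause frames from a host but has no reason to transmit any of its own.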

We typically see DSCP preferred over PCP for flexibility (RoCE and non-RoCE traffic on the same VLAN) and to avoid relying on an 802.1Q tag that may or may not be present, depending on the deployment and design.
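For reference, a short Python sketch of where each marking lives: DSCP is the upper six bits of the IP ToS/Traffic Class byte, while PCP is the upper three bits of the 802.1Q tag control field, which only exists when the frame is tagged. The function names and the VLAN ID 100 are just for illustration.

```python
def tos_from_dscp(dscp, ecn=0):
    """Build the 8-bit IP ToS byte: 6 bits of DSCP followed by 2 bits of ECN."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("dscp must be 0..63 and ecn 0..3")
    return (dscp << 2) | ecn


def dscp_from_tos(tos):
    """Recover the DSCP value from a ToS byte."""
    return (tos & 0xFF) >> 2


def vlan_tci(pcp, vid, dei=0):
    """Build the 16-bit 802.1Q TCI: 3 bits PCP, 1 bit DEI, 12 bits VLAN ID."""
    if not 0 <= pcp <= 7 or not 0 <= vid <= 4095 or dei not in (0, 1):
        raise ValueError("pcp must be 0..7, vid 0..4095, dei 0 or 1")
    return (pcp << 13) | (dei << 12) | vid


print(tos_from_dscp(26))          # 104 (0x68)
print(dscp_from_tos(104))         # 26
print(hex(vlan_tci(3, vid=100)))  # 0x6064
```

This is why tools that take a ToS value need 104 rather than 26 to mark traffic as DSCP 26, and why the PCP approach stops working as soon as a frame is sent untagged.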