RoCEv2 PFC/ECN Issues

robert74 · August 4, 2018, 12:38am

We have two servers with ConnectX-4 100GbE cards and two Cisco C3232C switches with routing between them. We are trying to get RoCEv2 working across the routed link with PFC/ECN so that we get the best performance during periods of congestion.

The funny thing is that with the base configuration and no other servers on the switches, we get terrible performance (1.6 Gbps) across the routed link using iSER when we are only using about 20 Gbps (1 iSER connection and test workload configuration). By using multiple iSER connections and PFC, we can get about 95 Gbps, so we know that the hardware is capable of the performance in routing mode. We can’t understand why the performance is so bad in the default case. The fio test shows a burst of IO, then none at all, and it keeps cycling back and forth.

We would like to use both PFC and ECN in our configuration, but first we are trying to validate that ECN works without PFC. When we disable PFC, we can’t test ECN, most likely because of the above issue.

On the Cisco switches, we have policy maps that place our traffic with the DSCP markings into a group that has ECN enabled (I’m not a Cisco person, so I may not be getting the terminology quite right), and we can see the group counters on the Cisco incrementing. We never see any packets marked with congestion, probably because the switch never sees any congestion due to the above problem.

When we have the client set to 40 Gbps and do a read test with PFC, we get pause frames and great performance. We have the Cisco switches match the DSCP value and remark the COS for packets that traverse the router. Interestingly enough, Cisco sends PFC pause frames on the routed link even though there are no VLANs configured (we captured this in Wireshark). However, with the adapters set to --trust=pcp the performance is terrible, while --trust=dscp works well. The Cisco switches also show pause frame counters incrementing when we are 100g end to end. I’m not sure why they would be incrementing when there is no congestion.

We have done so many permutations of tests that I may be getting fuzzy on some details. Here is a matrix of some tests that I can be sure of. This is all 100g end to end.

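| switch PFC mode (ports) | trust mode | pfc prio 3 enabled | skprio → cos mapping | Result |
| static on/off | mlnx_qos --trust=X | mlnx_qos --pfc=0,0,0,X,0,0,0,0 | ip link set rsY.Z type vlan egress 2:3 | |
| on | pcp | yes | yes | Good |
| on | pcp | yes | no | Good |
| on | pcp | no | yes | Bad |
| on | pcp | no | no | Bad |
| on | dscp | yes | yes | Good |
| on | dscp | yes | no | Good |
| on | dscp | no | yes | Bad |
| on | dscp | no | no | Bad |
| off | pcp | yes | yes | Bad |
| off | pcp | yes | no | Bad |
| off | pcp | no | yes | Bad |
| off | pcp | no | no | Bad |
| off | dscp | yes | yes | Bad |
| off | dscp | yes | no | Bad |
| off | dscp | no | yes | Bad |
| off | dscp | no | no | Bad |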
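To make the pattern in the matrix above easier to see, here is a small Python sketch that transcribes the 16 rows and checks which of the four settings ever changes the result. The function and variable names are just illustrative, and the "Good only when switch PFC is on and host PFC on priority 3 is enabled" rule is an observation about these rows, not a claim about why it behaves that way.

```python
from itertools import product

# Rows transcribed from the test matrix:
# (switch PFC, trust mode, host pfc prio 3, skprio->cos mapping)
GOOD_ROWS = {
    ("on", "pcp", "yes", "yes"), ("on", "pcp", "yes", "no"),
    ("on", "dscp", "yes", "yes"), ("on", "dscp", "yes", "no"),
}
matrix = {
    row: ("Good" if row in GOOD_ROWS else "Bad")
    for row in product(("on", "off"), ("pcp", "dscp"), ("yes", "no"), ("yes", "no"))
}

FACTORS = ("switch PFC", "trust mode", "pfc prio 3", "skprio->cos")


def influential_factors(table):
    """Return the factors that flip the result in at least one pair of rows
    that differ only in that factor."""
    found = []
    for i, name in enumerate(FACTORS):
        for row, result in table.items():
            # Rows identical to `row` except in column i.
            peers = [
                r for r in table
                if r[i] != row[i] and all(r[j] == row[j] for j in range(4) if j != i)
            ]
            if any(table[r] != result for r in peers):
                found.append(name)
                break
    return found


def predicted(row):
    """Observed rule: Good only with switch PFC on AND host PFC prio 3 enabled."""
    switch_pfc, _trust, host_pfc, _egress_map = row
    return "Good" if switch_pfc == "on" and host_pfc == "yes" else "Bad"


print(influential_factors(matrix))
print(all(predicted(row) == result for row, result in matrix.items()))
```

In these 100g end-to-end tests, the trust mode and the egress skprio → cos mapping never change the outcome on their own; only the two PFC settings do.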

We are using OFED 4.4-1.0.0.0 on both nodes; one is CentOS 7.3, the other CentOS 7.4, running 4.9.116. The firmware is 12.23.1000 on one card and 12.23.1020 on the other. In addition to the above matrix, we have only changed:

echo 26 > /sys/class/net/rs8bp2/ecn/roce_np/cnp_dscp

echo 106 > /sys/kernel/config/rdma_cm/mlx5_3/ports/1/default_roce_tos
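For anyone checking the numbers, the two values above line up: a ToS byte is the 6-bit DSCP followed by the 2-bit ECN field, so 106 decodes to DSCP 26 with ECN-capable transport set. A quick Python sketch of that arithmetic follows. The helper names are illustrative, and the DSCP to priority step assumes the common default of priority = DSCP // 8, which may not match a customized dscp2prio table.

```python
# ECN codepoints as defined in RFC 3168.
ECN_NAMES = {0: "Not-ECT", 1: "ECT(1)", 2: "ECT(0)", 3: "CE"}


def split_tos(tos):
    """Split an IPv4 ToS / IPv6 Traffic Class byte into (DSCP, ECN)."""
    return tos >> 2, tos & 0b11


def default_priority(dscp):
    """Assumption: the default mapping puts 8 consecutive DSCP values
    into each of the 8 priorities (priority = DSCP // 8)."""
    return dscp // 8


dscp, ecn = split_tos(106)          # value written to default_roce_tos
print(dscp, ECN_NAMES[ecn])         # 26 ECT(0)
print(default_priority(dscp))       # 3
```

Under that assumed default mapping, DSCP 26 lands on priority 3, which is the priority enabled for PFC in the matrix, and it is the same DSCP written to cnp_dscp.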

If you have any ideas that we can try, we would appreciate it.

Thank you.

alekseys1 · September 28, 2018, 8:26pm

What happens when you run the ib_read_bw test?

alekseys1 · October 3, 2018, 6:45pm

Hi Robert,

Please follow this link - Recommended Network Configuration Examples for RoCE Deployment https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment - to configure the host and the switch. When using a non-Mellanox switch, check with the switch vendor for the corresponding commands.
