I’m trying to get RoCE v1 working with ConnectX-5 100G Ethernet adapters. ib_send_bw runs with good bandwidth, but OpenMPI jobs with multiple MPI tasks per node fall apart, most likely because I don’t have flow control working properly yet. These adapters use the mlx5 driver, so the mlx4_en kernel module options (pfctx/pfcrx) don’t appear to be available.
I’m at a loss as to how to make progress. If I try to configure things manually, mlnx_qos and ethtool each seem to wipe out the effect of the other: enabling global pause with ethtool clears the PFC bits, and re-enabling PFC with mlnx_qos turns global pause back off:
[me@mine]# mlnx_qos -i eth4 -f 1,1,1,1,1,1,1,1
PFC configuration:
priority 0 1 2 3 4 5 6 7
enabled 1 1 1 1 1 1 1 1
tc: 0 ratelimit: unlimited, tsa: vendor
priority: 1
tc: 1 ratelimit: unlimited, tsa: vendor
priority: 0
tc: 2 ratelimit: unlimited, tsa: vendor
priority: 2
tc: 3 ratelimit: unlimited, tsa: vendor
priority: 3
tc: 4 ratelimit: unlimited, tsa: vendor
priority: 4
tc: 5 ratelimit: unlimited, tsa: vendor
priority: 5
tc: 6 ratelimit: unlimited, tsa: vendor
priority: 6
tc: 7 ratelimit: unlimited, tsa: vendor
priority: 7
[me@mine]# ethtool -A eth4 rx on
[me@mine]# ethtool -A eth4 tx on
[me@mine]# ethtool -a eth4
Pause parameters for eth4:
Autonegotiate: off
RX: on
TX: on
[me@mine]# mlnx_qos -i eth4
PFC configuration:
priority 0 1 2 3 4 5 6 7
enabled 0 0 0 0 0 0 0 0
tc: 0 ratelimit: unlimited, tsa: vendor
priority: 1
tc: 1 ratelimit: unlimited, tsa: vendor
priority: 0
tc: 2 ratelimit: unlimited, tsa: vendor
priority: 2
tc: 3 ratelimit: unlimited, tsa: vendor
priority: 3
tc: 4 ratelimit: unlimited, tsa: vendor
priority: 4
tc: 5 ratelimit: unlimited, tsa: vendor
priority: 5
tc: 6 ratelimit: unlimited, tsa: vendor
priority: 6
tc: 7 ratelimit: unlimited, tsa: vendor
priority: 7
[me@mine]# mlnx_qos -i eth4 -f 1,1,1,1,1,1,1,1
PFC configuration:
priority 0 1 2 3 4 5 6 7
enabled 1 1 1 1 1 1 1 1
tc: 0 ratelimit: unlimited, tsa: vendor
priority: 1
tc: 1 ratelimit: unlimited, tsa: vendor
priority: 0
tc: 2 ratelimit: unlimited, tsa: vendor
priority: 2
tc: 3 ratelimit: unlimited, tsa: vendor
priority: 3
tc: 4 ratelimit: unlimited, tsa: vendor
priority: 4
tc: 5 ratelimit: unlimited, tsa: vendor
priority: 5
tc: 6 ratelimit: unlimited, tsa: vendor
priority: 6
tc: 7 ratelimit: unlimited, tsa: vendor
priority: 7
[me@mine]# ethtool -a eth4
Pause parameters for eth4:
Autonegotiate: off
RX: off
TX: off
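
To make the back-and-forth above easier to check after each command, here is a rough sketch of two helpers that boil the output down to one line each. The names pfc_flags and pause_flags are made up for this post (they are not part of mlnx_qos or ethtool), and they assume the output looks like what is pasted above. As far as I can tell, the transcript is consistent with global pause and PFC being mutually exclusive on the port, but that is my reading, not something I have confirmed.

```shell
#!/bin/sh
# Hypothetical helpers: they only parse text in the format shown in the
# transcript above and do not change any adapter settings.

# Print the eight per-priority PFC flags from `mlnx_qos -i <if>` output.
pfc_flags() {
    awk '$1 == "enabled" { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Print "rx=<on|off> tx=<on|off>" from `ethtool -a <if>` output.
pause_flags() {
    awk '$1 == "RX:" { rx = $2 }
         $1 == "TX:" { tx = $2 }
         END { print "rx=" rx " tx=" tx }'
}

# Intended use on the machine from the post (interface name eth4 assumed):
#   mlnx_qos -i eth4 | pfc_flags
#   ethtool -a eth4 | pause_flags
```

Running both after every ethtool -A or mlnx_qos -f call should show directly which of the two settings was reset.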