Community.mellanox.com

Congestion Handling modes for multi host in ConnectX-4 Lx

In multi-host configurations, the PCIe interface to each host is narrow compared with the wide physical port interface, so a burst of traffic to one host might fill up the PCIe buffer. This in turn can fill the receive buffer, degrade the performance of the other hosts, and cause drops in the shared RX buffer.

In ConnectX-4 Lx, a hardware mechanism can be enabled to monitor the amount of PCIe buffer consumed per host. Once a host consumes more than a certain amount of buffer, traffic to that host is either dropped or marked with CE (Congestion Experienced) in the ECN bits of the IP TOS field.

The drop/marking decision is made from the PCIe buffer utilization according to a WRED (Weighted Random Early Discard) scheme.
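To illustrate the general idea behind a WRED decision, here is a minimal sketch of the textbook RED/WRED probability ramp. It is not the ConnectX-4 Lx implementation: the thresholds (min_th, max_th), the maximum probability (max_p) and the use of instantaneous occupancy as a fraction of the buffer are all assumptions chosen for illustration.

```python
import random


def wred_probability(occupancy, min_th, max_th, max_p):
    """Textbook WRED ramp: 0 below min_th, 1 at or above max_th,
    and a linear rise up to max_p in between."""
    if occupancy < min_th:
        return 0.0
    if occupancy >= max_th:
        return 1.0
    return max_p * (occupancy - min_th) / (max_th - min_th)


def should_act(occupancy, min_th=0.5, max_th=0.9, max_p=0.1, rng=random.random):
    """Decide whether to drop (or mark) one packet.

    The default thresholds are hypothetical values, not hardware settings.
    rng is injectable so the decision can be made deterministic in tests.
    """
    return rng() < wred_probability(occupancy, min_th, max_th, max_p)
```

Below the lower threshold no packet is touched, above the upper threshold every packet is, and in between the chance of acting on a packet grows with buffer occupancy.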

In addition, the hardware can be configured to work in aggressive mode, where packets are dropped or marked as soon as the PCIe buffer is consumed, or in dynamic mode, which is more relaxed and also takes the port receive buffers into account.

Modes in WRED

The WRED mechanism has four congestion handling modes that are supported on ConnectX-4 Lx.

CPU utilization is approximately the same in all modes.

The modes are described below:

  • Aggressive Drop: Hardware drops traffic according to PCIe buffer occupancy alone
  • Dynamic Drop: Hardware drops traffic according to both Rx buffer and PCIe buffer occupancy
  • Dynamic Mark and Aggressive Mark: Hardware marks the ECN (Explicit Congestion Notification) bits in the IP header as CE (Congestion Experienced) when the buffer is congested (RFC 3168) and leaves congestion control to the OS. Aggressive Mark mode and Dynamic Mark mode are toggled via driver commands. ECN configuration on the switch is not required, although a switch with ECN enabled can apply ECN markings before congestion reaches the host.
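As a rough illustration of what CE marking means at the bit level, the sketch below sets the ECN field (the two low-order bits of the IPv4 TOS byte, per RFC 3168) to CE. The function name is hypothetical, and leaving non-ECN-capable packets unmarked is the RFC 3168 behaviour, assumed here rather than confirmed for this hardware.

```python
# ECN codepoints from RFC 3168 (two low-order bits of the TOS byte).
ECN_MASK = 0b11
NOT_ECT = 0b00  # sender is not ECN-capable
ECT_1 = 0b01    # ECN-capable transport, codepoint ECT(1)
ECT_0 = 0b10    # ECN-capable transport, codepoint ECT(0)
CE = 0b11       # Congestion Experienced


def mark_ce(tos):
    """Return the TOS byte with the ECN field set to CE.

    Packets that are Not-ECT are returned unchanged, since RFC 3168
    only allows marking ECN-capable packets. The DSCP bits (upper six
    bits) are preserved. In a real IPv4 packet the header checksum
    would also have to be updated after the change.
    """
    if tos & ECN_MASK == NOT_ECT:
        return tos
    return tos | CE
```

For example, a TOS byte of 0x02 (ECT(0), DSCP 0) becomes 0x03 after marking, while 0x00 stays 0x00.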

How to configure it

WRED modes can be configured using the open-source mstflint tool (mstflint v4.10.0-3 or later, available at https://github.com/Mellanox/mstflint) on ConnectX-4 Lx with firmware 14.23.1020 or later.

Usage:

# mstcongestion [option] [-d|--device <device>] [--mode <mode>] [--action <action>] [-q|--query] [-h|--help] [-v|--version]

Params

-d|--device Mellanox PCI device address

--mode Set Mode, options are: [aggressive | dynamic]

--action Set Action, options are: [disabled | drop | mark] Note: The "mark" option is available only if the driver supports such capability.

-q|--query Query congestion

-h|--help Show help message and exit

-v|--version Show version and exit

Example:

mstcongestion -d 02:00.0 --mode dynamic

mstcongestion -d 02:00.0 --action drop
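When the tool is driven from a script, it can help to validate the mode and action before invoking it. The following is a minimal sketch that only builds the argument list from the options listed above. The helper name build_mstcongestion_cmd is hypothetical, and passing --mode and --action in the same invocation is an assumption based on the usage line.

```python
MODES = ("aggressive", "dynamic")
ACTIONS = ("disabled", "drop", "mark")


def build_mstcongestion_cmd(device, mode=None, action=None, query=False):
    """Build an mstcongestion argument list, rejecting unknown values.

    Only flags documented in the usage text are emitted. Combining
    --mode and --action in one call is an assumption.
    """
    cmd = ["mstcongestion", "-d", device]
    if mode is not None:
        if mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
        cmd += ["--mode", mode]
    if action is not None:
        if action not in ACTIONS:
            raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
        cmd += ["--action", action]
    if query:
        cmd.append("-q")
    return cmd
```

The resulting list can be passed to subprocess.run, for example subprocess.run(build_mstcongestion_cmd("02:00.0", mode="dynamic"), check=True), on a machine where mstflint is installed.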

To enable ECN for Dynamic Mark:

The "mark" action requires ECN marking support in the Linux driver, and ECN must be enabled with 'sysctl -w net.ipv4.tcp_ecn=1'.
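To check the current setting before changing it, the sysctl value can be read from /proc. This is a small sketch, with hypothetical helper names, that reads net.ipv4.tcp_ecn and describes it using the meanings given in the Linux kernel's ip-sysctl documentation.

```python
from pathlib import Path

# Meanings of net.ipv4.tcp_ecn as documented for the Linux kernel.
TCP_ECN_MEANING = {
    0: "ECN disabled",
    1: "ECN requested on outgoing connections and accepted on incoming ones",
    2: "ECN accepted on incoming connections only, not requested on outgoing ones",
}


def read_tcp_ecn(path="/proc/sys/net/ipv4/tcp_ecn"):
    """Return the current tcp_ecn value as an integer (Linux only)."""
    return int(Path(path).read_text().strip())


def describe_tcp_ecn(value):
    """Return a human-readable description of a tcp_ecn value."""
    return TCP_ECN_MEANING.get(value, "unknown setting")
```

On a Linux host, describe_tcp_ecn(read_tcp_ecn()) reports whether the value 1 required above is already in effect.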

Monitoring WRED:

The following ethtool counters are reported per host:

  • Marking: rx_ecn_mark
  • Dropping: outbound_pci_buffer_overflow
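For periodic monitoring, the two counters can be pulled out of the ethtool statistics output. The sketch below assumes the usual "name: value" line layout printed by 'ethtool -S'. The helper names and the interface name passed to them are hypothetical.

```python
import subprocess

WRED_COUNTERS = ("rx_ecn_mark", "outbound_pci_buffer_overflow")


def parse_wred_counters(ethtool_output):
    """Extract the WRED-related counters from 'ethtool -S' text output.

    Lines are expected in the form '     name: value'. Counters that
    are missing from the output are simply absent from the result.
    """
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name in WRED_COUNTERS:
            counters[name] = int(value.strip())
    return counters


def read_wred_counters(interface):
    """Run 'ethtool -S <interface>' and return the WRED counters."""
    result = subprocess.run(
        ["ethtool", "-S", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_wred_counters(result.stdout)
```

Calling read_wred_counters on the relevant interface at intervals and comparing successive values shows whether marking or dropping is currently taking place.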

More information about ethtool counters can be found here: https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters