Hi,

We are using a ConnectX-5 100 Gbit/s adapter with the Linux kernel driver (mlx5), running an eXpress Data Path (XDP) program on Ubuntu 20. These are the driver details printed by ethtool -i:
driver: mlx5_core
version: 5.19.0-38-generic
firmware-version: 16.34.1002 (MT_0000000011)
expansion-rom-version:
bus-info: 0000:ca:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
We are running some packet-processing throughput benchmarks. A traffic generator (TRex) is connected through a single cable to the ConnectX-5 interface on the device under test (DUT).
A simple packet-forwarding XDP program on the DUT forwards 86 million packets per second (Mpps) of 64-byte packets using 14 cores. However, the line-rate limit for 64-byte packets at 100 Gbit/s is about 148 Mpps: each packet occupies 64 bytes plus 20 bytes of preamble and inter-frame gap on the wire, i.e. 672 bits, and 100 Gbit/s / 672 bits is roughly 148.8 Mpps. Our traffic generator runs on an identically configured machine and, using DPDK, can indeed reach this 148 Mpps limit.
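For concreteness, the forwarding program is essentially a swap-the-MACs-and-XDP_TX program along the lines of the following sketch (simplified, not our exact benchmark code):

/* Minimal XDP program that bounces each packet back out the same
 * interface after swapping the Ethernet source/destination MACs.
 * Sketch only; the actual benchmark program may differ. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_fwd(struct xdp_md *ctx)
{
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        unsigned char tmp[ETH_ALEN];

        /* Bounds check required by the verifier. */
        if ((void *)(eth + 1) > data_end)
                return XDP_DROP;

        /* Swap src/dst MAC and transmit on the same port. */
        __builtin_memcpy(tmp, eth->h_source, ETH_ALEN);
        __builtin_memcpy(eth->h_source, eth->h_dest, ETH_ALEN);
        __builtin_memcpy(eth->h_dest, tmp, ETH_ALEN);

        return XDP_TX;
}

char _license[] SEC("license") = "GPL";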
We suspect that the throughput limitation arises from a combination of PCIe and the driver.
We came to know that the rx_discards_phy counter reported by ethtool indicates drops at the physical layer due to back-pressure from PCIe during receive operations on the NIC. The problem is that this counter increments both when the bottleneck is PCIe and when we are CPU-bound (e.g. when we run our XDP program on a small number of cores).
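(For reference, we read the counter with something like the command below; the interface name is just a placeholder for ours:)

ethtool -S enp202s0f0 | grep rx_discards_phy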
Is there a counter that directly indicates a bottleneck at PCIe specifically (rather than the CPU or some other component)?
Thanks in advance for any help,
Srinivas Narayana