Dear Mellanox community,
We observe random delays between two back-to-back connected systems with ConnectX-5 2x100G cards.
Example:
(a long stretch of normal delay, then excessive delay of 10-100 ms, sometimes very brief and sometimes lasting for minutes, then normal delay again; no packet loss)
[1630922510.721460] 64 bytes from 192.168.1.2: icmp_seq=36406 ttl=64 time=0.025 ms
[1630922510.737460] 64 bytes from 192.168.1.2: icmp_seq=36407 ttl=64 time=0.024 ms
[1630922510.783855] 64 bytes from 192.168.1.2: icmp_seq=36408 ttl=64 time=30.4 ms
[1630922510.783861] 64 bytes from 192.168.1.2: icmp_seq=36409 ttl=64 time=14.4 ms
[1630922510.783887] 64 bytes from 192.168.1.2: icmp_seq=36410 ttl=64 time=0.018 ms
[1630922510.797453] 64 bytes from 192.168.1.2: icmp_seq=36411 ttl=64 time=0.021 ms
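To locate such events in a long capture, the log can be filtered automatically. The following is only a minimal sketch: it assumes the log has the bracketed-timestamp format shown above (as printed by `ping -D`), and the names `parse_replies` and `find_spikes` and the 1 ms threshold are illustrative choices, not part of any tool.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches reply lines such as:
# [1630922510.783855] 64 bytes from 192.168.1.2: icmp_seq=36408 ttl=64 time=30.4 ms
PING_LINE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\].*icmp_seq=(?P<seq>\d+).*time=(?P<rtt>[\d.]+) ms"
)


class Reply(NamedTuple):
    timestamp: float  # epoch seconds at which the reply was logged
    seq: int          # ICMP sequence number
    rtt_ms: float     # round-trip time reported by ping, in milliseconds


def parse_replies(lines: Iterable[str]) -> List[Reply]:
    """Extract (timestamp, seq, rtt) from ping output; other lines are skipped."""
    replies = []
    for line in lines:
        m = PING_LINE.search(line)
        if m:
            replies.append(Reply(float(m["ts"]), int(m["seq"]), float(m["rtt"])))
    return replies


def find_spikes(replies: List[Reply], threshold_ms: float = 1.0) -> List[Reply]:
    """Return replies whose RTT exceeds threshold_ms (assumed threshold)."""
    return [r for r in replies if r.rtt_ms > threshold_ms]


# The excerpt from this post, used as sample input.
SAMPLE = """\
[1630922510.721460] 64 bytes from 192.168.1.2: icmp_seq=36406 ttl=64 time=0.025 ms
[1630922510.737460] 64 bytes from 192.168.1.2: icmp_seq=36407 ttl=64 time=0.024 ms
[1630922510.783855] 64 bytes from 192.168.1.2: icmp_seq=36408 ttl=64 time=30.4 ms
[1630922510.783861] 64 bytes from 192.168.1.2: icmp_seq=36409 ttl=64 time=14.4 ms
[1630922510.783887] 64 bytes from 192.168.1.2: icmp_seq=36410 ttl=64 time=0.018 ms
[1630922510.797453] 64 bytes from 192.168.1.2: icmp_seq=36411 ttl=64 time=0.021 ms
"""

spikes = find_spikes(parse_replies(SAMPLE.splitlines()))
```

On the excerpt above this flags icmp_seq 36408 and 36409; for a real capture, pass the lines of the log file instead of `SAMPLE`.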
system: Supermicro AS-1113S-WN10RT, 256GB RAM
CPU: AMD EPYC 7702P 64-Core
NICs: MT27800 Family [ConnectX-5 2x100G], 2 cards
cable: 100G-CR4 MLX (DAC)
driver: mlx5_core
version: 5.4-1.0.3 (newest as of writing this)
firmware-version: 16.31.1014 (MT_0000000012) (newest as of writing this)
kernel/OS: 5.4.0-81-generic #91~18.04.1-Ubuntu SMP (newest as of writing this, for Ubuntu 18.04 LTS HWE)
CPU power state: no deeper than C1
storage: 2x 8TB samsung:nvme:PM1733:2.5 + 2x 256G Micron_2200 (RAID1, boot)
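The C1 limit listed above can be double-checked from the kernel's point of view. This is a hedged sketch that reads the standard Linux cpuidle sysfs entries (`name`, `disable`, `usage`) for one CPU; the function name `read_cstates` is hypothetical, and the sketch assumes the cpuidle driver exposes these files on the host.

```python
from pathlib import Path
from typing import Dict, List, Union


def read_cstates(cpu_dir: Union[str, Path]) -> List[Dict[str, object]]:
    """List the idle states of one CPU, e.g. /sys/devices/system/cpu/cpu0.

    Returns one dict per state with its name, whether it is disabled,
    and how many times it has been entered since boot.
    """
    state_dirs = sorted(
        Path(cpu_dir, "cpuidle").glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order: state2 before state10
    )
    states = []
    for state_dir in state_dirs:
        states.append({
            "name": (state_dir / "name").read_text().strip(),
            "disabled": (state_dir / "disable").read_text().strip() != "0",
            "usage": int((state_dir / "usage").read_text()),
        })
    return states
```

Calling `read_cstates("/sys/devices/system/cpu/cpu0")` on the host would show whether any state deeper than C1 is still enabled or has a non-zero usage count.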
The systems host OSDs for a very lightly loaded test Ceph cluster: storage access plus storage replication is currently on the order of 50-100Mb/s, CPU load is 0-1%, and 10 mostly idle VMs connect to it from external hypervisor hosts.
One NIC is connected back-to-back to the other host over plain L2/L3 (no VLAN, no bonding, default parameters) for the purpose of this test, and we see the delays there as well.
The other ConnectX-5 card (the one actually used for Ceph) is connected to a pair of Dell S4048 switches in VLT/LACP mode (which then connect via eVLT to another rack with the same config). The NICs negotiate 40G because the Dell S4048-ON offers 40G only.
We observe the excessive delay on both cards (Ceph and back-to-back) at more or less the same time, so we believe we can take the switches and the bonding out of the equation for now.
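The "more or less at the same time" observation can be made quantitative by matching spike timestamps from the two links. A minimal sketch, assuming two lists of epoch timestamps (one per link) taken from ping logs on hosts with synchronized clocks; the function name `coincident_spikes` and the 0.1 s window are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence


def coincident_spikes(
    times_a: Sequence[float],
    times_b: Sequence[float],
    window_s: float = 0.1,
) -> List[float]:
    """Return the timestamps in times_a that have at least one
    timestamp in times_b within +/- window_s seconds."""
    b_sorted = sorted(times_b)
    matched = []
    for t in times_a:
        # Index of the first b-timestamp that is >= t - window_s.
        i = bisect_left(b_sorted, t - window_s)
        if i < len(b_sorted) and b_sorted[i] <= t + window_s:
            matched.append(t)
    return matched
```

If most spikes on the back-to-back link have a match on the bonded link, that supports a host-side cause; few matches would suggest the two links need to be examined separately.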
kern.log, syslog and journalctl do not show anything around the time of the events (and there are no other errors during boot etc.).
Topology (fixed-width font):
host nvme01      host nvme02
enp129s0f0 ---- enp129s0f0   (back-to-back connection, we also see excessive delay here)
enp129s0f1 ---- enp129s0f1
host nvme01                                                          host nvme02
enp129s0f0 (bond0) -- (vlt) dell4048-01 --evlt-- dell4048-03 (vlt) -- (bond0) enp129s0f0
enp129s0f1 (bond0) -- (vlt) dell4048-02 --evlt-- dell4048-04 (vlt) -- (bond0) enp129s0f0
Thank you and regards,
Piotrek Z