ConnectX-5 RoCE UC receive latency anomaly

I have a working RoCE latency test in which a ConnectX-5 receives RoCEv2 (IPv4) Unreliable Connection (UC) traffic consisting solely of RDMA Write Only frames.
As long as the packet sequence number (PSN) is in order, the measured round-trip latency in our setup is low (under 2 microseconds). So far so good.

As soon as the PSN is out of order (e.g. a frame is lost somewhere), the network card adds almost 20 microseconds of latency, once per sequence “jump”. It looks as if a “software exception handler” takes over whenever the sequence number is not as expected, adding massive latency.

The IB/RoCE spec says that in UC mode any PSN must be accepted, which in a way is done, just not fast. Can this behaviour be changed or influenced somehow, e.g. through a setting?

Thank you!
Thomas

Hi,

ConnectX-5 UC RoCEv2 will accept any PSN, but when it sees a “jump” (a missing or out-of-order PSN) it goes through a slow path in the NIC once per gap, adding about 20 µs of latency. This behavior is a hardware limitation and has no tunable setting. The only practical options are to avoid PSN gaps (keep traffic in order) or to change transport (e.g., RC or something without connected-QP PSNs).
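
To gauge how often the slow path would be hit in a given traffic pattern, the PSN gaps in a capture can be counted offline. The following is only a rough sketch, not an NVIDIA tool: the function names and the sample PSN list are hypothetical, the 20 µs penalty is simply the figure reported in this thread, and it assumes the receiver resynchronizes its expected PSN to the received PSN + 1 after each gap. The PSN itself is a 24-bit field, so the comparison wraps at 2^24.

```python
PSN_MODULUS = 1 << 24  # the PSN is a 24-bit field in the Base Transport Header


def count_psn_gaps(psns):
    """Count packets whose PSN is not the previous PSN + 1 (mod 2^24).

    Assumption: after a gap the receiver expects (received PSN + 1),
    so each discontinuity is counted exactly once.
    """
    gaps = 0
    for prev, cur in zip(psns, psns[1:]):
        if cur != (prev + 1) % PSN_MODULUS:
            gaps += 1
    return gaps


def estimated_extra_latency_us(psns, penalty_us=20.0):
    """Estimate total added latency, assuming a fixed penalty per gap.

    penalty_us defaults to the ~20 us observed in this thread; it is a
    measured value from one setup, not a documented constant.
    """
    return count_psn_gaps(psns) * penalty_us


# Hypothetical capture: wraps from 0xFFFFFF to 0 (not a gap),
# then skips PSN 1 and PSNs 4..6 (two gaps).
observed = [0xFFFFFE, 0xFFFFFF, 0x000000, 0x000002, 0x000003, 0x000007]

print(count_psn_gaps(observed))
print(estimated_extra_latency_us(observed))
```

Feeding this the PSNs extracted from a packet capture gives a first estimate of how much latency the gaps contribute before deciding whether a different transport is worth the effort.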

If there are production constraints that make this behavior unacceptable or problematic, please feel free to open a case with NVIDIA Enterprise Support.

Thanks,

Jonathan.