The data source is an FPGA system with a 100 Gb MAC IP (Xilinx CMAC). RoCEv2 packets (UC) are generated correctly on the fly.
The receiver is a ConnectX-5, connected directly by fiber (no switch).
- It works well with one queue pair at full throughput.
- It works well with 2 queue pairs, used one at a time, at full throughput.
- It does not work if both queue pairs are used concurrently (datagrams interleaved). More precisely, throughput is very poor, but no data is lost, and we observe a high number of global pause requests. The FPGA design is supposed to handle global pause. Packet capture with tcpdump shows an increased delay between frames. None are lost (the PSNs are as expected). With a single QP, the delay is considerably smaller.
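
To put numbers on the delay seen in the capture, a minimal sketch along the following lines could be used. It assumes the frames have already been exported from the pcap as (timestamp, UDP payload) pairs for RoCEv2 traffic (UDP destination port 4791), so that each payload starts with the 12-byte Base Transport Header. `parse_bth` and `gap_stats` are hypothetical helper names for illustration, not part of any library.

```python
import struct
from collections import defaultdict


def parse_bth(udp_payload: bytes):
    """Return (opcode, dest_qp, psn) from the 12-byte Base Transport Header."""
    if len(udp_payload) < 12:
        raise ValueError("payload shorter than a BTH")
    # Layout: opcode(1) flags(1) pkey(2) reserved(1)+destQP(3) ackreq/reserved(1)+PSN(3)
    opcode, _flags, _pkey, qp_word, psn_word = struct.unpack(
        "!BBHII", udp_payload[:12]
    )
    return opcode, qp_word & 0xFFFFFF, psn_word & 0xFFFFFF


def gap_stats(frames):
    """Compute per-QP inter-frame gaps and PSN discontinuities.

    frames: iterable of (timestamp_seconds, udp_payload) in capture order.
    Returns {dest_qp: {"frames", "mean_gap", "max_gap", "psn_jumps"}}.
    """
    last = {}  # dest_qp -> (timestamp, psn) of the previous frame on that QP
    acc = defaultdict(
        lambda: {"frames": 0, "gap_sum": 0.0, "max_gap": 0.0, "psn_jumps": 0}
    )
    for ts, payload in frames:
        _opcode, qp, psn = parse_bth(payload)
        s = acc[qp]
        if qp in last:
            prev_ts, prev_psn = last[qp]
            gap = ts - prev_ts
            s["gap_sum"] += gap
            s["max_gap"] = max(s["max_gap"], gap)
            # The PSN is a 24-bit counter, so the expected value wraps at 2**24.
            if psn != (prev_psn + 1) & 0xFFFFFF:
                s["psn_jumps"] += 1
        s["frames"] += 1
        last[qp] = (ts, psn)

    result = {}
    for qp, s in acc.items():
        n_gaps = s["frames"] - 1
        result[qp] = {
            "frames": s["frames"],
            "mean_gap": s["gap_sum"] / n_gaps if n_gaps > 0 else 0.0,
            "max_gap": s["max_gap"],
            "psn_jumps": s["psn_jumps"],
        }
    return result
```

Note that when two QPs are perfectly interleaved at the same aggregate rate, the gap measured per QP is expected to be roughly twice the single-QP gap, so only an increase beyond that factor points at a real stall.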
Any hints on this subject are welcome…