Is a point-to-point (dedicated link, no switch) fibre connection totally lossless?

Dear all,

I am doing RDMA transfers over RoCEv2 (UD, unreliable datagram, Send) using ConnectX-5 100GbE fibre between two servers (<10 meters apart).

The data size is around 8 GBytes or 80 GBytes during tests.

Sometimes everything is fine and I don't have any packet drops,

but I also frequently see a small fraction of packets (around 0.01%) silently lost (nothing is visible through the verbs API, dmesg, or sysfs).

I am sure that some packets are dropped because I use the RDMA_SEND_WITH_IMM verb with a packet number that is checked while polling the RWQ on the destination host. The application is a loop that continuously posts 15360 work requests (3072 bytes each) at once.
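
For reference, here is a minimal sketch of the kind of receiver-side check I mean. It is not my actual code: the names seq_tracker and seq_tracker_update are made up for illustration, and it assumes the packet number is a 32-bit counter that increases by one per packet and has already been converted to host byte order (for example with ntohl() on the immediate data of the receive completion).

```c
#include <stdint.h>

/* Hypothetical helper, not part of libibverbs: tracks packet numbers
 * seen by the receiver and counts how many were skipped. */
struct seq_tracker {
    uint32_t expected;  /* next packet number we expect to see */
    uint64_t received;  /* packets actually seen */
    uint64_t missing;   /* packet numbers skipped (presumed lost) */
    uint64_t late;      /* packets older than expected (reordered or duplicated) */
    int      started;   /* 0 until the first packet arrives */
};

/* Feed one packet number, assumed to be in host byte order.
 * Returns how many packet numbers were skipped just before this one. */
static uint32_t seq_tracker_update(struct seq_tracker *t, uint32_t seq)
{
    uint32_t gap = 0;

    t->received++;

    if (!t->started) {
        /* First packet: just synchronise on it. */
        t->started = 1;
        t->expected = seq + 1;
        return 0;
    }

    /* Unsigned subtraction is modulo 2^32, so counter wraparound is handled. */
    gap = seq - t->expected;

    if (gap >= 0x80000000u) {
        /* seq is "behind" expected: treat it as a late or duplicate
         * packet and do not move the expected counter backwards. */
        t->late++;
        return 0;
    }

    t->missing += gap;
    t->expected = seq + 1;
    return gap;
}
```

Calling seq_tracker_update() once per receive completion and comparing missing against received at the end of a run gives the loss ratio. A late packet that was earlier counted as missing is not subtracted back in this sketch.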

This is not related to completion queue overrun (the queues are polled).

I pay attention to CPU affinity, I tried adding some nanosleep on the source host between ibv_post_send calls, and I also set the ring parameters to the maximum (8192). I suspected a transceiver temperature issue and tried another 40GbE copper link, and I have the same issue.

My question: is some packet loss (a very small amount, but not zero) unavoidable?

Cheers

Hi Raphael,

RoCE v2 is a UDP-based protocol, and UDP, unfortunately, does not guarantee delivery, ordering, or duplicate protection of packets.

If you have additional programming questions, I would suggest asking them on the linux-rdma mailing list.