Question about performance degradation caused by duplicate atomic requests

I have a network function that duplicates atomic packets in the switch. When I duplicate atomic packets, the latency becomes about 3 times larger (from 4.85us to 15.47us, tested with perftest ib_atomic_lat, two CX7 NICs connected through one Ethernet switch). There are no RTOs and no out-of-sequence events. If I run the same test with ib_write_lat and duplicate the write packets, the latency does not increase.
I find this confusing, because I would expect duplicate atomic packets to be dropped by the NIC as soon as their PSNs are recognized as duplicates, before they reach the processing pipeline. Why can they still hurt performance? What is the difference between how duplicated atomic packets and duplicated write-only packets are processed?
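
To make the question concrete, here is a minimal sketch of the PSN check I have in mind. It assumes the usual 24-bit PSN space, where a PSN that falls within 2^23 behind the responder's expected PSN is treated as a duplicate and one ahead of it as out of sequence. The names `psn_diff` and `classify_psn` are made up for this illustration, and the sketch says nothing about where in the CX7 pipeline the check actually runs.

```python
PSN_MASK = 0xFFFFFF      # PSNs are 24-bit values
HALF_WINDOW = 0x800000   # 2**23, half of the PSN space


def psn_diff(psn, expected):
    """Signed distance from `expected` to `psn` in 24-bit modular arithmetic."""
    diff = (psn - expected) & PSN_MASK
    # Distances in the upper half of the space are interpreted as negative.
    return diff - (PSN_MASK + 1) if diff >= HALF_WINDOW else diff


def classify_psn(psn, expected):
    """Classify an incoming request PSN against the expected PSN."""
    d = psn_diff(psn, expected)
    if d == 0:
        return "expected"
    return "duplicate" if d < 0 else "out_of_sequence"


if __name__ == "__main__":
    # A replayed packet carries a PSN the responder has already consumed.
    print(classify_psn(psn=99, expected=100))
```

The wraparound case is the reason for the modular arithmetic: with an expected PSN of 0, an incoming PSN of 0xFFFFFF is one step behind and therefore a duplicate.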

This behavior is expected because atomic operations require additional processing on the target RNIC to guarantee atomicity and return a response, which introduces serialization and queuing delays. In contrast, write packets are one-way and can be dropped earlier in the hardware pipeline when recognized as duplicates. It appears that duplicate detection for atomic packets occurs later in the NIC processing path, so duplicates still consume processing resources and impact latency even without retransmissions or out-of-sequence events.

Thank you for your response!

I believe this might be a potential bug or security threat. For example, a common packet replication attack could be used to significantly degrade application performance (in our specific scenario, we have observed up to a 90% drop in application performance simply by replicating AMO packets).

Would it be possible to address this in a future firmware update? For example, could the PSN validation for atomic packets be placed before the atomicity guarantee process?

I’m working on a security verification scenario and I see the same problem. When an atomic packet (opcode=20) is duplicated twice in the network and delivered to the receive-side ConnectX-7, communication performance deteriorates. However, performance is not affected when SEND or WRITE packets are duplicated in the same way.
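
For anyone reproducing this, a small sketch of how the replicated packets can be identified from the 12-byte Base Transport Header (BTH). It assumes that opcode=20 above is decimal, i.e. 0x14, which is the RC FetchAdd opcode; that reading is my assumption. The function name `parse_bth` and the small opcode table are only for illustration, and the table lists just the opcodes discussed in this thread.

```python
BTH_LEN = 12

# Subset of Reliable Connection (RC) opcodes relevant to this thread.
RC_OPCODES = {
    0x04: "SEND Only",
    0x0A: "RDMA WRITE Only",
    0x13: "CmpSwap",
    0x14: "FetchAdd",
}
ATOMIC_OPCODES = {0x13, 0x14}


def parse_bth(bth):
    """Extract opcode, destination QP and PSN from a raw BTH."""
    if len(bth) < BTH_LEN:
        raise ValueError("BTH is 12 bytes long")
    opcode = bth[0]
    return {
        "opcode": opcode,
        "name": RC_OPCODES.get(opcode, "other"),
        "is_atomic": opcode in ATOMIC_OPCODES,
        "dest_qp": int.from_bytes(bth[5:8], "big"),  # 24-bit destination QP
        "psn": int.from_bytes(bth[9:12], "big"),     # 24-bit PSN
    }


if __name__ == "__main__":
    sample = bytes([0x14, 0x00, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x11,
                    0x80, 0x00, 0x01, 0x02])
    print(parse_bth(sample))
```

Two captured packets with the same destination QP and the same PSN are the duplicates in question.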

According to the RDMA specifications, once a PSN is detected as a duplicate, the atomic operation itself should not be processed again. The performance should therefore be the same as for duplicated SEND/WRITE.
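
To illustrate what I mean, here is a toy model of a responder. It assumes the commonly described behaviour in which a duplicate atomic request is not executed a second time but is still answered by replaying the saved result of the original operation. The class `ToyResponder` and its return labels are hypothetical, and the model does not describe ConnectX-7 hardware or firmware.

```python
class ToyResponder:
    """Toy RC responder that tracks the expected PSN and saved atomic results."""

    def __init__(self, expected_psn=0):
        self.expected_psn = expected_psn
        self.memory = {}        # addr -> 64-bit value
        self.saved_atomic = {}  # psn -> original value returned to the requester
        self.executed = 0       # number of atomics actually executed

    def fetch_add(self, psn, addr, add):
        diff = (psn - self.expected_psn) & 0xFFFFFF
        if diff == 0:
            # New request: execute once and remember the result for replays.
            old = self.memory.get(addr, 0)
            self.memory[addr] = (old + add) & 0xFFFFFFFFFFFFFFFF
            self.saved_atomic[psn] = old
            self.expected_psn = (psn + 1) & 0xFFFFFF
            self.executed += 1
            return ("atomic_ack", old)
        if diff >= 0x800000:
            # Duplicate: never re-execute, only replay the saved result if any.
            if psn in self.saved_atomic:
                return ("atomic_ack_replayed", self.saved_atomic[psn])
            return ("dropped", None)
        # PSN ahead of the expected one.
        return ("nak_sequence_error", None)


if __name__ == "__main__":
    r = ToyResponder()
    print(r.fetch_add(0, 0x1000, 5))  # original request
    print(r.fetch_add(0, 0x1000, 5))  # replicated copy
    print(r.memory[0x1000], r.executed)
```

In this model the duplicate leaves memory untouched, but it still costs a lookup and a response, which is one possible place for the extra latency to come from.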

I’ve been grappling with this issue for quite some time and would be grateful for your guidance. Could you share whether this might be addressed in a future release, or perhaps suggest any recommended workarounds? Any insights you could offer would be greatly appreciated.