Hi Yuhongzhang,
I did not check the code but rather expect it to be correct.
I do want to bring to you attention a feature that we have enabled that might interfere with your experiment. It is called Adaptive Retransmission Timers. Under the hood it will try to retransmit faster to reduce the time till retransmission. By doing so, in case we do have lossy network (enabled also in lossless network from what i recall) we will recover faster and increase overall performance. The internal timer will not exceed user defined timer and will try to track the network RTT.
You can review / enable / disable this feature. Please review my post [*] for roce_adp_retrans_en.
Regards,
Yaniv
[*] https://enterprise-support.nvidia.com/s/article/How-to-Enable-Disable-Lossy-RoCE-Accelerations