Evaluating RDMA communication performance using the DOCA library

■ Introduction
While evaluating RDMA communication performance with the DOCA library, we found that the expected bandwidth could not be achieved when the BlueField-3 operates in DPU mode. If anyone has experience with a similar setup or insight into this issue, your input would be greatly appreciated.

■ Environment Details
NIC Used: BlueField-3 B3220L SuperNICs
PCIe Bus Configuration: Gen5 x16
DPU Memory Configuration: 10 × DDR5 (64-bit + 8-bit ECC), 16GB total @ 5200MT/s, single channel
RDMA Execution Setup:
Sender: SmartNIC ConnectX-6 Dx
Receiver: BlueField-3 B3220L (operating in DPU mode)
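
For reference, here is a back-of-the-envelope theoretical peak for the DPU memory listed above. It counts only the 64 data bits, excluding ECC. The second line rests on an assumption we have not verified: in DPU mode, each received byte is written into DPU DRAM once and read back once by the processing task, so DRAM traffic is roughly twice the network rate.

```latex
B_{\text{peak}} = 5200 \times 10^{6}\ \tfrac{\text{transfers}}{\text{s}} \times 8\ \tfrac{\text{bytes}}{\text{transfer}}
               = 41.6\ \text{GB/s} = 332.8\ \text{Gb/s}

% Assumption: one DRAM write plus one DRAM read per received byte
B_{\text{DRAM}} \approx 2\, B_{\text{net}}:\quad
2 \times 180\ \text{Gb/s} = 360\ \text{Gb/s} > B_{\text{peak}},\qquad
2 \times 120\ \text{Gb/s} = 240\ \text{Gb/s} \approx 0.72\, B_{\text{peak}}
```

Sustained DRAM bandwidth is normally below the theoretical peak. If the assumption holds, the single-channel memory may be a factor alongside any DPU-mode settings.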

■ Issue Summary
Based on the DOCA RDMA sample below, we implemented an RDMA bandwidth measurement along with a task that continuously processes approximately 2GB of data (CH1/CH2). We also added synchronization logic so that CH1 and CH2 run concurrently.

Reference Sample: DOCA RDMA Send and Receive
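
The sketch below shows one way to implement the synchronization described above, assuming POSIX threads. Each channel waits on a shared pthread_barrier_t, so CH1 and CH2 start their timers at the same moment. No DOCA calls are shown. `struct channel`, `hypothetical_transfer()` and `run_channels()` are illustrative names only, and the transfer itself is stubbed with memcpy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_CHANNELS 8

/* Hypothetical per-channel state; real DOCA RDMA objects would go here. */
struct channel {
    const char *name;          /* e.g. "CH1" */
    size_t total_bytes;        /* bytes to move (about 2GB in the real test) */
    size_t chunk_bytes;        /* size of one simulated transfer */
    pthread_barrier_t *start;  /* shared barrier, set by run_channels() */
    double elapsed_s;          /* measured by the worker thread */
};

/* Placeholder for one RDMA send/receive: only a memcpy, no DOCA calls. */
static void hypothetical_transfer(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}

static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static void *channel_worker(void *arg)
{
    struct channel *ch = arg;
    assert(ch->chunk_bytes > 0);
    char *src = calloc(1, ch->chunk_bytes);
    char *dst = malloc(ch->chunk_bytes);
    if (src == NULL || dst == NULL)
        abort();

    /* Block until all channels are ready, so they start timing together. */
    pthread_barrier_wait(ch->start);
    double t0 = now_s();
    for (size_t done = 0; done < ch->total_bytes;) {
        size_t left = ch->total_bytes - done;
        size_t n = left < ch->chunk_bytes ? left : ch->chunk_bytes;
        hypothetical_transfer(dst, src, n);
        done += n;
    }
    ch->elapsed_s = now_s() - t0;

    free(src);
    free(dst);
    return NULL;
}

/* Run n channels concurrently; returns 0 on success, -1 on bad n. */
int run_channels(struct channel *chs, unsigned n)
{
    pthread_t tids[MAX_CHANNELS];
    pthread_barrier_t start;

    if (n == 0 || n > MAX_CHANNELS)
        return -1;
    pthread_barrier_init(&start, NULL, n);
    for (unsigned i = 0; i < n; i++) {
        chs[i].start = &start;
        if (pthread_create(&tids[i], NULL, channel_worker, &chs[i]) != 0)
            abort();
    }
    for (unsigned i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    pthread_barrier_destroy(&start);
    return 0;
}
```

Timing begins only after the barrier, so thread creation and buffer allocation are excluded from the measurement. In a real test, DOCA context and buffer setup would likewise belong before the barrier.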

The observed bandwidth results were as follows:

CH1 only: ~91Gbps
CH1 + CH2 concurrently: ~60Gbps per channel, total ~120Gbps

Notably, with the BlueField-3 in NIC mode, we achieved the expected total bandwidth of approximately 180Gbps.
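
Here is a small sketch of how figures like these can be derived. It assumes the measurement records the bytes moved and the start and end timestamps for each channel. The function names are illustrative, not DOCA APIs. The combined figure uses total bytes over the shared wall-clock window rather than a plain sum of per-channel rates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes over elapsed seconds, in decimal gigabits per second
 * (1 Gbps = 1e9 bit/s). */
double gbps(uint64_t bytes, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return (double)bytes * 8.0 / elapsed_s / 1e9;
}

/* Aggregate rate for channels that ran concurrently: total bytes divided by
 * the wall-clock window from the earliest start to the latest end. Unlike
 * summing per-channel rates, this does not overstate the total when the
 * channels only partly overlap. */
double aggregate_gbps(const uint64_t *bytes, const double *start_s,
                      const double *end_s, size_t n)
{
    assert(n > 0);
    uint64_t total = 0;
    double first = start_s[0];
    double last = end_s[0];
    for (size_t i = 0; i < n; i++) {
        total += bytes[i];
        if (start_s[i] < first)
            first = start_s[i];
        if (end_s[i] > last)
            last = end_s[i];
    }
    return gbps(total, last - first);
}
```

The definition of "2GB" also matters. 2^31 bytes is about 7.4% more than 2 × 10^9 bytes, so mixing the two shifts the computed Gbps by the same factor.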

■ Questions
Is it reasonable to expect a total bandwidth of around 180Gbps with the above hardware configuration when running CH1 and CH2 concurrently?
Are there any additional settings or considerations required when operating in DPU mode to achieve optimal performance?

■ Closing
If you have experience with a similar setup or know of any best practices for optimizing performance in DPU mode, I would greatly appreciate your insights. Thank you in advance for your support!