OS: RHEL / CentOS 7.9 (latest)
I am sending a 500 MB chunk 21 times from one system to another; the two systems are connected via Mellanox cables.
(Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6])
The same registered memory region (500 MB) is reused for all 21 iterations.
Using aligned_alloc() (aligned to the 4096-byte system page size) instead of malloc() for the registered memory increases the message send bandwidth by around 35 Gbps:
with malloc(): ~86 Gbps
with aligned_alloc(): ~121 Gbps
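
To make the difference between the two buffers concrete, here is a minimal sketch of how the start address of each allocation can be compared against a page boundary before the buffer is registered. It is not my actual benchmark code: the helper names (system_page_size, page_offset, pages_spanned, alloc_page_aligned, report) are hypothetical, and no verbs calls are made. It assumes a C11 compiler (for example gcc -std=c11) so that aligned_alloc() is declared.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* System page size in bytes (4096 on the machines described above). */
size_t system_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096; /* fallback value is an assumption */
}

/* Byte offset of ptr from the start of the page that contains it. */
size_t page_offset(const void *ptr, size_t page_size)
{
    return (size_t)((uintptr_t)ptr % page_size);
}

/* Number of pages touched by the byte range [ptr, ptr + len). */
size_t pages_spanned(const void *ptr, size_t len, size_t page_size)
{
    assert(len > 0);
    uintptr_t first = (uintptr_t)ptr / page_size;
    uintptr_t last = ((uintptr_t)ptr + len - 1) / page_size;
    return (size_t)(last - first + 1);
}

/* Allocate at least len bytes starting on a page boundary.
 * C11 aligned_alloc() expects the size to be a multiple of the
 * alignment, so the length is rounded up to whole pages. */
void *alloc_page_aligned(size_t len, size_t page_size)
{
    size_t rounded = (len + page_size - 1) / page_size * page_size;
    return aligned_alloc(page_size, rounded);
}

/* Print where a buffer starts relative to a page boundary. */
void report(const char *label, const void *buf, size_t len, size_t page_size)
{
    printf("%s: addr=%p offset_in_page=%zu pages_spanned=%zu\n",
           label, buf, page_offset(buf, page_size),
           pages_spanned(buf, len, page_size));
}
```

Calling report() once on a malloc(len) buffer and once on an alloc_page_aligned(len, system_page_size()) buffer, with len set to the 500 MB chunk size, shows whether the malloc() buffer starts in the middle of a page and therefore touches one more page than the aligned buffer. Both buffers are released with free().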
Since the CPU is not involved in these operations, why is the transfer faster with aligned memory?
Please provide reference links, if available, that explain this.
What does aligned memory change about the read/write operations?