Slow File Transfer On 20Gbps IB

Dear All,

I am new to InfiniBand devices. I bought two Mellanox ConnectX-2 (20 Gbps) cards from eBay and installed them in two Debian servers (PCIe 3.0, 8 lanes) with no problems. I measured about 15 Gbps with iperf3, as follows:

iperf3 -c 10.20.0.34

Connecting to host 10.20.0.34, port 5201
[ 4] local 10.20.0.35 port 58208 connected to 10.20.0.34 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.85 GBytes 15.9 Gbits/sec 0 11.9 MBytes
[ 4] 1.00-2.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes
[ 4] 2.00-3.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes
[ 4] 3.00-4.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes
[ 4] 4.00-5.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes
[ 4] 5.00-6.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes
[ 4] 6.00-7.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes
[ 4] 7.00-8.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes
[ 4] 8.00-9.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes
[ 4] 9.00-10.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 18.2 GBytes 15.6 Gbits/sec 0 sender
[ 4] 0.00-10.00 sec 18.2 GBytes 15.6 Gbits/sec receiver

But why do I only get 150 MB/s (about 1.2 Gbps) when transferring a large file (3.5 GB) via scp and rsync?

I don't think disk I/O is the problem, because I am transferring from and to a ramdisk.

I would appreciate your help. Thank you very much.

From what you describe, the issue is somewhere in the OS (I/O, memory allocation, or something else) and not in the network. A 20 Gbps link on ConnectX-2 gives you a theoretical maximum of 16 Gbps of data because of 8b/10b encoding, so 15.6 Gbps is pretty close.
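To make the arithmetic explicit, here is a small sketch of the numbers in this thread. The function names are just mine for illustration, and it assumes decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits):

```python
# Rough link-rate arithmetic for the figures quoted in this thread.
SIGNAL_RATE_GBPS = 20.0        # signalling rate quoted for the card
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 data bits carried per 10 bits on the wire


def usable_gbps(signal_rate_gbps, efficiency=ENCODING_EFFICIENCY):
    """Data rate left after line-encoding overhead."""
    return signal_rate_gbps * efficiency


def mbytes_per_s_to_gbps(mb_per_s):
    """Convert MB/s to Gbit/s using decimal units."""
    return mb_per_s * 8 / 1000


max_gbps = usable_gbps(SIGNAL_RATE_GBPS)
print(f"theoretical data rate: {max_gbps:.1f} Gbps")
print(f"iperf3 result is {15.6 / max_gbps:.1%} of that")
print(f"150 MB/s = {mbytes_per_s_to_gbps(150):.1f} Gbps")
```

So the link itself is running at close to its ceiling, while the scp/rsync transfer uses well under a tenth of it.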

I would suggest using perf to analyze the ssh/rsync behaviour, or maybe running them under 'strace -ttt -T' to see how much time is spent in the system calls.
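If you save the strace output to a file (for example with strace's -o option, and -f to follow child processes), something like the following could total the time per system call. This is only a minimal sketch: the script and the trace file name are hypothetical, it assumes the usual '-ttt -T' line layout with the elapsed time in angle brackets at the end of each line, and it simply skips "unfinished"/"resumed" lines rather than stitching them together.

```python
import re
import sys
from collections import defaultdict

# Expected line shape (strace -ttt -T), optionally with a pid prefix from -f:
#   1700000000.123456 read(3, "..."..., 16384) = 16384 <0.000042>
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"   # optional "[pid N]" or bare pid prefix
    r"\d+\.\d+\s+"                      # -ttt timestamp (seconds.microseconds)
    r"(\w+)\(.*<(\d+\.\d+)>\s*$"        # syscall name ... <elapsed seconds>
)


def summarize(lines):
    """Return (syscall, calls, total_seconds) rows, slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # signals, exit lines, unfinished/resumed calls, etc.
        name, elapsed = match.group(1), float(match.group(2))
        totals[name][0] += 1
        totals[name][1] += elapsed
    rows = [(name, calls, secs) for name, (calls, secs) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python strace_sum.py scp.trace
    with open(sys.argv[1], errors="replace") as trace:
        for name, calls, secs in summarize(trace):
            print(f"{name:<16} {calls:>8} calls {secs:>12.6f} s")
```

If most of the wall-clock time does not show up in any system call, the process is probably busy in user space, and perf would be the better tool to see where.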
