IPoIB transfer rate 70 Mb/s

Hello, I'm trying to build a home lab with InfiniBand hardware. Here is my setup:

1 x QLogic 12300-BS01 switch (integrated subnet manager), flashed with the latest Intel firmware

1 x PowerEdge R300 with a Mellanox MHQH19-XTC HCA (flashed with the latest Mellanox firmware)

1 x PowerEdge R300 with a Mellanox MHQH19-XTC HCA (flashed with the latest Mellanox firmware)

On each server I installed ESXi 5.5, the HCA drivers, OFED, etc. (I followed this guide: Erik Bussink | InfiniBand install & config for vSphere 5.5, http://www.bussink.ch/?p=1306)

On the vCenter server I created a distributed switch, assigned the two MHQH19-XTC HCAs, and set the switch/vmkernel MTU to 4096.

I created two Linux VMs and assigned them only the MHQH19-XTC HCA as a network card (not the server's onboard NIC), gave them private IP addresses (the servers ping each other), and tried to transfer a 10 GB file between the two servers. The transfer speed is ridiculous, no more than 70 Mb/s… Can somebody help me understand what's wrong in my configuration?

Thanks

Is it faster if you connect the two machines directly, without the switch? Do you see errors on the link?
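For example, a rough sketch of how to check the link, assuming you can get a Linux shell with the infiniband-diags tools installed (ESXi itself won't have these):

  # Port state and negotiated rate of the local HCA
  ibstat
  # Scan the fabric and report ports with error counters above threshold
  ibqueryerrors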

Could you test with something like an MPI ping-pong? See the sketch after the link below.

Infiniband HOWTO: OpenMPI http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-6.html
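A minimal sketch of such a ping-pong run, assuming OpenMPI is installed on both nodes as in the HOWTO and the OSU micro-benchmarks are built (node1/node2 and the binary paths are placeholders):

  # Two ranks, one per node: latency ping-pong over the IB fabric
  mpirun -np 2 --host node1,node2 ./osu_latency
  # Same layout, streaming bandwidth test
  mpirun -np 2 --host node1,node2 ./osu_bw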

Hi,

It looks like your HW components are all QDR-capable (you didn't mention which cables you are using, though).

My vote is also for a simple, plain Linux installation (give up on VMware for now) and then check performance.

One thing people often trip on is the server's PCI bus capabilities: make sure your HCA cards are installed in a Gen2 or Gen3 PCIe slot AND… that the BIOS has the correct settings to enable the highest PCIe speed.
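A quick way to check this from Linux (just a sketch; the PCI address 03:00.0 is a placeholder, use whatever lspci reports for your card):

  # Find the HCA's PCI address
  lspci | grep -i mellanox
  # Compare the negotiated link (LnkSta) against the card's capability (LnkCap);
  # for QDR you want at least 5 GT/s (Gen2) at x8 width
  sudo lspci -vv -s 03:00.0 | grep -i 'lnkcap\|lnksta'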

When testing your performance, I would recommend starting with low-level RDMA performance tools for IB, like rdma_read_bw.
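For example, a sketch using the perftest tools (ib_read_bw exercises the same RDMA read path; the IP is a placeholder for the server node's IPoIB address):

  # On node 1, start the server side
  ib_read_bw
  # On node 2, run the client against node 1
  ib_read_bw 192.168.10.1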

Next, when you get to measuring the IP layer, don't use scp (it fragments the packets differently and has a significant caching effect). Use industry-standard benchmarks like iperf (my recommendation) or netperf.
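A minimal iperf (version 2) sketch; the IP and stream count are placeholders:

  # On the receiving node
  iperf -s
  # On the sending node: 30-second run, 8 parallel TCP streams
  iperf -c 192.168.10.1 -t 30 -P 8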

good luck!!

Hello Michael

Thanks for your reply… I simply used scp on CentOS 6.5… I don't think it's the storage limit: as storage I'm using a PowerEdge R510 with a Dell PERC H700 and 2 Intel SSDs in RAID 0… The PowerEdge R510 is connected to the switch with an MHQH19-XTC.

Can you tell us how you measured (NFS, scp, Samba)? Did you try netperf? Could it be that 70 MB/s is your hard drive's limit?
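If you want to try netperf, a minimal sketch (the IP is a placeholder for the receiving node):

  # On the receiver
  netserver
  # On the sender: 30-second TCP stream test
  netperf -H 192.168.10.1 -l 30 -t TCP_STREAM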

Hello Michael

I tried to connect the 2 ESXi hosts directly to the PowerEdge R510 storage… and, surprise, the transfer rate is even lower, about 20 Mb/s.

About OpenMPI, I couldn't understand how to use it… Is there any test I can run in the ESXi 5.5 shell?

Thanks

There is probably no such software package as MPI for ESXi. Why don't you first install a regular Linux like CentOS to check whether this performance problem is related to defective hardware?

I suspected the server's PCI bus capabilities, so I replaced the PowerEdge R300 with a PowerEdge R710 and installed the MHQH19-XTC in the x16 PCIe slot…

I ran some tests outside of ESXi: I installed Fedora 20 on each server, ran the iperf test, and these are the results…

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.0 sec  2.58 GBytes  740 Mbits/sec
[  4]  0.0-30.0 sec  2.68 GBytes  768 Mbits/sec
[  5]  0.0-30.0 sec  2.51 GBytes  719 Mbits/sec
[  6]  0.0-30.0 sec  2.70 GBytes  773 Mbits/sec
[  8]  0.0-30.0 sec  2.68 GBytes  768 Mbits/sec
[ 10]  0.0-30.0 sec  2.77 GBytes  792 Mbits/sec
[  7]  0.0-30.0 sec  2.60 GBytes  745 Mbits/sec
[  9]  0.0-30.0 sec  2.48 GBytes  711 Mbits/sec
[SUM]  0.0-30.0 sec  21.0 GBytes  6.02 Gbits/sec

About the cables, I'm using Cinch QSFP+ (exactly these:

Cinch success at SC09: Showcased Cinch cables, Released QSFP+, and Participated in SCinet - Cinch Connectors http://www.cinch.com/sc09/
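Since the hardware should negotiate QDR, it may also be worth confirming the actual link rate and the IPoIB mode/MTU on the Fedora hosts; a rough sketch (ib0 is a placeholder for the IPoIB interface name):

  # Negotiated rate on the HCA port; QDR should report "Rate: 40"
  ibstat
  # IPoIB transport mode (datagram or connected) and interface MTU
  cat /sys/class/net/ib0/mode
  ip link show ib0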