Hello everyone,
I’m currently networking together a small collection of identical computers to work on CFD, chemical reaction, and coupled physics problems as a hobby. My idea is to use older and/or commonly available cheap hardware to build a decently powerful and easily scalable compute cluster.
I’m not going to lie, the more feeble attempts I make at learning about older Infiniband hardware, the more I realize that I have no f****** idea what I’m doing.
At the moment (and embarrassingly) I’m running Windows Server 2012 R2. I’m very familiar with it professionally, it comes with a number of built-in tools for managing multiple computers, and most importantly, the engineering software I’m using at the moment is only licensed for Windows platforms (don’t ask me why). Also, I’m a complete idiot when it comes to anything that isn’t Ubuntu, but I’m willing to learn…eventually.
Currently my network hardware manifest includes a QLogic SilverStorm 24-port DDR switch [9024-FC24-ST1-DDR], four Mellanox InfiniHost III DDR cards [MHGS18-XTC], four Mellanox CX4 passive copper cables [MC1104130-002], and a 1 GbE switch.
I’ve been able to run a few simulations so far on this setup, but my speedup is terrible, and I suppose I shouldn’t wonder why. I’m currently using a host-based subnet manager [OpenSM] along with an IP over IB (IPoIB) driver rather than a direct networking option (at least I think that’s what’s going on). According to my HPC Cluster Manager diagnostics, my throughput is half what it should be (~1000 MB/s), even though the switch does indicate that DDR is active. I had to use an older Mellanox driver (2.1.2) for the cards because, as far as I understand, they are not supported by newer driver versions.
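For reference, here is a rough back-of-envelope sketch of the theoretical peak numbers. It assumes the links are 4x wide (the post above doesn't state the lane count, so that is my assumption) and uses the standard per-lane signaling rates and 8b/10b encoding for SDR and DDR InfiniBand. The function and variable names are just made up for illustration.

```python
# Rough theoretical peak data rate of an InfiniBand link.
# Assumption: 4 lanes (4x link); adjust `lanes` if the hardware differs.

# Per-lane signaling rate in Gbit/s for each link speed.
SIGNALING_GBPS_PER_LANE = {"SDR": 2.5, "DDR": 5.0}

# SDR and DDR use 8b/10b encoding: 8 data bits per 10 bits on the wire.
ENCODING_EFFICIENCY = 8 / 10


def peak_data_rate_mb_s(speed, lanes=4):
    """Return the theoretical peak data rate in MB/s (1 MB = 10**6 bytes)."""
    signaling_gbps = SIGNALING_GBPS_PER_LANE[speed] * lanes
    data_gbps = signaling_gbps * ENCODING_EFFICIENCY
    return data_gbps * 1000 / 8  # Gbit/s -> MB/s


if __name__ == "__main__":
    for speed in ("SDR", "DDR"):
        print(f"4x {speed}: {peak_data_rate_mb_s(speed):.0f} MB/s")
```

Under those assumptions a 4x DDR link tops out around 2000 MB/s and a 4x SDR link around 1000 MB/s, before any protocol or driver overhead. So a reading near 1000 MB/s could point to several things (IPoIB overhead, the host bus, or the link effectively running at SDR), and this arithmetic alone can't tell them apart.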
My objective at the moment is to get the IB setup I have now configured correctly so that it runs at its peak speed, if that is even possible. I should also note that I’m having a great deal of difficulty getting a copy of software to interface with my switch, such as FabricSuite or QuickSilver OS management. Also, it seems that the switch’s management network connector does not respond to any connections, as indicated by the ‘Mgnt’ LED never lighting up.
Needless to say I have quite a mess on my hands and would greatly appreciate any help to get this sorted out.
Thanks in advance.