Using ConnectX-2 VPI adapters to network a workstation with 2 nodes.

I am in the process of building a small rack mount setup for CG animation/rendering using Supermicro barebone servers for LGA 2011 Intel Xeon E5 chips. It will consist of one workstation and two render nodes. I decided to go with Infiniband because it is fast and relatively inexpensive to deploy; I was able to purchase a new dual port Mellanox MHQH29C-XTR and two single port MHQH19B-XTR cards for $600 on eBay. My plan is to use the dual port adapter in the workstation and the single port adapters in each of the nodes, with each node plugged directly into a port on the adapter in the workstation. I have used the same kind of setup for years with regular ethernet NICs without any problems, but I just want to make sure that it will work as easily with Infiniband. Am I losing any of Infiniband's benefits when I don't use a switch? I would appreciate any input as I am very new to the world of Infiniband. Thank you!

Yes, you can plug a cable between two ports directly, but you need to start the Subnet Manager on one machine (/etc/init.d/opensmd; it can be on any machine).
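
If it helps, the bare-minimum sequence on a CentOS-style box looks something like the following. The init script is named opensmd or opensm depending on the package, and sminfo/ibstat come with the standard Infiniband diagnostics; treat this as a sketch rather than gospel:

$ sudo /etc/init.d/opensmd start   # start the subnet manager on one machine
$ sminfo                           # should report the master SM's LID and GUID
$ ibstat                           # ports should move from Initializing to Active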

What you lose is scalability, of course. But you get slightly better latency without a switch:)

I was already looking into Puppet Enterprise because it was free for up to 10 nodes. Never heard of Chef but I will give it a look. Thanks again!

EDIT: StH forums? I don’t remember ever posting anything in that forum.

I have decided to make some huge changes to my setup. I found a very good deal on a bunch of Intel Xeon X5680 OEM CPUs and IBM Mellanox ConnectX-2 VPI single port cards on eBay, so I decided to build my render nodes around those instead of the LGA 2011 chips. The cost savings are good enough that I've decided to up my number of render nodes to 10, since my V-Ray license allows me to render across that many computers. I will still be keeping my rackmount workstation as an LGA 2011 Xeon setup. Since my setup will now be 1 workstation and 10 nodes I will have to get a switch, so I was looking at the IS5023. Should be a very fun and interesting undertaking!

Thank you very much for the info Justin, much appreciated!

Hi animGuy, it appears we have some things in common; Renderfarmer's DIY Render Farm | ServeTheHome Forums

I’m on Vray for Maya and I started out with two DP X56XX nodes 2 years ago.

A few things I wish someone had pointed out to me when I started out:

  1. Once you get 10, you’ll want 20; plan accordingly.
  2. You’ll want a dedicated file server/license manager/render manager for your setup.
  3. X5680’s run very hot. Ensure adequate ambient cooling and mind your electric bill.
  4. Make sure you have adequate power adjacent to your rack for 10+ of those machines.
  5. IPoIB is awesome but it’s hard to bridge with a home internet connection (one workaround is sketched just after this list).
  6. IB switches are LOUD.
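
On point 5: since bridging IPoIB to an Ethernet home connection is a pain, what usually works instead is plain routing/NAT on the head node, with one leg on the home LAN and one on the fabric. A rough sketch, assuming eth0 faces the internet and ib0 faces the render nodes (interface names and addresses are just examples):

$ sudo sysctl -w net.ipv4.ip_forward=1
$ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
$ sudo iptables -A FORWARD -i ib0 -o eth0 -j ACCEPT
$ sudo iptables -A FORWARD -i eth0 -o ib0 -m state --state RELATED,ESTABLISHED -j ACCEPT

The render nodes then just use the head node's ib0 address as their default gateway.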

Shoot me a PM if you have any questions or have some advice of your own to share.

siszlai - Generally that’s good advice (i.e. “you need to run opensm”), but for the specific topology mentioned here it’s not quite right.

A 3 node setup, where two nodes have only a single port and the middle box has two ports, doesn’t work quite like that.

OpenSM only binds to one port on a server (by default the first port on the first card), then explores/discovers the network topology through just that connection. So, if it gets started on the middle box, it won’t see the 2nd port or the other server connected through it. (This is specific to this topology, and doesn’t happen with a switch.)

The workaround is super easy, but counter-intuitive. Just run OpenSM on both of the nodes with the single port cards, and don’t run it on the middle box.
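
For completeness, and only as an aside: if you ever do want the SM on the dual-port box (say, while recabling), opensm can be pinned to a specific port with its -g option, one instance per port. The GUID below is a placeholder; ibstat -p lists the real ones. This is just a sketch of that alternative, not what I’d recommend for your setup:

$ ibstat -p                              # list the port GUIDs on the local HCA
$ sudo opensm -B -g 0x0002c903000xxxxx   # run a daemonized SM bound to that port GUID

Run a second instance with the other port’s GUID (ideally with separate log files) and both links get managed. Running it on the two render nodes is still the simpler answer.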

I do similar to this with some test boxes here, and run IPoIB over the top of it. Works fine that way.

(note - edited slightly for clarity)
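
In case it’s useful, the IPoIB part is just a normal interface on CentOS. A minimal sketch of /etc/sysconfig/network-scripts/ifcfg-ib0 on each box (the addresses are made up, and CONNECTED_MODE plus the large MTU are optional tuning):

DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
CONNECTED_MODE=yes
MTU=65520

Then ifup ib0 brings it up like any other NIC. On the dual-port workstation each port shows up as its own ibN interface, so give each one its own subnet.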

That sounds cool, and it’s no stress at all.

Sounds like interesting fun.

Btw, saw some of your posts on StH forums.

Have you considered using centralised management tools for your nodes (something like Puppet or Chef), instead of manually hand configuring everything via ssh?

That kind of thing is what Puppet and Chef are designed to handle (both competing projects in the same space). For your setup, you don’t need the “Enterprise” version of them, just the normal Open Source Software project release would be fine.
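
To give a flavour of the open source route, a throwaway Puppet test on CentOS is only a couple of commands (assuming the puppet package is available, e.g. from EPEL; the manifest below is a made-up minimal example, not a recommended layout):

$ sudo yum install puppet
$ cat > ib.pp <<'EOF'
# make sure the Infiniband tooling is present and the SM is running on this node
package { ['opensm', 'mstflint']: ensure => installed }
service { 'opensm': ensure => running, enable => true, require => Package['opensm'] }
EOF
$ sudo puppet apply ib.pp

In a real deployment you’d run a puppet master (or Chef server) and let the ten render nodes pull that config automatically instead of applying it by hand on each box.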

Just saying.

Thanks for the warning. Judging from the posts the IS5023 doesn’t have that problem.

WOW, that is a nice home setup. 10 is going to be good for me for a long while. I don’t even want to think about how much it will cost to power a setup like this for a few days. Aside from my license of V-Ray for Maya, the only other renderer I have that will let me render across that many nodes is Luxology Modo 701 (which only recently added Linux support). I will probably purchase a workstation license of SideFX Houdini FX after I get this up and running, since that also comes with unlimited render nodes and awesome VFX tools.

I would have gone with E5-2670 chips for my render nodes, but I found X5680 chips for about half the price ($740 each, to be exact) of even used E5 chips, and the performance difference between the two did not warrant double the price for the Sandy Bridge chips. For the nodes I am considering using Supermicro 1U Twin Servers to save on space. They have a model that has a Mellanox ConnectX-2 40Gbps NIC built into the motherboard of both nodes, which still leaves the PCI-E 2.0 x16 slot open for an LSI RAID card that can connect to the backplane of each node. Supermicro | Products | SuperServers | 1U | 1026TT-IBQF

I was going to go with a bare-bones 3U Supermicro GPU Server to act as my workstation/dedicated file server/license manager/render manager. Supermicro | Products | SuperServers | 3U | 6037R-72RFT+

What I like most about this server is that it supports up to two high-end Kepler GPUs, can have its own battery backup (though it doesn’t seem as adequate as a dedicated UPS), has an on-board LSI 2208 HW RAID controller that can support up to 16 devices, and has two 10GBase-T NICs. The two 5.25" drive bays could easily be used to add 2.5" HD enclosures.

As far as rack mounted UPSs and PSUs go, what would you recommend for a setup like this? I have worked with rack mounted SATCOM and computer networking equipment for almost 10 yrs now (currently employed as a SATCOM SNAP Support Engineer in Afghanistan), but I have never built a setup of my own from the ground up, so any input would be greatly appreciated.

Thank you very much renderfarmer for the pointers. I will be sure to PM questions in the near future!

Out of curiosity, which operating system(s) are you wanting to use for the servers and workstation?

Asking because it can make a difference, depending on what you want to do.

For example, if you’re a Linux guy, the Infiniband drivers are better supported (more mature) in RHEL/CentOS than Ubuntu. If you’re a Windows guy, the protocols available are different in Win 2K8 vs Win 2K12.

So, better to know ahead of time, etc.

Thanks.

Those X5680s will do just fine. The X56XX chips are only 10% slower per clock in Vray than the E5. That’s based on some extensive comparative testing that I did personally.

A couple of things stand out about your proposed config:

  1. Your render nodes don’t need RAID controllers.
  2. The 10GBase-T NICs in your 3U workstation will be a serious bottleneck for your 40Gbps render nodes.

I’m very happy with my 3000VA CyberPower UPS. It was quite reasonably priced. The part number is in that StH link I posted for you.

So far you’re looking at well over $30k for the render nodes and networking gear alone - taking the CPU price and IS5023 you quoted into account. Take great care in planning before you proceed.

I had already decided on going with CentOS Linux for this setup, but I am glad to hear that it has very mature Infiniband drivers. All my CG and compositing programs support and work best with RHEL/CentOS.

It’s pretty straight forward to get working.

Software-wise, CentOS comes with Infiniband drivers and utils that you install via yum:

$ sudo yum groupinstall "Infiniband Support"

When getting cards from eBay, the firmware on them is often a bit old. Easily fixable with the Mellanox firmware burning tool (flint). Install that via yum:

$ sudo yum install mstflint

For your 3 node setup (no switch), you should install the Open Subnet Manager (OpenSM) package on both of your single port nodes (your render boxes):

$ sudo yum install opensm

$ sudo chkconfig opensm on

$ sudo service opensm start
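
Once opensm is running on both render nodes, a quick sanity check from any box (these diagnostics come in with the "Infiniband Support" group; the GUID below is a placeholder for the value ibstat prints on the far end):

$ ibstat                            # each connected port should show State: Active and an SM LID
$ ibhosts                           # lists the HCAs visible on that link
$ ibping -S                         # on one end, start the ibping responder ...
$ ibping -G 0x0002c903000xxxxx      # ... then ping its port GUID from the other end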

Updating the firmware on your cards, using flint, is also pretty straightforward but is better covered another day.
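
Just as a teaser, checking what firmware a card currently has is a one-liner with mstflint (the PCI address below is an example; lspci tells you the real one):

$ lspci | grep -i mellanox
$ sudo mstflint -d 03:00.0 query    # prints the current FW version, PSID and GUIDs

The burn step itself needs an image matching your card's PSID, which is the part worth reading up on first.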

Note, if you ever want to remove the Infiniband stuff from your CentOS boxes, it’s pretty easy:

$ sudo yum groupremove "Infiniband Support"

$ sudo yum remove mstflint opensm

Hope that’s all helpful.

(note - edited for typo fixes)

Is all this stuff going into a proper data center, or are you going to have it at home / office?

Asking because the noise level for some of that stuff might be at the “can’t think straight” level if anyone’s near it.

Heh, went looking back at the StH forums:

http://forums.servethehome.com/networking/

… and you’re right. I was thinking of another guy (renderfarmer) who seems to be in the same field.

Sorry.

renderfarmer wrote:

6. IB switches are LOUD.

Found that out too the first time around, so have been looking for quiet ones instead. This is very helpful:

Suggestions for quiet Infiniband switch? Infrastructure & Networking - NVIDIA Developer Forums

It is going into a spare room I have in the house near the garage. I was actually planning on purchasing a noise dampening enclosure.

  1. So SATA2 won’t be a bottleneck? I thought that it would be. I will remember that.

  2. About the 10GBase-T NIC: I was only going to use it to network to my 2 HP workstations running Windows via Samba. It’s not going to be utilized for rendering purposes.

Right now I am just ordering the CPUs for the time being. I won’t be back home from Afghanistan until I take leave at the end of July. Around that time is when I will be looking for the rest of the parts to build my setup.

Thanks again!

I’ve got a question about the SX6018 switch, and any input would be greatly appreciated. Will the FDR ports auto-negotiate down to QDR and work with ConnectX-2 adapters? Thanks!