Help with hardware decisions

Dear Community!

We are currently expanding our cluster. We started with ~22 nodes and an SX6036, even though the nodes are only connected via QDR. Now new nodes are to be integrated, and following the fat-tree topology I plan to buy new switches. This raises some questions:

  1. For a non-blocking configuration with 36 nodes or fewer, is it better to stay with one switch, or are there any technical reasons to have one level-2 switch and two level-1 switches?

  2. I plan to use the SX6036 as a level-2 switch and add an SX6025 as a second level-2 switch. As leaf switches I want to use four IS5025, because the nodes are limited to QDR. In the future I may want to add FDR-capable switches/nodes. Does this setup sound reasonable to you?

  3. Are there any special cables/tricks for connecting the switches, or do I have to run 18 single cables from each leaf switch to the level-2 switches?

Best regards!


Hi Sven,

See below for some answers to your questions:

  1. If you plan to deploy a non-blocking cluster of up to 36 nodes, a single SX6036 should be fine. However, if you already know you will exceed this number, you should build your L2 switches from the beginning with the proper cabling. This way you avoid downtime and re-cabling of the cluster once the number of nodes grows beyond 36.

  2. Yes. This setup sounds reasonable.

  3. There aren’t any special tricks to the cabling. In a non-blocking fat tree built with 36-port switches, each leaf switch has 18 ports going toward the aggregation (L2) layer and 18 facing the nodes, so yes, that means 18 cables per leaf switch. Make sure the L1-to-L2 links are spread as evenly as possible across the L2 switches.
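As a rough illustration of spreading the links evenly, the Python sketch below splits one leaf's 18 uplinks across the L2 switches and builds a per-(leaf, L2 switch) cable count. The helper names `spread_uplinks` and `cabling_plan` are hypothetical, and rotating the leftover links by leaf index is an assumption made so that no single L2 switch collects all the extras when 18 does not divide evenly by the number of L2 switches.

```python
def spread_uplinks(uplinks_per_leaf: int, num_spines: int) -> list[int]:
    """How many uplinks one leaf runs to each L2 (spine) switch, as evenly as possible."""
    if num_spines <= 0:
        raise ValueError("need at least one L2 switch")
    base, extra = divmod(uplinks_per_leaf, num_spines)
    # The first `extra` positions get one additional link each.
    return [base + 1 if i < extra else base for i in range(num_spines)]


def cabling_plan(num_leaves: int, num_spines: int,
                 uplinks_per_leaf: int = 18) -> dict[tuple[int, int], int]:
    """Map (leaf, spine) pairs to cable counts for a two-level fat tree.

    Assumption: the per-leaf pattern is rotated by leaf index so that any
    leftover "+1" links are shared among the L2 switches instead of piling
    up on the first ones.
    """
    per_spine = spread_uplinks(uplinks_per_leaf, num_spines)
    return {
        (leaf, spine): per_spine[(spine + leaf) % num_spines]
        for leaf in range(num_leaves)
        for spine in range(num_spines)
    }


# Proposed layout: four IS5025 leaves, SX6036 + SX6025 as the two L2 switches.
plan = cabling_plan(num_leaves=4, num_spines=2)
print(spread_uplinks(18, 2))   # links from each leaf to each L2 switch
print(sum(plan.values()))      # total inter-switch cables
for spine in range(2):
    used = sum(n for (leaf, s), n in plan.items() if s == spine)
    print(f"L2 switch {spine}: {used} ports used")
```

For the proposed layout this works out to 9 cables from each leaf to each L2 switch, 72 inter-switch cables in total, and all 36 ports of each L2 switch in use, which is why the four-leaf/two-L2 design is a natural fit for 36-port switches.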

Another option to consider is a chassis switch (starting at 108 ports with the SX6506).

There are some trade-offs between a design built from 1U switches and a chassis design: they usually favor the chassis in large-scale designs and the 36-port switches in smaller clusters.

With a chassis, you need to populate all spines up front and add leaves as needed. You would need fewer cables, but probably longer ones.

With 36-port switches, you would build your L2 aggregation from the beginning (to avoid cluster downtime and re-cabling) and add L1 switches as needed. You would need more cables, but shorter ones.
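To make the cable-count side of this trade-off concrete, here is a rough Python sizing sketch for the 1U option. It assumes 36-port switches split 18 down / 18 up, a two-level non-blocking tree sized for the final target node count (since that is what fixes the L2 layer), and that in a chassis the leaf-to-spine links are internal, so only node cables are counted there. The function name `fat_tree_1u` is hypothetical, and cable lengths are not modelled. For 36 nodes or fewer a single switch is enough, so the sketch only covers the two-level case.

```python
import math

PORTS = 36           # assumption: 36-port 1U switches (SX6036 / IS5025 class)
DOWN = PORTS // 2    # 18 node-facing ports per leaf in a non-blocking design
UP = PORTS - DOWN    # 18 uplinks per leaf toward the L2 layer


def fat_tree_1u(target_nodes: int) -> dict[str, int]:
    """Size a two-level, non-blocking fat tree of 36-port switches."""
    max_nodes = PORTS * DOWN  # at most 36 leaves x 18 nodes = 648 with two levels
    if not 0 < target_nodes <= max_nodes:
        raise ValueError(f"a two-level 36-port fat tree supports 1..{max_nodes} nodes")
    leaves = math.ceil(target_nodes / DOWN)
    # L2 ports must absorb every leaf uplink.
    spines = math.ceil(leaves * UP / PORTS)
    return {
        "leaves": leaves,
        "spines": spines,
        "node_cables": target_nodes,
        "switch_cables": leaves * UP,
    }


plan = fat_tree_1u(72)
print(plan)
print("1U total cables:", plan["node_cables"] + plan["switch_cables"])
# Chassis: leaf-to-spine links are assumed internal, so only node cables remain.
print("chassis total cables:", plan["node_cables"])
```

For 72 nodes this gives 4 leaves and 2 L2 switches with 144 cables in total, versus 72 node cables for a chassis of sufficient size, which illustrates the "more cables but shorter ones" point.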