Advice on partitioning an IB network


I would appreciate some advice on partitioning an IB network, please.

We have quite a large IB network – there are almost 800 hosts on the network. All these hosts are part of a single computation cluster. Recently, a researcher in the university bought a new system consisting of 4 hosts. This new system is completely separate from the cluster in that it does not share the same private Ethernet. On the other hand, to make the new system affordable we decided to allow the owner to take 4 of our spare/free IB ports.

Currently, both the new and old systems share the same OpenSM partition. That is, ibhosts lists both the new and old hosts:

New hosts…

Ca : 0xe41d2d0300e16190 ports 2 “srv01935 mlx4_0”

Ca : 0x248a070300f052f0 ports 2 “srv01934 mlx4_0”

Ca : 0xe41d2d0300e166d0 ports 2 “srv01933 mlx4_0”

Ca : 0xe41d2d0300e16350 ports 2 “srv01932 mlx4_0”

Old hosts…

Ca : 0xf452140300225f20 ports 1 “orange02 HCA-1”

Ca : 0xf452140300225ec0 ports 1 “orange03 HCA-1”

etc, etc…

I’m wondering if it is best to place the old/new hosts in separate partitions. Does that make sense? If it does make sense then how do I best construct the partitions.conf file to do this? That is, placing the new (srv…) hosts in a partition is easy, but how do I ensure that the default partition is just the old hosts?

Best regards,


Hi David,

It does make sense to partition your subnet, but it depends on how much disruption you’re willing to tolerate and whether your apps are partition aware. What are your apps?

Assuming you are running without a partitions.conf file now, all hosts are full members of the default partition. In order to separate out the 4 new hosts into their own partition, all the existing hosts will also need to be placed in their own partition so that no communication is possible between the two groups of hosts, because every host still needs membership in the default partition for SA communication.
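
As a rough illustration of that layout, here is a minimal Python sketch that writes out partitions.conf text from two lists of port GUIDs. It assumes every port becomes a limited member of the default partition (limited members can reach full members such as the SM node, but not each other), and that each group gets full membership in its own partition. The partition names (cluster, srv), the PKey values, the ipoib flag, the chunk size, and the GUIDs in the usage example are all hypothetical placeholders. Note that partitions.conf takes port GUIDs, whereas ibhosts prints node GUIDs, so the real list has to be collected separately.

```python
# Sketch only: partition names, PKeys and GUIDs are placeholders, not values
# taken from the real fabric.

def partition_lines(name, pkey, port_guids, flags=("ipoib",), chunk=16):
    """Return partitions.conf statements giving each listed port full membership.

    The member list is split into several statements that repeat the same
    partition name and PKey, so no single line grows too long.
    """
    head = ", ".join([f"{name}=0x{pkey:04x}", *flags])
    lines = []
    for i in range(0, len(port_guids), chunk):
        members = ", ".join(f"{guid}=full" for guid in port_guids[i:i + chunk])
        lines.append(f"{head} : {members} ;")
    return lines


def build_partitions_conf(old_port_guids, new_port_guids):
    """Build the whole file: limited default partition plus one partition per group."""
    lines = [
        # Everyone stays in the default partition, but only as a limited member,
        # so hosts can still talk to the SA without talking to each other.
        "Default=0x7fff : ALL=limited, ALL_SWITCHES=full, SELF=full ;",
    ]
    lines += partition_lines("cluster", 0x0010, old_port_guids)  # existing hosts
    lines += partition_lines("srv", 0x0020, new_port_guids)      # the 4 new hosts
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder GUIDs for illustration only.
    print(build_partitions_conf(["0x1", "0x2"], ["0x3"]), end="")
```

Whether IPoIB is wanted on the new partitions, and how the hosts pick up the new PKeys, depends on the applications, so treat the flags as something to adjust rather than a recommendation.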

Also, where does the SM run? Does it run on one of the existing hosts? Is it a dedicated node? Or does it run somewhere else (embedded in a switch)?

– Hal