Set Up and Test InfiniBand Partitions

I’m new to InfiniBand. I’m trying to set up partitions for HPC, and the documentation is somewhat lacking.
I’m using ConnectX-5 drivers and the latest opensm.
What I would like to do is have three IPoIB partitions:
Default = provisioning network
P_1 = compute nodes 1-6
P_2 = compute nodes 7-12

I want the provisioning network to be able to talk to everything, but I want P_1 to talk only to P_1 and P_2 to talk only to P_2. I have read that I can do this using partitions.conf and “limited” membership, but I can’t seem to get the syntax to work, and I can’t find any examples online.

Once the partition is created, how do I assign adapters to it?

Here’s my partitions.conf:

Default=0x7fff : ALL, SELF=full ;
Default=0x7fff : ALL, ALL_SWITCHES=full, SELF=full ;

P_1=0xa0,ipoib, rate=16, mtu=5, defmember=limited : 0x8541c9ffff81406d, 0x8540c9ffff8193b1, ALL_SWITCHES=full;

P_2=0xb0,ipoib, rate=16, mtu=5, defmember=limited : 0x8540c8ffff92a0e5, 0x8540c9ffff914085, ALL_SWITCHES=full;

You can check the Nvidia docs or GitHub, e.g.
https://docs.nvidia.com/networking/display/mlnxofedv24010331/nvidia+sm#src-2571322259_NVIDIASM-Partitions
or
https://github.com/linux-rdma/opensm/blob/master/doc/partition-config.txt

The examples there show the various ways of adding a GUID or a group of devices to a partition.

Yes, I have read those docs over and over. The documentation is lacking and doesn’t make sense to me. That’s why I asked here.

You are asking how to add devices to a partition.
The docs clearly state how to do it.

What are you missing?

I don’t fully understand what defmember does. The docs say “Specifies default membership for port GUID list.” but when I use it in the above example, those GUIDs are not assigned to the partition as far as I can tell.

What does full or limited membership mean? In order to assign servers to partitions, I assume you need to use the Mellanox card’s GUID in partitions.conf. Does it need to be a full member or a limited one? How do I know nodes on partition 1 can’t talk to nodes on partition 2?

Defmember is just the default membership assignment on the partition.

  • Without explicitly writing it – the default value for defmember is limited.
  • How are you checking the assignment of those GUIDs in the partition?
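To make the defmember default concrete, here is a minimal Python sketch. It is not part of opensm: `parse_partition` is a hypothetical helper written for this thread, and it only handles the simple one-line entry form used above (no GUID ranges or other advanced syntax). It shows which membership each listed member would end up with once the default is applied.

```python
def parse_partition(line):
    """Return (name, pkey, {member: membership}) for one simple partitions.conf entry."""
    # Everything before the first ':' is the partition definition,
    # everything after it is the member list.
    definition, _, members = line.strip().rstrip(";").partition(":")
    fields = [f.strip() for f in definition.split(",")]
    name, _, pkey_text = fields[0].partition("=")
    pkey = int(pkey_text, 16)

    # defmember falls back to "limited" when it is not written explicitly.
    defmember = "limited"
    for flag in fields[1:]:
        key, _, value = flag.partition("=")
        if key.strip() == "defmember":
            defmember = value.strip()

    membership = {}
    for entry in members.split(","):
        entry = entry.strip()
        if not entry:
            continue
        member, has_explicit, kind = entry.partition("=")
        # A member without "=full"/"=limited" gets the partition's defmember.
        membership[member.strip()] = kind.strip() if has_explicit else defmember
    return name.strip(), pkey, membership


print(parse_partition("Default=0x7fff : ALL, SELF=full ;"))
print(parse_partition(
    "P_1=0xa0,ipoib, rate=16, mtu=5, defmember=limited : "
    "0x8541c9ffff81406d, 0x8540c9ffff8193b1, ALL_SWITCHES=full;"
))
```

This only models how the file reads; the subnet manager remains the authority on what is actually programmed into each port.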

In the examples you posted:

Default=0x7fff : ALL, SELF=full ;

All GUIDs are members, with the SM (aka SELF) as the only full member (since no defmember is mentioned, the others get the default ‘limited’ membership).

Default=0x7fff : ALL, ALL_SWITCHES=full, SELF=full ;

All GUIDs are members of the default partition. All switches and the SM node are full members, while the others fall back to the defmember default value (limited).

P_1=0xa0,ipoib, rate=16, mtu=5, defmember=limited : 0x8541c9ffff81406d, 0x8540c9ffff8193b1, ALL_SWITCHES=full;

IPoIB enabled partition (not sure what rate 16 means – where did you get it from?).

Defmember is limited (you can remove the explicit setting and result will be the same – defmember defaults to limited).

0x8541c9ffff81406d, 0x8540c9ffff8193b1 are members, and all switches are members with full membership.

P_2=0xb0,ipoib, rate=16, mtu=5, defmember=limited : 0x8540c8ffff92a0e5, 0x8540c9ffff914085, ALL_SWITCHES=full;

IPoIB enabled partition (not sure what rate 16 means – where did you get it from?).

Defmember is limited (you can remove the explicit setting and result will be the same – defmember defaults to limited).

0x8540c8ffff92a0e5, 0x8540c9ffff914085 are members, and all switches are members with full membership.

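Limited members cannot communicate with other limited members, but communication is allowed between every other combination of membership types. Note that this also applies inside a partition: two GUIDs that are both limited members of P_1 cannot talk to each other, so nodes that must reach each other within a partition need full membership (e.g. GUID=full or defmember=full).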
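The membership rule can be sketched in a few lines of Python. On the wire, the top bit of the 16-bit P_Key marks full membership and the low 15 bits identify the partition. The helper names `make_pkey` and `can_communicate` are hypothetical and only illustrate the rule; they are not an opensm or driver API.

```python
FULL_MEMBER_BIT = 0x8000  # top bit of the 16-bit P_Key: 1 = full, 0 = limited
BASE_MASK = 0x7FFF        # low 15 bits identify the partition


def make_pkey(base, full):
    """Combine a partition number with the membership bit."""
    return (base & BASE_MASK) | (FULL_MEMBER_BIT if full else 0)


def can_communicate(pkey_a, pkey_b):
    """Two ports can talk if they share a partition and at least one is a full member."""
    same_partition = (pkey_a & BASE_MASK) == (pkey_b & BASE_MASK)
    at_least_one_full = bool((pkey_a | pkey_b) & FULL_MEMBER_BIT)
    return same_partition and at_least_one_full


p1_limited = make_pkey(0xA0, full=False)
p1_full = make_pkey(0xA0, full=True)
p2_full = make_pkey(0xB0, full=True)

print(can_communicate(p1_limited, p1_limited))  # limited <-> limited, same partition
print(can_communicate(p1_limited, p1_full))     # limited <-> full, same partition
print(can_communicate(p1_full, p2_full))        # full members of different partitions
```

The first and third calls return False and the second returns True, which is the behaviour to expect from a ping test between the corresponding IPoIB interfaces.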

Simple tests can show whether the partitions are working as expected… e.g. ping between the IPoIB interfaces belonging to the different partitions isn’t expected to work.

Thanks! This is super helpful. I think I missed the part of having to create the child interfaces. I’ll give that a shot.

As for the rate, I got that from the default partitions.conf for 100 Gb/s EDR:
16: 100 Gb/sec. (EDR 4x)
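
For the child interfaces mentioned above, one possible approach is the iproute2 `ip link add ... type ipoib pkey ...` form. The Python sketch below only builds that command line and does not run it. It assumes the parent IPoIB interface is named `ib0` and that the child should use the full-membership form of the P_Key (0x80a0 for P_1); both are assumptions for illustration, as is the helper name `child_interface_command`.

```python
def child_interface_command(parent, base_pkey):
    """Build an `ip link` argv that would add an IPoIB child interface for a partition."""
    # Assumption: use the full-membership form of the P_Key (top bit set).
    pkey = (base_pkey & 0x7FFF) | 0x8000
    # Name the child after the parent and P_Key, e.g. ib0.80a0.
    name = f"{parent}.{pkey:04x}"
    return ["ip", "link", "add", "link", parent, "name", name,
            "type", "ipoib", "pkey", f"0x{pkey:04x}"]


# Print the command for partition P_1 (0xa0) so it can be reviewed before running it as root.
print(" ".join(child_interface_command("ib0", 0xA0)))
```

Whether the interface comes up still depends on the membership the subnet manager actually grants to that port in partitions.conf.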