OpenSM partitions and pkey concept

Hello everyone, I'm new to InfiniBand. I have an OpenSM subnet with 2 clients and 1 server. The 2 clients should be isolated from each other, but both of them should be able to access the server.

I've tried assigning a different pkey to each of the 2 hosts (0x1 and 0x2), with indx0 and IPoIB, but neither host can ping the server.

client1 and client2 each have only 1 HCA; server1 has 2 HCAs.

cat /etc/opensm/partitions.conf

client1=0x1, indx0, ipoib, defmember=limited :0xe8ebd30300222488;

client2=0x2, indx0, ipoib, defmember=limited :0xe8ebd303002241d8;

server1=0x1, indx0, ipoib, defmember=full :0x58a2e103007018e4;
server1=0x1, indx0, ipoib, defmember=full :0x58a2e103007020dc;
server1=0x2, indx0, ipoib, defmember=full :0x58a2e103007018e4;
server1=0x2, indx0, ipoib, defmember=full :0x58a2e103007020dc;

Is the partition definition right? How can I verify it?

Hi,

Understanding:

1. Partitions have 2 types of membership: full membership and limited membership.

  • Full (default): Members with full membership may communicate with all hosts (members) within the network/partition.
  • Limited/partial: Members with limited membership cannot communicate with other members with limited membership. However, they can communicate between every other combination of membership types (e.g., full + limited).

2. The membership type is added as the most-significant bit to the pKey number. For example, the default pKey being 7fff will either have a value of 0x7FFF (limited) or 0xFFFF (full).

=============Example=============

#PKEY LIMITED 7fff
$hex_to_bin 7FFF
0111111111111111

#PKEY FULL 7fff
$hex_to_bin FFFF
1111111111111111

#PKEY LIMITED 6FFF
$hex_to_bin 6FFF
0110111111111111

#PKEY FULL 6FFF
$hex_to_bin EFFF
1110111111111111

In the examples above, if the most-significant bit (the leftmost bit) is set, the pkey has full membership; if it is clear, the pkey has limited membership.

=================================

default pkey 0x7fff
FULL = 0xffff
Limited = 0x7fff

Example PKEY2 0x6FFF
FULL = 0xEFFF
Limited = 0x6FFF
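The bit arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule described in this post; the function names (limited_pkey, full_pkey, hex_to_bin) are made up for the example and are not part of OpenSM or any InfiniBand tool.

```python
FULL_MEMBER_BIT = 0x8000  # most-significant bit of the 16-bit pkey


def limited_pkey(base: int) -> int:
    """Return the limited-membership form of a pkey (top bit cleared)."""
    return base & 0x7FFF


def full_pkey(base: int) -> int:
    """Return the full-membership form of a pkey (top bit set)."""
    return (base & 0x7FFF) | FULL_MEMBER_BIT


def hex_to_bin(value: int) -> str:
    """Render a 16-bit value as a zero-padded binary string."""
    return format(value & 0xFFFF, "016b")


for base in (0x7FFF, 0x6FFF):
    lim, full = limited_pkey(base), full_pkey(base)
    print(f"limited 0x{lim:04X} = {hex_to_bin(lim)}")
    print(f"full    0x{full:04X} = {hex_to_bin(full)}")
```

For the two pkeys used in this post it reproduces the same values: 0x7FFF/0xFFFF and 0x6FFF/0xEFFF.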

Supporting Example:
Subnet Manager has a pkey of 0xffff (full 7fff)

Nodes 1, 2, 3 have a pkey of 0x7fff (limited 7fff)

The SM can talk to Nodes 1, 2, 3; Nodes 1, 2, 3 can only talk to the SM

To let a group of nodes talk to each other, put them in the same PKEY and give them full membership.

========================================

Note that the default pkey is stored at index0 in partitions.conf. The partitions.conf file is located as follows:

  • In the switch, the file is located at: vtmp/infiniband-default/var/opensm/partitions.conf
  • In the Linux host, the file is located at: /etc/opensm/partitions.conf

From the logs perspective, we can check ibdiagnet2.pkey for details.

Note: defmember is just the default membership assignment for the partition. If it is not written explicitly, defmember defaults to limited. Also note that the partitions.conf file is only parsed on a heavy sweep.

We can use the following command on the SM server to force a heavy sweep (not required with UFM which automatically parses the new partition configuration):

#pkill -HUP opensm 
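After the sweep, one way to check what a Linux host actually received is to read its pkey table from sysfs, where the kernel exposes one file per table index under /sys/class/infiniband/<device>/ports/<port>/pkeys/. The sketch below assumes that layout. The helper name read_pkey_table and the device name mlx5_0 in the comment are placeholders for illustration, so substitute whatever device your host reports.

```python
from pathlib import Path


def read_pkey_table(device: str, port: int = 1,
                    sysfs_root: str = "/sys/class/infiniband") -> dict:
    """Return {index: pkey} for the non-empty pkey table entries of a port."""
    pkey_dir = Path(sysfs_root) / device / "ports" / str(port) / "pkeys"
    table = {}
    if not pkey_dir.is_dir():
        return table  # device or port not present on this host
    for entry in sorted(pkey_dir.iterdir(), key=lambda p: int(p.name)):
        value = int(entry.read_text().strip(), 16)
        if value != 0:  # 0x0000 marks an unused slot
            table[int(entry.name)] = value
    return table


if __name__ == "__main__":
    root = Path("/sys/class/infiniband")
    if root.is_dir():
        for dev in sorted(root.iterdir()):  # e.g. mlx5_0 (assumed name)
            for index, pkey in read_pkey_table(dev.name).items():
                kind = "full" if pkey & 0x8000 else "limited"
                print(f"{dev.name} index {index}: 0x{pkey:04x} ({kind})")
```

If the partition was applied, the expected pkey should show up with the expected membership bit; if only the default pkey appears, the host was not matched by any GUID in partitions.conf.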

KEY NOTE: Nothing can communicate across 2 distinct pkeys. Members of 0x6FFF cannot communicate with members of 0x7FFF, irrespective of membership level.

KEY NOTE: Limited members cannot communicate with each other, even if they are in the same partition ID. For example, 2 or more GUIDs with limited membership in 0x6FFF cannot communicate with each other.

KEY NOTE: Limited members can only communicate with full members of the same partition ID. For example, limited members of partition ID 0x6FFF can communicate with full members of 0x6FFF.
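The three key notes boil down to one check: the low 15 bits must match, and at least one side must have the full-membership bit set. Here is a minimal Python sketch of that rule. The function names are invented for the example, and it only models the rules as stated here; it does not query a real fabric.

```python
def is_full(pkey: int) -> bool:
    """True if the membership bit (most-significant bit) is set."""
    return bool(pkey & 0x8000)


def can_communicate(pkey_a: int, pkey_b: int) -> bool:
    """Apply the key notes: same partition ID and at least one full member."""
    same_partition = (pkey_a & 0x7FFF) == (pkey_b & 0x7FFF)
    return same_partition and (is_full(pkey_a) or is_full(pkey_b))


# The layout from the question, modelled with ONE partition (an assumption,
# not a tested configuration): both clients limited, the server full.
client1, client2, server = 0x0001, 0x0001, 0x8001
print(can_communicate(client1, server))   # True
print(can_communicate(client2, server))   # True
print(can_communicate(client1, client2))  # False
```

Under these rules, two limited members of a single partition are already isolated from each other while both can still reach a full member, which is the behaviour the original question asks for.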

Configuration

You may edit the partitions.conf file directly, through the CLI, or through the web GUI.

Create the PKEY

#ib partition <partition name> pkey <pkey number> 

Add the GUID to the partition, where ‘Default’ is the name of the partition and the GUID is that of the HCA you are adding with full membership. You can run ibnetdiscover from the SM to retrieve the GUID.

Note that when configuring the port GUID on the switch, you must add the leading zeros as they do not show in the ibnetdiscover output.

#ib partition Default member 00:98:03:9b:03:00:8d:5d:1c type full  

Reference: detailed step-by-step configuration guide

https://docs.nvidia.com/networking/display/winof2v310lts/infiniband+network

Examples

=================================

Sample A partitions.conf

Default=0x7fff : ALL, SELF=full ;
Default=0x7fff : ALL, ALL_SWITCHES=full, SELF=full ;
P_1=0xa0,ipoib, rate=16, mtu=5, defmember=limited : 0x8541c9ffff81406d, 0x8540c9ffff8193b1, ALL_SWITCHES=full;
P_2=0xb0,ipoib, rate=16, mtu=5, defmember=limited : 0x8540c8ffff92a0e5, 0x8540c9ffff914085, ALL_SWITCHES=full;

From the information above, we can see the following

Default=0x7fff : ALL, SELF=full ;

All GUIDs are members, with the SM (aka SELF) as the only full member (since no defmember is given, the default is ‘limited’ membership).

Default=0x7fff : ALL, ALL_SWITCHES=full, SELF=full ;

All GUIDs are members of the default partition. All switches and the SM node are full members, while the others fall back to the defmember default value (limited).

P_1=0xa0,ipoib, rate=16, mtu=5, defmember=limited : 0x8541c9ffff81406d, 0x8540c9ffff8193b1, ALL_SWITCHES=full;

IPoIB-enabled partition. rate=16 is 100 Gbit/s (EDR 4x), mtu=5 sets the MTU to 4096, and defmember is limited. 0x8541c9ffff81406d and 0x8540c9ffff8193b1 are members, and all switches are members with full membership.

P_2=0xb0,ipoib, rate=16, mtu=5, defmember=limited : 0x8540c8ffff92a0e5, 0x8540c9ffff914085, ALL_SWITCHES=full;

IPoIB-enabled partition. rate=16 is 100 Gbit/s (EDR 4x), mtu=5 sets the MTU to 4096, and defmember is limited. 0x8540c8ffff92a0e5 and 0x8540c9ffff914085 are members, and all switches are members with full membership.
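To make the line-by-line reading above repeatable, here is a small Python sketch that splits a partitions.conf line into its name, pkey, flags and members. It handles only the simple one-line form used in Sample A and is not the OpenSM parser; parse_partition_line is a name made up for this example, and applying "limited" when defmember is absent follows the note earlier in this post.

```python
def parse_partition_line(line: str) -> dict:
    """Parse 'Name=pkey[,flag[=value]...] : member[=membership], ... ;'."""
    line = line.strip().rstrip(";").strip()
    header, _, member_part = line.partition(":")

    fields = [field.strip() for field in header.split(",")]
    name, _, pkey_text = fields[0].partition("=")

    flags = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        flags[key.strip()] = value.strip() or True  # bare flag, e.g. ipoib

    default_membership = flags.get("defmember", "limited")

    members = {}
    for entry in member_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        who, _, membership = entry.partition("=")
        members[who.strip()] = membership.strip() or default_membership

    return {
        "name": name.strip(),
        "pkey": int(pkey_text.strip(), 16),
        "flags": flags,
        "members": members,
    }


sample = ("P_1=0xa0,ipoib, rate=16, mtu=5, defmember=limited : "
          "0x8541c9ffff81406d, 0x8540c9ffff8193b1, ALL_SWITCHES=full;")
parsed = parse_partition_line(sample)
print(parsed["name"], hex(parsed["pkey"]))
for who, membership in parsed["members"].items():
    print(f"  {who}: {membership}")
```

Running it on the P_1 line lists the two GUIDs as limited and ALL_SWITCHES as full, matching the explanation above.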

=================================

Sample B Use Case

I have 5 nodes A,B,C,D,SM. I want Nodes A and B to be able to talk to each other only and C and D to talk to each other only. How can I accomplish this use case?

All members except SM are part of default PKEY with limited membership. SM is part of default PKEY with full membership.

Nodes A and B = full membership PKEY1 to talk to each other over that PKEY1

Nodes C and D = full membership PKEY2 to talk to each other over that PKEY2


PKEY0 (default pkey) = 5 nodes (limited for all nodes, except SM)

PKEY1 = 2 Nodes with full membership

PKEY2 = 2 Nodes with full membership
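The Sample B layout can be sanity-checked on paper with a short Python model. The partition names and node labels are the ones from the use case; the rule applied is the one from the key notes (two nodes can talk if they share a partition and at least one of them is a full member there). This is a sketch for reasoning about the design, not something that talks to OpenSM.

```python
from itertools import combinations

# PARTITIONS[partition][node] = membership, as laid out in Sample B
PARTITIONS = {
    "PKEY0": {"A": "limited", "B": "limited", "C": "limited",
              "D": "limited", "SM": "full"},
    "PKEY1": {"A": "full", "B": "full"},
    "PKEY2": {"C": "full", "D": "full"},
}


def can_talk(node_a: str, node_b: str) -> bool:
    """True if some shared partition has at least one of the two as full."""
    for members in PARTITIONS.values():
        if node_a in members and node_b in members:
            if "full" in (members[node_a], members[node_b]):
                return True
    return False


nodes = ["A", "B", "C", "D", "SM"]
allowed = sorted(pair for pair in combinations(nodes, 2) if can_talk(*pair))
print(allowed)
# [('A', 'B'), ('A', 'SM'), ('B', 'SM'), ('C', 'D'), ('C', 'SM'), ('D', 'SM')]
```

A and B only reach each other, C and D only reach each other, and every node can still reach the SM through the default partition.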

=================================

Reference MTU and Rate Chart

mtu =

1 = 256

2 = 512

3 = 1024

4 = 2048

5 = 4096

rate =

2 = 2.5 GBit/s (SDR 1x)

3 = 10 GBit/s (SDR 4x/QDR 1x)

4 = 30 GBit/s (SDR 12x)

5 = 5 GBit/s (DDR 1x)

6 = 20 GBit/s (DDR 4x)

7 = 40 GBit/s (QDR 4x)

8 = 60 GBit/s (DDR 12x)

9 = 80 GBit/s (QDR 8x)

10 = 120 GBit/s (QDR 12x)

If ExtendedLinkSpeeds are supported, these rate values are also valid:

11 = 14 GBit/s (FDR 1x)

12 = 56 GBit/s (FDR 4x)

13 = 112 GBit/s (FDR 8x)

14 = 168 GBit/s (FDR 12x)

15 = 25 GBit/s (EDR 1x)

16 = 100 GBit/s (EDR 4x)

17 = 200 GBit/s (EDR 8x)

18 = 300 GBit/s (EDR 12x)
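For convenience, the chart above can be turned into a small Python lookup that decodes the rate= and mtu= codes of a partition line. The table values are copied from the chart; the names MTU_BYTES, RATE_GBITS and describe are made up for this example.

```python
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

# rate code -> (Gbit/s as text, link type), copied from the chart above
RATE_GBITS = {
    2: ("2.5", "SDR 1x"),
    3: ("10", "SDR 4x/QDR 1x"),
    4: ("30", "SDR 12x"),
    5: ("5", "DDR 1x"),
    6: ("20", "DDR 4x"),
    7: ("40", "QDR 4x"),
    8: ("60", "DDR 12x"),
    9: ("80", "QDR 8x"),
    10: ("120", "QDR 12x"),
    # The codes below need ExtendedLinkSpeeds support
    11: ("14", "FDR 1x"),
    12: ("56", "FDR 4x"),
    13: ("112", "FDR 8x"),
    14: ("168", "FDR 12x"),
    15: ("25", "EDR 1x"),
    16: ("100", "EDR 4x"),
    17: ("200", "EDR 8x"),
    18: ("300", "EDR 12x"),
}


def describe(rate: int, mtu: int) -> str:
    """Translate partitions.conf rate/mtu codes into readable values."""
    gbits, link = RATE_GBITS[rate]
    return f"{gbits} Gbit/s ({link}), MTU {MTU_BYTES[mtu]} bytes"


print(describe(16, 5))  # 100 Gbit/s (EDR 4x), MTU 4096 bytes
```

The call at the end decodes the rate=16, mtu=5 pair used in Sample A.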
