Issues with Assigning 64 PKeys for Each Server in Cluster

I am seeking help with an issue I am experiencing while working on a cluster using ConnectX-5 devices. I want to assign 64 PKeys for each server, but I’m running into some problems. Here’s what I have done so far:

  • I created a partitions.conf file, with each line formatted like =0x80**,ipoib:ALL=full
  • After applying the partitions.conf, OpenSM complains that the switch has only 8 PKey capacity.
  • I then created 64 IPoIB interfaces corresponding to the 64 PKeys on two servers.
  • The first 32 PKeys work well, but the remaining 32 do not. “Ping” shows network unreachable, and OpenSM complains with log_trap_info: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:3 TID:xxx
  • I modified the partitions.conf file, changing each line to =0x80**,ipoib:ALL_CAS=full
  • OpenSM no longer complains about the 8 PKey capacity limit.
  • However, the network connectivity issue persists. The first 32 PKeys still work well, but the rest do not. “Ping” shows network unreachable, and OpenSM still complains with the same error message as before.
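For readers who want to reproduce the setup, here is a minimal sketch of generating 64 lines in the same shape as the ones quoted above. The partition names (`part00`, `part01`, ...) and the P_Key range `0x8001`-`0x8040` are hypothetical placeholders, not the values used in this post, and the exact file syntax (for example line terminators) should be checked against the OpenSM partition configuration documentation.

```python
def make_partition_lines(pkeys, members="ALL_CAS", prefix="part"):
    """Return one partitions.conf-style line per P_Key.

    The line shape mirrors the one quoted in the post:
        <name>=<pkey>,ipoib:<members>=full
    The partition names built from `prefix` are hypothetical.
    """
    lines = []
    for index, pkey in enumerate(pkeys):
        if not 0 <= pkey <= 0xFFFF:
            raise ValueError(f"P_Key out of 16-bit range: {pkey:#x}")
        lines.append(f"{prefix}{index:02d}={pkey:#06x},ipoib:{members}=full")
    return lines


if __name__ == "__main__":
    # Hypothetical range of 64 P_Keys, chosen only for illustration.
    for line in make_partition_lines(range(0x8001, 0x8041)):
        print(line)
```

Passing `members="ALL"` instead reproduces the first variant tried above.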

I would appreciate any guidance or suggestions to resolve this issue and successfully assign 64 PKeys for each server in the cluster. Thank you in advance for your assistance!

Hi Gesrua,
You can use #smpquery pkeytables.
You will then see that a switch port supports only 32 P_Key table entries.
Thanks,

My switch is a QM8790. I could not find a 32-PKey limit in any documentation.

I tried to use #smpquery, but the result is not as expected.

# smpquery nd -G 0x****
Node Description:...Quantum Mellanox Technologies
# smpquery pkeys -G 0x****
   0: 0xffff 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
8 pkeys capacity for this port

The reported capacity is only 8 pkeys. However, in my case, 32 pkeys seem to work.

Hi,
You need to use #smpquery pkeytables,
and not with the switch LID alone.

Thanks,

Sorry, I’m still confused.

I tried to run smpquery pkeytables / smpquery pkeytable. It didn’t work.

$ sudo smpquery pkeytables

Usage: smpquery [options] <op> <dest dr_path|lid|guid> [op params]

Supported ops (and aliases, case insensitive):
  NodeInfo (NI) <addr>
  NodeDesc (ND) <addr>
  PortInfo (PI) <addr> [<portnum>]
  PortInfoExtended (PIE) <addr> [<portnum>]
  SwitchInfo (SI) <addr>
  PKeyTable (PKeys) <addr> [<portnum>]
  SL2VLTable (SL2VL) <addr> [<portnum>]
  VLArbitration (VLArb) <addr> [<portnum>]
  GUIDInfo (GI) <addr>
  MlnxExtPortInfo (MEPI) <addr> [<portnum>]


Options:
  --combined, -c          use Combined route address argument
  --node-name-map <file>  node name map file
  --extended, -x          use extended speeds
  --config, -z <config>   use config file, default: /etc/infiniband-diags/ibdiag.conf
  --Ca, -C <ca>           Ca name to use
  --Port, -P <port>       Ca port number to use
  --Direct, -D            use Direct address argument
  --Lid, -L               use LID address argument
  --Guid, -G              use GUID address argument
  --timeout, -t <ms>      timeout in ms
  --sm_port, -s <lid>     SM port lid
  --show_keys, -K         display security keys in output
  --m_key, -y <key>       M_Key to use in request
  --errors, -e            show send and receive errors
  --verbose, -v           increase verbosity level
  --debug, -d             raise debug level
  --help, -h              help message
  --version, -V           show version

Examples:
  smpquery portinfo 3 1                # portinfo by lid, with port modifier
  smpquery -G switchinfo 0x2C9000100D051 1    # switchinfo by guid
  smpquery -D nodeinfo 0                # nodeinfo by direct route
  smpquery -c nodeinfo 6 0,12            # nodeinfo by combined route

I think it failed because (1) pkeytables should be pkeytable, and (2) a <dest dr_path|lid|guid> argument is required.

I don’t understand your point. Do you mean the smpquery command needs to run on the switch? The QM8790 is an externally managed switch, so it seems I cannot log in to the switch and execute commands.

Please see the result from my lab below.
In any case, a switch port can support 32 pkeys.
[root@yl pkeys]# smpquery pkeytable 40 1
0: 0xffff 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
8: 0x000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
16: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
24: 0x0000 0x0090 0x0009 0x0000 0x0000 0x0000 0x0000 0x0000
32 pkeys capacity for this port

This is because you are querying without a port indication, so the query goes to the internal SMA port (port 0), which holds only 8 pkeys.

Add a port indication and you’ll see the 32 pkeys.
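As a rough illustration of the difference, the sketch below builds the `smpquery` argument list with and without a port number, following the `smpquery [options] <op> <dest> [op params]` usage shown earlier in the thread. The helper names `build_pkey_query` and `run_pkey_query` are made up for this example, and the code assumes `smpquery` from infiniband-diags is on the PATH and run with sufficient privileges.

```python
import subprocess


def build_pkey_query(dest, port=None, by_guid=False):
    """Build an smpquery command line for the P_Key table of `dest`.

    Without `port`, the query addresses the switch's port 0; with `port`,
    it addresses that external port (as described in the reply above).
    """
    cmd = ["smpquery"]
    if by_guid:
        cmd.append("-G")          # treat `dest` as a GUID instead of a LID
    cmd += ["pkeys", str(dest)]   # "pkeys" is the alias listed in the usage text
    if port is not None:
        cmd.append(str(port))
    return cmd


def run_pkey_query(dest, port=None, by_guid=False):
    """Run the query and return its standard output as text."""
    result = subprocess.run(
        build_pkey_query(dest, port, by_guid),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `build_pkey_query(40, 1)` corresponds to `smpquery pkeys 40 1`, while omitting the port gives the port 0 query.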

Example for the comment below:

No port indication:

smpquery pkeys -G 0x####
0: 0xffff 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
8 pkeys capacity for this port

Including a port indication (port 1 in this case):

smpquery pkeys -G 0x#### 1
0: 0xffff 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
16: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
24: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
32 pkeys capacity for this port
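If you want to check this across many ports, a small parser for output of this shape can help. The following is a minimal sketch that assumes the output format shown in this thread (index-prefixed rows of hex values followed by an "N pkeys capacity for this port" line); `parse_pkey_table` and `SAMPLE` are names invented for the example.

```python
import re

# Row such as "8: 0x0000 0x0000 ..." (index of the first entry, then values).
ROW = re.compile(r"^\s*(\d+):\s+((?:0x[0-9a-f]+\s*)+)$", re.IGNORECASE)
# Trailer such as "32 pkeys capacity for this port".
CAPACITY = re.compile(r"^\s*(\d+)\s+pkeys capacity for this port")

SAMPLE = """\
0: 0xffff 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
16: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
24: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
32 pkeys capacity for this port
"""


def parse_pkey_table(text):
    """Return (capacity, used), where used maps table index -> non-zero P_Key."""
    capacity = None
    used = {}
    for line in text.splitlines():
        row = ROW.match(line)
        if row:
            base = int(row.group(1))
            for offset, token in enumerate(row.group(2).split()):
                value = int(token, 16)
                if value != 0:          # 0x0000 marks an empty slot
                    used[base + offset] = value
            continue
        trailer = CAPACITY.match(line)
        if trailer:
            capacity = int(trailer.group(1))
    return capacity, used


if __name__ == "__main__":
    capacity, used = parse_pkey_table(SAMPLE)
    print(f"capacity={capacity}, used={len(used)}, free={capacity - len(used)}")
```

Comparing the number of partitions you plan to configure against the reported capacity of each switch port shows up front whether they can all fit in that port's table.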

Thanks! I queried with a port indication and got the correct result!
