Mellanox driver reports "infiniband mlx5_0: create_qp:2947:(pid 101966): Create QP type 2 failed" message at higher client scale

In our project, we launch many clients and servers that communicate over the verbs;ofi_rxm provider using OFI/libfabric.

Once we exceed a certain number of clients, our attempts to connect to the servers start failing, and the server's dmesg contains errors like the following:

[507263.354558] infiniband mlx5_0: create_qp:2947:(pid 101966): Create QP type 2 failed

Is there a way to determine which resource mlx5_0 is running out of? Are there any settings we could tweak, or additional debug information we could collect, to figure out the cause of this problem?

OFI version: v1.12.0

Provider used: verbs;ofi_rxm

MOFED version: 5.1.2

System: Frontera@TACC

Hi,

See the ibv_create_qp documentation for the cases in which it fails.

If you are out of resources, it means you have reached max_qp (reported by ibv_devinfo -v). You can try to implement a mechanism that counts the number of QPs you create and does not create a new one once you have reached the max.
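A minimal C sketch of such a counting mechanism, assuming the application can account for every QP it causes to be created. The helper names (`parse_devinfo_limit`, `qp_budget_init`, `qp_budget_try_acquire`, `qp_budget_release`) are hypothetical and are not part of libibverbs or libfabric. The cap is read from the `max_qp:` line of `ibv_devinfo -v` output (assumed format: key, colon, whitespace, number). Programmatically, the same value is available in the `max_qp` field of `struct ibv_device_attr`, which `ibv_query_device()` fills in.

```c
/*
 * Sketch only: cap the number of live QPs at the device's max_qp.
 * All function names below are hypothetical helpers, not libibverbs/libfabric API.
 */
#include <assert.h>
#include <ctype.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return the integer after "<key>:" in `ibv_devinfo -v` text, or -1 if absent. */
static long parse_devinfo_limit(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;
    while ((p = strstr(p, key)) != NULL) {
        /* Require whitespace (or start of text) before the key and ':' right after it,
           so "max_qp" does not match "max_qp_wr" or "max_qp_rd_atom". */
        bool start_ok = (p == text) || isspace((unsigned char)p[-1]);
        if (start_ok && p[klen] == ':') {
            char *end;
            long v = strtol(p + klen + 1, &end, 10); /* strtol skips leading tabs */
            if (end != p + klen + 1)
                return v;
        }
        p += klen;
    }
    return -1;
}

typedef struct {
    long max;           /* cap, e.g. max_qp (optionally minus a safety margin) */
    atomic_long in_use; /* QPs currently created and not yet destroyed */
} qp_budget;

static void qp_budget_init(qp_budget *b, long max)
{
    assert(max >= 0);
    b->max = max;
    atomic_init(&b->in_use, 0);
}

/* Reserve one QP slot; returns false (reserving nothing) once the cap is reached. */
static bool qp_budget_try_acquire(qp_budget *b)
{
    long cur = atomic_load(&b->in_use);
    while (cur < b->max) {
        /* On failure, cur is reloaded with the current value and the cap is re-checked. */
        if (atomic_compare_exchange_weak(&b->in_use, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Give a slot back after the QP is destroyed, or if creating it failed. */
static void qp_budget_release(qp_budget *b)
{
    long prev = atomic_fetch_sub(&b->in_use, 1);
    assert(prev > 0);
}
```

Call `qp_budget_try_acquire()` before creating a QP and `qp_budget_release()` after destroying it or when creation fails. There are two caveats. First, `max_qp` is a per-device limit shared by every process using mlx5_0, while this counter is per process, so several server processes on one node would each need their own share of the limit. Second, with verbs;ofi_rxm the QPs are created inside the provider (as we understand it, one RC connection per connected peer), so the count would have to be applied to the connections the application triggers rather than to direct ibv_create_qp calls.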

Marc

Thanks for the response.

Is there a way to increase max_qp? If not, what determines this value?