MLNX_OFED 5.4: ibv_create_qp cannot allocate memory with more than 4026 clients on a single node


I’m new to MLNX_OFED.
I’m running a test on a single node with 4050 clients and one server, and I find that only 4026 clients can set up successfully; every remaining client reports an error: ibv_create_cq: Invalid argument (22) or ibv_create_qp: Cannot allocate memory (12).
I am using libfabric with an RXM endpoint (RC type). Why can’t I create more queues? The current QP and CQ counts are both well below the device limits, and the machine’s memory is sufficient: only about 150 GB is used and more than 350 GB is available.

I also ran ib_send_lat after the clients were up; it told me to reduce the QP size by decreasing the tx depth or the inline size. My tx depth is already 1, so I set FI_VERBS_INLINE_SIZE to 8, but that didn’t help: the inline size was still 236 when I ran ib_send_lat again.
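For reference, this is how I set the variable. (My assumption is that FI_VERBS_INLINE_SIZE is only read from the environment of the process that opens the fabric through libfabric’s verbs provider, and that ib_send_lat, being a standalone perftest tool with its own -I/--inline_size option, would not be affected by it at all.)

```shell
# Assumption: FI_VERBS_INLINE_SIZE only affects libfabric (verbs provider)
# processes; ib_send_lat controls its inline size with its own -I option.
export FI_VERBS_INLINE_SIZE=8

# Confirm libfabric recognizes the variable, if fi_info is installed
# (fi_info -e lists the runtime environment variables):
command -v fi_info >/dev/null && fi_info -e | grep -i inline || true

echo "FI_VERBS_INLINE_SIZE=$FI_VERBS_INLINE_SIZE"
```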

I also tried strace and found that ioctl(fd, RDMA_VERBS_IOCTL, …) returns Cannot allocate memory (12) during the ibv_create_qp call. What limitation in the driver could cause this?
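In case it helps, these are the generic kernel-side checks I ran after the failure (nothing MLNX_OFED-specific assumed here beyond the mlx5 module name; dmesg may need root):

```shell
# Look for driver messages logged around the failed QP allocation:
dmesg 2>/dev/null | grep -iE 'mlx5|alloc' | tail -n 20 || true

# Generic limits that can matter with thousands of RDMA processes:
cat /proc/sys/vm/max_map_count   # mmap regions per process
cat /proc/sys/kernel/pid_max     # max process/thread IDs
ulimit -l                        # max locked memory (already unlimited here)
```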

Thanks in advance for your precious time!

“ulimit -a”:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2060178
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) unlimited
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

“free -g” after all my clients were up:

                total        used        free      shared  buff/cache   available
Mem:            503          92         409           0           0         408
Swap:             0           0           0

“rdma res” after all my clients were up:

  0: mlx5_0: pd 2 cq 3 qp 1 cm_id 0 mr 0 ctx 4027 srq 2 
  1: mlx5_1: pd 2 cq 3 qp 1 cm_id 0 mr 0 ctx 4027 srq 2 
  2: mlx5_2: pd 8056 cq 4030 qp 8053 cm_id 12079 mr 16159 ctx 4027 srq 2 
  3: mlx5_3: pd 2 cq 3 qp 1 cm_id 0 mr 0 ctx 4027 srq 2
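A quick sanity check on those counters (my assumption: ctx 4027 ≈ 4026 clients + 1 server, so dividing the mlx5_2 counters by the context count gives the per-client consumption):

```shell
# Per-context resource consumption derived from the `rdma res` line
# for mlx5_2 above (ctx = 4027).
awk 'BEGIN {
    ctx = 4027
    printf "qp/ctx    = %.1f\n", 8053  / ctx
    printf "pd/ctx    = %.1f\n", 8056  / ctx
    printf "cm_id/ctx = %.1f\n", 12079 / ctx
    printf "mr/ctx    = %.1f\n", 16159 / ctx
}'
# prints roughly 2 QPs, 2 PDs, 3 cm_ids and 4 MRs per context
```

So each client costs about 2 QPs, 2 PDs, 3 cm_ids and 4 MRs, which is still far below the max_qp/max_pd/max_mr limits reported by ibv_devinfo below.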

“rdma_client” after all my clients were up:

client: rdma_client -s
rdma_client: start
rdma_create_ep: Cannot allocate memory
rdma_client: end -1

“ibstatus mlx5_2”:

Infiniband device 'mlx5_2' port 1 status:
        default gid:     fe80:0000:0000:0000:bace:f6ff:fe0b:3e94
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet

“mlxfwmanager -d mlx5_2”:

Querying Mellanox devices firmware ...

Device #1:

  Device Type:      ConnectX5
  Part Number:      7359059_MCX556A-EDAS_C14_OCI_Ax_Bx
  Description:      ConnectX-5 Ex VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16; tall bracket; ROHS R6
  PSID:             ORC0000000003
  PCI Device Name:  mlx5_2
  Base MAC:         b8cef60b3e94
  Versions:         Current        Available     
     FW             16.29.1436     N/A           
     UEFI           14.22.0016     N/A           

  Status:           No matching image found

“ibv_devinfo -d mlx5_2 -v”:

hca_id: mlx5_2
    transport:                      InfiniBand (0)
    fw_ver:                         16.29.1436
    node_guid:                      043f:7203:00e2:f322
    sys_image_guid:                 043f:7203:00e2:f322
    vendor_id:                      0x02c9
    vendor_part_id:                 4121
    hw_ver:                         0x0
    board_id:                       ORC0000000003
    phys_port_cnt:                  1
    max_mr_size:                    0xffffffffffffffff
    page_size_cap:                  0xfffffffffffff000
    max_qp:                           131072
    max_qp_wr:                      32768
    device_cap_flags:               0xed721c36
                                    Unknown flags: 0xC8400000
    max_sge:                        30
    max_sge_rd:                   30
    max_cq:                         16777216
    max_cqe:                        4194303
    max_mr:                         16777216
    max_pd:                         16777216
    max_qp_rd_atom:                 16
    max_ee_rd_atom:                 0
    max_res_rd_atom:                2097152
    max_qp_init_rd_atom:          16
    max_ee_init_rd_atom:           0
    atomic_cap:                     ATOMIC_HCA (1)
    max_ee:                          0
    max_rdd:                        0
    max_mw:                         16777216
    max_raw_ipv6_qp:                0
    max_raw_ethy_qp:                0
    max_mcast_grp:                  2097152
    max_mcast_qp_attach:            240
    max_total_mcast_qp_attach:      503316480
    max_ah:                         2147483647
    max_fmr:                        0
    max_srq:                        8388608
    max_srq_wr:                     32767
    max_srq_sge:                    31
    max_pkeys:                      128
    local_ca_ack_delay:             16
    uc_odp_caps:           NO SUPPORT
    ud_odp_caps:          SUPPORT_SEND
    completion timestamp_mask:                      0x7fffffffffffffff
    hca_core_clock:                 78125kHZ
    raw packet caps:
                                    C-VLAN stripping offload
                                    Scatter FCS offload
                                    IP csum offload
                                    Delay drop
    device_cap_flags_ex:            0x30000055ED721C36
                                    Unknown flags: 0x3000004100000000
            max_tso:                        262144
            supported_qp:                                   SUPPORT_RAW_PACKET
            max_rwq_indirection_tables:                     16777216
            max_rwq_indirection_table_size:                 256
            rx_hash_function:                               0x1
            rx_hash_fields_mask:                            0x800000FF
    max_wq_type_rq:                 8388608
            qp_rate_limit_min:      1kbps
            qp_rate_limit_max:      100000000kbps
    tag matching not supported
    cq moderation caps:
            max_cq_count:   65535
            max_cq_period:  4095 us
    maximum available device memory:        131072Bytes
    num_comp_vectors:               63
            port:   1
                    state:                  PORT_ACTIVE (4)
                    max_mtu:                4096 (5)
                    active_mtu:             4096 (5)
                    sm_lid:                 0
                    port_lid:               0
                    port_lmc:               0x00
                    link_layer:             Ethernet
                    max_msg_sz:             0x40000000
                    port_cap_flags:         0x04010000
                    port_cap_flags2:        0x0000
                    max_vl_num:             invalid value (0)
                    bad_pkey_cntr:          0x0
                    qkey_viol_cntr:         0x0
                    sm_sl:                  0
                    pkey_tbl_len:           1
                    gid_tbl_len:            256
                    subnet_timeout:         0
                    init_type_reply:        0
                    active_width:           4X (2)
                    active_speed:           25.0 Gbps (32)
                    phys_state:             LINK_UP (5)
                    GID[  0]:               fe80:0000:0000:0000:063f:72ff:fee2:f322, RoCE v1
                    GID[  1]:               fe80::63f:72ff:fee2:f322, RoCE v2
                    GID[  2]:               0000:0000:0000:0000:0000:ffff:c0a8:a801, RoCE v1
                    GID[  3]:               ::ffff:, RoCE v2

@rthaker Hi, could you please help look at this problem? We failed to run on 100 nodes (30 clients and 20 servers per node) earlier, so I ran this single-node test to simulate the situation. I’m not sure whether the same problem would also occur on 100 or 200 nodes, because the cause of the failure isn’t clear yet.