Bad input length(0x50) when creating DCI QP

Hello,

To start my development, I’m trying to create a minimal example of Dynamically Connected (DC) transport QPs. One of the examples I’m using comes from NVIDIA itself: Dynamically Connected (DC) QPs - NVIDIA Docs

The above example works on the DC Target side, but I’m facing an error on the Initiator side when creating the QP using the mlx5dv_create_qp function.

The function fails with errno 5 (I/O error) on my ConnectX-5 hardware, and I’m seeing these lines in dmesg:

[4494044.006304] mlx5_core 0000:af:00.0: mlx5_cmd_out_err:797:(pid 619708): CREATE_QP(0x500) op_mod(0x0) failed, status bad input length(0x50), syndrome (0x2f50ca), err(-5)
[4494044.007456] infiniband mlx5_0: create_qp:3192:(pid 619708): Create QP type 4098 failed

There is no further explanation in the logs that might suggest the cause of the error. Am I missing something on the initiator side that is not covered by the example? Thank you.
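
For anyone triaging a similar log, the fields of the mlx5_cmd_out_err line above can be pulled apart with a small script. This is only a sketch: the regular expression is fitted to the exact line quoted in this post and may not match other driver versions, and the name parse_cmd_err is made up for illustration.

```python
import errno
import re

# Sample line copied from the dmesg output quoted in this thread.
LINE = (
    "[4494044.006304] mlx5_core 0000:af:00.0: mlx5_cmd_out_err:797:"
    "(pid 619708): CREATE_QP(0x500) op_mod(0x0) failed, "
    "status bad input length(0x50), syndrome (0x2f50ca), err(-5)"
)

# Pattern fitted to that one line (assumption: other driver versions
# may word or order the fields differently).
CMD_ERR_RE = re.compile(
    r"(?P<cmd>\w+)\((?P<opcode>0x[0-9a-fA-F]+)\) "
    r"op_mod\((?P<op_mod>0x[0-9a-fA-F]+)\) failed, "
    r"status (?P<status>.+?)\((?P<status_code>0x[0-9a-fA-F]+)\), "
    r"syndrome \((?P<syndrome>0x[0-9a-fA-F]+)\), "
    r"err\((?P<err>-?\d+)\)"
)


def parse_cmd_err(line):
    """Return the command, status, syndrome and errno name, or None."""
    m = CMD_ERR_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "cmd": m.group("cmd"),
        "opcode": int(m.group("opcode"), 16),
        "status": m.group("status"),
        "status_code": int(m.group("status_code"), 16),
        "syndrome": int(m.group("syndrome"), 16),
        # The kernel reports a negative errno; userspace sees it as positive.
        "errno_name": errno.errorcode.get(-err, "unknown"),
    }


info = parse_cmd_err(LINE)
print(info)
```

The syndrome is the field worth searching for, since the status text alone ("bad input length") covers many different firmware checks.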

I have found there is little public documentation on syndromes/errors.

However there is one page [1] that has useful information. The syndrome you have is documented there as:

BAD_INPUT_LEN       | 0x2F50CA |  create qp: not enough pas supplied to support buffer size 

I’m not sure if this helps your situation, but it’s the only info I could find.

[1] Mellanox error syndrome lists · GitHub
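
If the list in [1] is saved locally, the lookup is easy to script. The sketch below hard-codes only the single row quoted above; the SYNDROMES dictionary and the describe_syndrome name are illustrative and not part of any NVIDIA tool. Note that dmesg prints the syndrome in lowercase hex while the list uses uppercase, so the comparison is done on integers.

```python
# Local copy of the one row quoted in this thread; a real table would
# be filled in from the published syndrome list (assumption).
SYNDROMES = {
    0x2F50CA: (
        "BAD_INPUT_LEN",
        "create qp: not enough pas supplied to support buffer size",
    ),
}


def describe_syndrome(text):
    """Look up a syndrome given as a hex string such as '0x2f50ca'."""
    value = int(text, 16)  # accepts an optional 0x prefix, any case
    name, meaning = SYNDROMES.get(value, ("UNKNOWN", "not in the local table"))
    return f"{name} (0x{value:06X}): {meaning}"


print(describe_syndrome("0x2f50ca"))
```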

That’s certainly a step in the right direction, but I could only find PAS in the kernel driver code, so I wonder how this correlates with the user-land code.

Hi @caianbene,

It seems that the driver passed the firmware a buffer length parameter that is insufficient to establish the QP context.
This situation can arise if there is a compatibility issue between the driver and firmware versions. To verify this, please check the Release Notes of the driver, available at: Linux InfiniBand Drivers.
Additionally, improper parameter handling can also lead to this problem.

Regards,
Chen

Thanks @chenh1, we did indeed have a firmware issue on some of our nodes. We repeated the test on two Dell PowerEdge systems that are properly configured (driver 5.6-2.0.9 and firmware 16.32.1010), and now dmesg only shows:

[4648076.498624] infiniband mlx5_0: create_qp:3192:(pid 4166615): Create QP type 4098 failed

But there is still a problem when creating the QP for the DCT. You mentioned improper parameter handling. Does this mean device/driver configuration or user code? I ask because the code was taken from the NVIDIA documentation.

Thank you,
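
On the remaining "Create QP type 4098 failed" message: the number looks like a kernel-internal QP type rather than a verbs enum value. The sketch below decodes it under an assumption that should be checked against the headers of the installed driver: that IB_QPT_RESERVED1 is 0x1000 (include/rdma/ib_verbs.h in mainline, as far as I recall) and that the mlx5_ib driver maps the first four reserved types to REG_UMR, HW_GSI, DCI and DCT in that order. The helper name mlx5_qp_type_name is made up.

```python
# Assumed values, recalled from the mainline kernel headers; verify them
# against the driver version actually in use.
IB_QPT_RESERVED1 = 0x1000

MLX5_IB_QP_TYPES = {
    IB_QPT_RESERVED1 + 0: "MLX5_IB_QPT_REG_UMR",
    IB_QPT_RESERVED1 + 1: "MLX5_IB_QPT_HW_GSI",
    IB_QPT_RESERVED1 + 2: "MLX5_IB_QPT_DCI",
    IB_QPT_RESERVED1 + 3: "MLX5_IB_QPT_DCT",
}


def mlx5_qp_type_name(qp_type):
    """Map the decimal QP type from dmesg to a driver-internal name."""
    return MLX5_IB_QP_TYPES.get(qp_type, f"unknown (0x{qp_type:x})")


print(hex(4098), mlx5_qp_type_name(4098))
```

If that mapping holds for this driver, type 4098 is the DCI, which would match the initiator-side failure described in the first post.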