I haven’t been able to find much on the Remote_Invalid_Request_Error, but one page on the web (rdmamojo) suggests it can be caused by the qp_access_flags on the remote QP not being configured to support the operation, insufficient buffering to receive a new RDMA or Atomic Operation request, or an invalid length specified in an RDMA request.
I have validated all of the above: qp_access_flags on the responder has RDMA Read enabled, there is enough buffering to receive the RDMA Read, and the length specified in the RDMA Read request on the requester is fine. In addition, I have validated the remote addr/rkey, the local addr/lkey, the length, and the entire posted WQE; they all look fine.
Any idea what else could cause this error (Remote_Invalid_Request_Error)? Also, I see a vendor syndrome of 0x8a in the completion but couldn’t find any details on it; is there a way to decode it for further information on the failure?
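For reference, here is roughly how I pull the status and vendor syndrome out of the work completion (standard libibverbs; the “cq” handle is from my own setup):

    #include <stdio.h>
    #include <infiniband/verbs.h>

    static void drain_cq(struct ibv_cq *cq)
    {
        struct ibv_wc wc;

        while (ibv_poll_cq(cq, 1, &wc) > 0) {
            if (wc.status != IBV_WC_SUCCESS) {
                /* IBV_WC_REM_INV_REQ_ERR is the verbs status for
                   Remote_Invalid_Request_Error; wc.vendor_err carries the
                   opaque vendor syndrome (0x8a in my case). */
                fprintf(stderr, "wr_id %llu failed: %s (vendor syndrome 0x%x)\n",
                        (unsigned long long)wc.wr_id,
                        ibv_wc_status_str(wc.status), wc.vendor_err);
            }
        }
    }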
I’m having a similar problem with the RHEL inbox drivers and NFS over RDMA between some newly installed machines; I’m getting Local Length Errors. While I don’t have a solution for you specifically, I’m wondering how you did the validation you mention (qp_access_flags, buffer sizes, etc.). Some of the things you checked might help me with my problem.
Basically, by validating (qp_access_flags, buffer sizes, etc.) I mean I made sure they had the right values. For example, qp_access_flags was enabled for RDMA Read and Write, the buffer size in the work request matched the one used for memory registration, and so on. A minimal sketch of the access-flags check follows.
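This is roughly the kind of check I did on the responder side: query the QP and confirm remote access is actually enabled (plain libibverbs; “qp” is the connected QP from my application):

    #include <stdio.h>
    #include <infiniband/verbs.h>

    static int check_access_flags(struct ibv_qp *qp)
    {
        struct ibv_qp_attr attr;
        struct ibv_qp_init_attr init_attr;

        /* Ask the driver for the current access flags on this QP. */
        if (ibv_query_qp(qp, &attr, IBV_QP_ACCESS_FLAGS, &init_attr))
            return -1;

        printf("qp_access_flags: remote read %s, remote write %s\n",
               (attr.qp_access_flags & IBV_ACCESS_REMOTE_READ)  ? "on" : "off",
               (attr.qp_access_flags & IBV_ACCESS_REMOTE_WRITE) ? "on" : "off");
        return 0;
    }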
I was able to get this issue resolved. The problem was with the “max_dest_rd_atomic” QP attribute. Per the documentation, “max_dest_rd_atomic” is the “number of RDMA Reads outstanding at any time for this QP as a destination”. Our code was using RDMACM for connection management, and RDMACM sets “max_dest_rd_atomic” from the “responder_resources” field of the “rdma_conn_param” argument to “rdma_connect”. That field was not obvious to us, so we left it unset, which caused RDMACM to set “max_dest_rd_atomic” to zero and made RDMA Reads targeting this node fail. A sketch of the fix is below.
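Roughly, the fix looks like this: fill in “responder_resources” (and “initiator_depth”) in “rdma_conn_param” before calling “rdma_connect” (“id” is the connected rdma_cm_id from our application; the value 4 and the retry counts are just example settings, not anything mandated):

    #include <string.h>
    #include <rdma/rdma_cma.h>

    static int connect_with_rd_atomic(struct rdma_cm_id *id)
    {
        struct rdma_conn_param conn_param;

        memset(&conn_param, 0, sizeof(conn_param));
        /* RDMA Reads this side can serve as a destination; RDMACM uses
           this to program max_dest_rd_atomic instead of leaving it zero. */
        conn_param.responder_resources = 4;
        /* RDMA Reads this side may have outstanding as an initiator. */
        conn_param.initiator_depth     = 4;
        conn_param.retry_count         = 7;
        conn_param.rnr_retry_count     = 7;

        return rdma_connect(id, &conn_param);
    }

Note that the passive side takes the same “rdma_conn_param” structure in “rdma_accept”, so if you accept connections you likely need to set “responder_resources” there as well.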
Basically, the “Remote_Invalid_Request_Error” syndrome covers many issues that are not clearly distinguished, which is why it took us time to figure out the exact problem. This is where I was hoping the “vendor syndrome” would come in handy for finding the root cause of “Remote_Invalid_Request_Error” or similar errors that have multiple failure reasons. Unfortunately, the “vendor syndrome” values don’t seem to be documented by Mellanox. It would help if Mellanox published these codes with their descriptions, so that Mellanox RDMA users could debug similar issues.