Our setup has an x86 host with a ConnectX-3 Pro 314A card, communicating with our target board, which has a 10G Ethernet interface.
On the target side, we have a RoCEv2 protocol stack, over which an NVMe-oF application is running.
On the host side we run an FIO read workload, for which the host issues NVMe SSD read commands to the target. In turn, the target generates RDMA Write requests and then sends the completions.
An ibdump capture is attached. The issue we are encountering is that, for an RDMA Write request, the host returns a Remote Access Error in its acknowledgement. As per the InfiniBand Specification (section 184.108.40.206.4 REMOTE ACCESS ERROR), the reasons for sending this response (with Remote Access Error) are the following:
The R_Key field of the RETH is invalid.
The virtual address and length or type of access specified is outside the locally defined limits associated with the R_Key.
For an HCA, a protection domain violation is detected.
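For reference, here is a minimal Python sketch of how we understand these three checks on the responder side. It is only an illustration of the conditions listed above, not the HCA's actual logic; the `MemoryRegion` class, its fields, and `check_rdma_write` are hypothetical names, and the example region and protection domain numbers are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemoryRegion:
    """Hypothetical model of a registered memory region on the responder."""
    rkey: int
    base_va: int
    length: int
    pd: int             # protection domain the region belongs to
    remote_write: bool  # whether remote write access was granted


def check_rdma_write(mrs: Dict[int, MemoryRegion], qp_pd: int,
                     rkey: int, va: int, dma_len: int) -> Optional[str]:
    """Return None if the request passes, else the reason for a Remote Access Error."""
    mr = mrs.get(rkey)
    if mr is None:
        return "invalid R_Key"
    if va < mr.base_va or va + dma_len > mr.base_va + mr.length:
        return "address/length outside R_Key limits"
    if not mr.remote_write:
        return "access type not permitted"
    if mr.pd != qp_pd:
        return "protection domain violation"
    return None


# Assumed example: a 4096-byte region registered with the values from our capture.
regions = {75014: MemoryRegion(75014, 8150351872, 4096, pd=1, remote_write=True)}
print(check_rdma_write(regions, 1, 75014, 8150351872, 4096))
```

With a region registered exactly as assumed here, the request from our capture passes every check, which is why the NAK is unexpected to us.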
Analyzing the ibdump capture, we did not find any of the above violations. From our analysis:
QP#243 on the host is paired with QP#3 on the target. For the RDMA Write request with PSN 9529773 from the target, we received a response with Remote Access Error. The R_Key is 75014, the Virtual Address is 8150351872, and the DMA Length is 4096 bytes. The corresponding NVMe command capsule received from the host has PSN 9529418. The RETH parameters (R_Key, Virtual Address and DMA Length) exactly match those specified in the NVMe command.
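The comparison we made can be sketched roughly as below. This assumes the RETH carries a 64-bit virtual address, a 32-bit R_Key and a 32-bit DMA length in network byte order, and that the NVMe command uses a keyed SGL data block descriptor (address in bytes 0-7, length in bytes 8-10, key in bytes 11-14, identifier in byte 15, little endian). The function names are hypothetical, and the byte strings are rebuilt from the decoded values above rather than copied from the capture.

```python
import struct


def parse_reth(reth: bytes) -> dict:
    """Decode a 16-byte RETH: VA, R_Key, DMA length (big endian)."""
    va, rkey, dma_len = struct.unpack(">QII", reth[:16])
    return {"va": va, "rkey": rkey, "length": dma_len}


def parse_keyed_sgl(desc: bytes) -> dict:
    """Decode a 16-byte keyed SGL data block descriptor (little endian)."""
    return {
        "va": int.from_bytes(desc[0:8], "little"),
        "rkey": int.from_bytes(desc[11:15], "little"),
        "length": int.from_bytes(desc[8:11], "little"),
    }


# Rebuilt from the decoded values in this post (not raw capture bytes).
reth = struct.pack(">QII", 8150351872, 75014, 4096)
sgl = (
    (8150351872).to_bytes(8, "little")
    + (4096).to_bytes(3, "little")
    + (75014).to_bytes(4, "little")
    + bytes([0x40])  # assumed identifier: keyed data block, address subtype
)

print(parse_reth(reth) == parse_keyed_sgl(sgl))
```

The byte order differs between the two headers, so a comparison like this is only meaningful on the decoded values, not on the raw bytes.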
We have also verified that no R_Key and Virtual Address pair is repeated. We would like to understand why the host is issuing a Remote Access Error. Please help us.
sniffer_1qp.pcap.zip (780 KB)