Operation IB_WR_REG_MR fails with IB_WC_MW_BIND_ERR

We have a kernel module that uses an IB network for data transfer. It works perfectly on Debian; we are now porting the module to FreeBSD and are facing an issue with memory registration.

 uname -a
FreeBSD test1 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n259630-0ca90ed42a49: Sat Jul 28 09:40:05 IST 2001 root@test1:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

Our program creates a PD, an MR (using ib_alloc_mr), CQs and a QP (of type IB_QPT_RC). It then moves the QP through the states IB_QPS_INIT->IB_QPS_RTR->IB_QPS_RTS.
It gets the DMA address for the malloc'ed buffer by calling ib_dma_map_single() and plugs this DMA address into the sg using sg_dma_address and sg_dma_length.
It then maps the MR and the sg by calling ib_map_mr_sg(); all of these calls succeed.

The program then posts an IB_WR_REG_MR work request as shown below (simulating ibv_reg_mr()):

wr.opcode = IB_WR_REG_MR;
wr.send_flags = IB_SEND_SIGNALED;
wr.mr = mr;
wr.key = rkey;
wr.access = 0;
error = ib_post_send(qp, &wr, &bad_wr);

ib_post_send succeeds. The program then waits for the completion by calling ib_poll_cq(cq, 1, &wc).
It fails at ASSERT(wc.status == IB_WC_SUCCESS) because wc.status is set to IB_WC_MW_BIND_ERR.

QUESTIONS:

  1. Can you please tell us why we get this error on FreeBSD?
  2. Is there any documentation on the IB calls used in kernel mode?

Some data from the dump:

(kgdb) p wc
$1 = {{wr_id = 0, wr_cqe = 0x0}, status = IB_WC_MW_BIND_ERR, opcode = -511, vendor_err = 120, byte_len = 0, qp = 0xfffff80040687800,
ex = {imm_data = 8, invalidate_rkey = 8}, src_qp = 0, wc_flags = -2096742349, pkey_index = 65535, slid = 65535, sl = 161 '\241',
dlid_path_bits = 0 '\000', port_num = 0 '\000', smac = "\000\000\000\000\000\b", vlan_id = 0, network_hdr_type = 0 '\000'}

Our register WR is shown below:
(kgdb) p wr
$3 = {wr = {next = 0x0, {wr_id = 0, wr_cqe = 0x0}, sg_list = 0x0, num_sge = 0, opcode = IB_WR_REG_MR, send_flags = 2, ex = {
imm_data = 0, invalidate_rkey = 0}}, mr = 0xfffff8004068ad80, key = 1857536, access = 0}
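
As an aid to reading the WR dump above, the numeric fields can be decoded in user space. The sketch below is not taken from the module: it keeps local copies of the flag bits, assumed to have the same values as in the Linux `ib_verbs.h` header (the FreeBSD OFED header is assumed to match), and the helper names `decode_flags`, `key_index` and `key_tag` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the flag bits; ASSUMED to match ib_verbs.h. */
enum {
    SEND_FENCE     = 1 << 0,
    SEND_SIGNALED  = 1 << 1,
    SEND_SOLICITED = 1 << 2,
    SEND_INLINE    = 1 << 3
};

enum {
    ACCESS_LOCAL_WRITE   = 1 << 0,
    ACCESS_REMOTE_WRITE  = 1 << 1,
    ACCESS_REMOTE_READ   = 1 << 2,
    ACCESS_REMOTE_ATOMIC = 1 << 3,
    ACCESS_MW_BIND       = 1 << 4
};

struct flag_name {
    unsigned    bit;
    const char *name;
};

static const struct flag_name send_flag_names[] = {
    { SEND_FENCE,     "IB_SEND_FENCE" },
    { SEND_SIGNALED,  "IB_SEND_SIGNALED" },
    { SEND_SOLICITED, "IB_SEND_SOLICITED" },
    { SEND_INLINE,    "IB_SEND_INLINE" }
};

static const struct flag_name access_flag_names[] = {
    { ACCESS_LOCAL_WRITE,   "IB_ACCESS_LOCAL_WRITE" },
    { ACCESS_REMOTE_WRITE,  "IB_ACCESS_REMOTE_WRITE" },
    { ACCESS_REMOTE_READ,   "IB_ACCESS_REMOTE_READ" },
    { ACCESS_REMOTE_ATOMIC, "IB_ACCESS_REMOTE_ATOMIC" },
    { ACCESS_MW_BIND,       "IB_ACCESS_MW_BIND" }
};

/* Write the names of all set bits into buf, joined by '|'; "0" if none.
 * len is the size of buf and must be at least 1. */
static void decode_flags(unsigned flags, const struct flag_name *tbl,
                         size_t n, char *buf, size_t len)
{
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(flags & tbl[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, tbl[i].name, len - strlen(buf) - 1);
    }
    if (buf[0] == '\0')
        strncat(buf, "0", len - 1);
}

/* Upper 24 bits of an lkey/rkey: the index part of the key. */
static uint32_t key_index(uint32_t key) { return key >> 8; }

/* Lower 8 bits of an lkey/rkey: the consumer-owned tag. */
static uint32_t key_tag(uint32_t key) { return key & 0xff; }
```

With the values from the dump, `send_flags = 2` decodes to `IB_SEND_SIGNALED`, `access = 0` decodes to no access bits at all, and the key 1857536 (0x1c5800) splits into index 0x1c58 and tag 0x00. Whether an empty access mask is acceptable for this work request on mlx5 under FreeBSD is not established here.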

QP from the dump:

(kgdb) p * qp
$7 = {device = 0xfffffe011e04b000, pd = 0xfffff8000b1c4600, send_cq = 0xfffff80040687c00, recv_cq = 0xfffff80040687c00, mr_lock = {
m = {lock_object = {lo_name = 0xffffffff82fe57e0 "lnxspin", lo_flags = 16842752, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}},
srq = 0x0, xrcd = 0x0, xrcd_list = {next = 0x0, prev = 0x0}, usecnt = {counter = 0}, open_list = {next = 0x0, prev = 0x0},
real_qp = 0xfffff80040687800, uobject = 0x0, event_handler = 0x0, qp_context = 0xfffff8000cf3a440, qp_num = 349, max_write_sge = 6,
max_read_sge = 6, qp_type = IB_QPT_RC, rwq_ind_tbl = 0x0, port = 0 '\000'}
(kgdb)

The MR related to this WR is:
(kgdb) p *wr->mr
$9 = {device = 0xfffffe011e04b000, pd = 0xfffff8000b1c4600, lkey = 1857536, rkey = 1857536, iova = 18446741879489164128,
length = 1512, page_size = 4096, type = IB_MR_TYPE_MEM_REG, need_inval = false, {uobject = 0x0, qp_entry = {next = 0x0,
prev = 0x0}}, dm = 0x0, sig_attrs = 0x0}
(kgdb)

(kgdb) p /x *rfmr->sg
$10 = {page_link = 0x2, offset = 0x360, length = 0x5e8, dma_address = 0x4425f360, dma_map = 0x0}
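
The MR and sg dumps above can be cross-checked with a little arithmetic. The following is only a sketch for reading the numbers, not part of the module; the helper names `page_off` and `pages_spanned` are hypothetical, and the page size is the `page_size = 4096` value from the MR dump.

```c
#include <stdint.h>

/* Offset of an address within its page; page_size must be a power of two. */
static uint64_t page_off(uint64_t addr, uint64_t page_size)
{
    return addr & (page_size - 1);
}

/* Number of pages touched by the byte range [addr, addr + len). */
static uint64_t pages_spanned(uint64_t addr, uint64_t len, uint64_t page_size)
{
    if (len == 0)
        return 0;
    return (addr + len - 1) / page_size - addr / page_size + 1;
}
```

Applied to the dump: the MR iova 18446741879489164128 is 0xfffffe011e45f360 in hex, while the sg dma_address is 0x4425f360. The two values differ, but both have page offset 0x360, which matches the sg offset. The sg length 0x5e8 equals the MR length of 1512 bytes, and the buffer fits within a single 4096-byte page. Whether the difference between the iova and the DMA address is related to the failure is not established here.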

Information on mlx device:
root@test1:~ # ibstat
CA 'mlx5_0'
CA type: MT4124
Number of ports: 1
Firmware version: 20.31.1014
Hardware version: 0
Node GUID: 0x005056fffe8b6f9a
System image GUID: 0xb88303ffff8bf28c
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x025056fffe8b6f9a
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4124
Number of ports: 1
Firmware version: 20.31.1014
Hardware version: 0
Node GUID: 0x005056fffe8b0996
System image GUID: 0xb88303ffff8bf28c
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x025056fffe8b0996
Link layer: Ethernet
root@test1:~ # pciconf -lv | grep mlx -C 3
device = 'VMXNET3 Ethernet Controller'
class = network
subclass = ethernet
mlx5_core0@pci0:19:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x101c subvendor=0x1590 subdevice=0x02af
vendor = 'Mellanox Technologies'
device = 'MT28908 Family [ConnectX-6 Virtual Function]'
class = network
subclass = ethernet
mlx5_core1@pci0:27:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x101c subvendor=0x1590 subdevice=0x02af
vendor = 'Mellanox Technologies'
device = 'MT28908 Family [ConnectX-6 Virtual Function]'
class = network
root@test1:~ # sysctl -a | grep Mellanox
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
dev.mlx5_core.1.%desc: Mellanox Core driver 3.7.1 (November 2021)
dev.mlx5_core.0.%desc: Mellanox Core driver 3.7.1 (November 2021)

Hello suresh.pujar,

Welcome, and thank you for posting your inquiry to the NVIDIA Developer Forums.

Q) Can you tell us why we get this error?
A) Our exposure to BSD applications is limited. This type of issue would need to be investigated via a support case. However, bear in mind that programming assistance and code review are outside the scope of NVIDIA Enterprise Support. If you need assistance with programming or code review, we recommend engaging our Sales and Solutions team so they can understand your goals and set you up with a solution that fits your specific business needs.

Q) Any documentation on IB calls used in kernel mode?
A) Unfortunately no; this deals with the inner workings of the driver and may be proprietary information. This request will also need to go through our Solutions team.

To get in touch with our Sales and Solutions team, please use the following web form:

To open an NVIDIA support ticket, please use the following link:
https://enterprise-support.nvidia.com/s/create-case
Note that engaging NVIDIA support will require support entitlement.
You will be prompted to enter your entitlement information (serial number of device, support entitlement certificate number, etc) at that link.

Thanks, and best regards,
NVIDIA Enterprise Experience
