We have a kernel module that uses an InfiniBand (IB) network for data transfer. It works perfectly on Debian. We are now porting the module to FreeBSD and are running into a problem with memory registration.
uname -a
FreeBSD test1 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n259630-0ca90ed42a49: Sat Jul 28 09:40:05 IST 2001 root@test1:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
Our module creates a PD, an MR (using ib_alloc_mr), CQs, and a QP of type IB_QPT_RC. It then moves the QP through the states IB_QPS_INIT->IB_QPS_RTR->IB_QPS_RTS.
It obtains a DMA address for the malloc'ed buffer by calling ib_dma_map_single() and stores that address in an sg entry using sg_dma_address and sg_dma_length.
It then maps the MR to the sg list by calling ib_map_mr_sg(). All of these calls succeed.
The module then posts an IB_WR_REG_MR work request as shown below (to simulate ibv_reg_mr()):
wr.wr.opcode = IB_WR_REG_MR;          /* wr is a struct ib_reg_wr; the generic fields live in wr.wr (see the dump below) */
wr.wr.send_flags = IB_SEND_SIGNALED;
wr.mr = mr;
wr.key = rkey;
wr.access = 0;
error = ib_post_send(qp, &wr.wr, &bad_wr);
ib_post_send() succeeds. The module then waits for the completion by calling ib_poll_cq(cq, 1, &wc).
It fails at ASSERT(wc.status == IB_WC_SUCCESS) because wc.status is set to IB_WC_MW_BIND_ERR.
QUESTIONS:
- Why do we get this error on FreeBSD?
- Is there any documentation for the IB calls available in kernel mode?
Some data from the crash dump:
(kgdb) p wc
$1 = {{wr_id = 0, wr_cqe = 0x0}, status = IB_WC_MW_BIND_ERR, opcode = -511, vendor_err = 120, byte_len = 0, qp = 0xfffff80040687800,
ex = {imm_data = 8, invalidate_rkey = 8}, src_qp = 0, wc_flags = -2096742349, pkey_index = 65535, slid = 65535, sl = 161 '\241',
dlid_path_bits = 0 '\000', port_num = 0 '\000', smac = "\000\000\000\000\000\b", vlan_id = 0, network_hdr_type = 0 '\000'}
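A note on reading this dump: the user-space verbs documentation for ibv_poll_cq says that when the status is not success, only wr_id, status, qp_num and vendor_err are valid. Assuming the kernel API follows the same convention, the odd opcode = -511 and wc_flags values above are probably uninitialized memory rather than clues. Below is a minimal user-space sketch that prints only those fields, with vendor_err in hex because that is often easier to look up. The names struct wc_summary and format_failed_wc are made up for illustration and are not part of any IB API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the ib_wc fields worth reading
 * after a failed completion; the real struct ib_wc has more members. */
struct wc_summary {
    uint64_t    wr_id;
    const char *status_name; /* e.g. "IB_WC_MW_BIND_ERR", as kgdb printed it */
    uint32_t    vendor_err;  /* device-specific code; its meaning is not assumed */
};

/* Write a one-line summary into buf and return snprintf's result. */
static int format_failed_wc(const struct wc_summary *wc, char *buf, size_t len)
{
    return snprintf(buf, len, "wr_id=%llu status=%s vendor_err=0x%x",
                    (unsigned long long)wc->wr_id, wc->status_name,
                    (unsigned)wc->vendor_err);
}
```

With the values from the dump (wr_id = 0, vendor_err = 120), the summary shows vendor_err=0x78.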
Our registration WR is shown below:
(kgdb) p wr
$3 = {wr = {next = 0x0, {wr_id = 0, wr_cqe = 0x0}, sg_list = 0x0, num_sge = 0, opcode = IB_WR_REG_MR, send_flags = 2, ex = {
imm_data = 0, invalidate_rkey = 0}}, mr = 0xfffff8004068ad80, key = 1857536, access = 0}
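To double-check the WR fields, here is a small user-space sketch that decodes the numbers from the dump. The flag values are local copies of what we believe the Linux ib_verbs.h header defines and are assumed (not verified) to match FreeBSD's OFED headers. All function and table names in the block are made up for illustration. Splitting the key into an index and a low byte is based on our reading of ib_update_fast_reg_key(), which only changes the low 8 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct flag_name {
    unsigned    bit;
    const char *name;
};

/* Assumed values, copied from our reading of Linux ib_verbs.h. */
static const struct flag_name send_flag_names[] = {
    { 1u << 0, "IB_SEND_FENCE" },
    { 1u << 1, "IB_SEND_SIGNALED" },
    { 1u << 2, "IB_SEND_SOLICITED" },
    { 1u << 3, "IB_SEND_INLINE" },
};

static const struct flag_name access_flag_names[] = {
    { 1u << 0, "IB_ACCESS_LOCAL_WRITE" },
    { 1u << 1, "IB_ACCESS_REMOTE_WRITE" },
    { 1u << 2, "IB_ACCESS_REMOTE_READ" },
    { 1u << 3, "IB_ACCESS_REMOTE_ATOMIC" },
};

/* Join the names of all set bits with '|' into out (len must be > 0). */
static void decode_flags(unsigned value, const struct flag_name *tbl, size_t n,
                         char *out, size_t len)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(value & tbl[i].bit))
            continue;
        if (out[0] != '\0')
            strncat(out, "|", len - strlen(out) - 1);
        strncat(out, tbl[i].name, len - strlen(out) - 1);
    }
    if (out[0] == '\0')
        strncat(out, "(none)", len - 1);
}

static void decode_send_flags(unsigned value, char *out, size_t len)
{
    decode_flags(value, send_flag_names,
                 sizeof(send_flag_names) / sizeof(send_flag_names[0]), out, len);
}

static void decode_access(unsigned value, char *out, size_t len)
{
    decode_flags(value, access_flag_names,
                 sizeof(access_flag_names) / sizeof(access_flag_names[0]), out, len);
}

/* Upper 24 bits of an lkey/rkey. */
static uint32_t key_index(uint32_t key) { return key >> 8; }

/* Low 8 bits, the part ib_update_fast_reg_key() rewrites. */
static uint8_t key_low_byte(uint32_t key) { return (uint8_t)(key & 0xffu); }
```

For the dump above, send_flags = 2 decodes to IB_SEND_SIGNALED, access = 0 decodes to no access flags at all, and key = 1857536 is 0x1c5800, that is index 0x1c58 with low byte 0.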
QP from the dump:
(kgdb) p * qp
$7 = {device = 0xfffffe011e04b000, pd = 0xfffff8000b1c4600, send_cq = 0xfffff80040687c00, recv_cq = 0xfffff80040687c00, mr_lock = {
m = {lock_object = {lo_name = 0xffffffff82fe57e0 "lnxspin", lo_flags = 16842752, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}},
srq = 0x0, xrcd = 0x0, xrcd_list = {next = 0x0, prev = 0x0}, usecnt = {counter = 0}, open_list = {next = 0x0, prev = 0x0},
real_qp = 0xfffff80040687800, uobject = 0x0, event_handler = 0x0, qp_context = 0xfffff8000cf3a440, qp_num = 349, max_write_sge = 6,
max_read_sge = 6, qp_type = IB_QPT_RC, rwq_ind_tbl = 0x0, port = 0 '\000'}
(kgdb)
The MR referenced by this WR:
(kgdb) p *wr->mr
$9 = {device = 0xfffffe011e04b000, pd = 0xfffff8000b1c4600, lkey = 1857536, rkey = 1857536, iova = 18446741879489164128,
length = 1512, page_size = 4096, type = IB_MR_TYPE_MEM_REG, need_inval = false, {uobject = 0x0, qp_entry = {next = 0x0,
prev = 0x0}}, dm = 0x0, sig_attrs = 0x0}
(kgdb)
(kgdb) p /x *rfmr->sg
$10 = {page_link = 0x2, offset = 0x360, length = 0x5e8, dma_address = 0x4425f360, dma_map = 0x0}
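As a sanity check on the MR and sg values, the sketch below compares the numbers from the two dumps in user space. It only tests arithmetic consistency (same page offset, same length, buffer inside one page) and says nothing about whether the device accepts the mapping. PAGE_SZ is taken from page_size in the MR dump, and the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* page_size from the MR dump */

/* Offset of an address within its page. */
static uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_SZ - 1);
}

/* True when the byte range [addr, addr + len) lies inside one page. */
static bool fits_in_one_page(uint64_t addr, uint64_t len)
{
    return len > 0 && page_offset(addr) + len <= PAGE_SZ;
}

/* True when the MR (iova, length) and the sg entry (offset, length,
 * dma_address) agree on the page offset and on the length. */
static bool mr_matches_sg(uint64_t iova, uint64_t mr_len,
                          uint64_t sg_off, uint64_t sg_len, uint64_t dma_addr)
{
    return page_offset(iova) == sg_off &&
           page_offset(dma_addr) == sg_off &&
           mr_len == sg_len;
}
```

With our values, iova = 18446741879489164128 is 0xfffffe011e45f360, so its page offset is 0x360, which matches both sg offset = 0x360 and the low bits of dma_address = 0x4425f360. The lengths agree as well (1512 = 0x5e8), and 0x360 + 0x5e8 = 0x948 stays within one 4096-byte page.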
Information on the mlx5 devices:
root@test1:~ # ibstat
CA 'mlx5_0'
CA type: MT4124
Number of ports: 1
Firmware version: 20.31.1014
Hardware version: 0
Node GUID: 0x005056fffe8b6f9a
System image GUID: 0xb88303ffff8bf28c
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x025056fffe8b6f9a
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4124
Number of ports: 1
Firmware version: 20.31.1014
Hardware version: 0
Node GUID: 0x005056fffe8b0996
System image GUID: 0xb88303ffff8bf28c
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x025056fffe8b0996
Link layer: Ethernet
root@test1:~ # pciconf -lv | grep mlx -C 3
device = 'VMXNET3 Ethernet Controller'
class = network
subclass = ethernet
mlx5_core0@pci0:19:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x101c subvendor=0x1590 subdevice=0x02af
vendor = 'Mellanox Technologies'
device = 'MT28908 Family [ConnectX-6 Virtual Function]'
class = network
subclass = ethernet
mlx5_core1@pci0:27:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x101c subvendor=0x1590 subdevice=0x02af
vendor = 'Mellanox Technologies'
device = 'MT28908 Family [ConnectX-6 Virtual Function]'
class = network
root@test1:~ # sysctl -a | grep Mellanox
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
mlx5: Mellanox Core driver 3.7.1 (November 2021)ugen1.1: at usbus1
dev.mlx5_core.1.%desc: Mellanox Core driver 3.7.1 (November 2021)
dev.mlx5_core.0.%desc: Mellanox Core driver 3.7.1 (November 2021)