Got "local protection error" when doing IBV_WR_SEND

I’m trying to use RDMA to send data to or from an FPGA PCIe card.
The test platform is an Intel CPU PC with a cx4121a card and an FPGA PCIe card. The OS is Ubuntu 22.04 x86_64, with MLNX_OFED_LINUX-5.8-2.0.3.0.
I wrote a kernel driver for the FPGA card that implements a peer_memory_client, modeled closely on nv_peer_memory.
It worked fine with IBV_WR_RECV: the cx4121a successfully wrote the data it received to the FPGA card’s BAR2 address (which is mapped to DDR on the FPGA card). But when doing IBV_WR_SEND (using the cx4121a to send data from the FPGA card’s BAR2 address to the remote side), I got a local protection error, and there was no warning or error message in the kernel log.
The same buffer address/length and the same memory region were used for both IBV_WR_SEND and IBV_WR_RECV, so I have no clue why IBV_WR_RECV worked but IBV_WR_SEND did not.

Any help would be appreciated, thanks!

Here are some details:

  1. I implement mmap in my driver to map the FPGA card’s BAR2 into user space and get a virtual address:
	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_page_prot = pgprot_noncached(vm_get_page_prot(vma->vm_flags));
	ret = remap_pfn_range(vma, vma->vm_start,
				pfn + vma->vm_pgoff,
				user_size,
				vma->vm_page_prot);
  2. I fill this virtual address into struct ibv_sge, then pass it to ibv_post_send or ibv_post_recv:
    ib_res.buf = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    struct ibv_sge list = {
            .addr	= (uint64_t)ib_res.buf,
            .length = size,
            .lkey	= ib_res.mr->lkey
    };

    struct ibv_send_wr *bad_wr, wr = {
            .wr_id	    = SEND_WRID,
            .sg_list    = &list,
            .num_sge    = 1,
            .opcode     = IBV_WR_SEND,
            .send_flags = IBV_SEND_SIGNALED,
            .next       = NULL
    };

    ibv_post_send(ib_res.qp, &wr, &bad_wr);
  3. The ib core driver finds that this user virtual address is not normal memory, so it tries to get the pages from its registered peers.
  4. I implement peer_memory_client in my driver and register it with ib core. When ib core asks for a dma_map, I fill the physical address of the FPGA card’s BAR2 into the sg table:
	/* for_each_sg() advances i itself, so the body must not increment it again */
	for_each_sg(sg_head->sgl, sg, hello_mem_context->npages, i) {
		sg_set_page(sg, NULL, hello_mem_context->page_size, 0);
		sg->dma_address = bar2_phys_addr[hello_mem_context->belongs_to_card] + (i * hello_mem_context->page_size);
		sg->dma_length = hello_mem_context->page_size;
	}
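
For reference, the layout step 4 is aiming for is one entry per page, with entry i at BAR base + i * page_size. That only holds if the index advances exactly once per entry; in the kernel, for_each_sg() already increments its index variable in the loop header. Below is a minimal userspace sketch of that layout. `struct sg_entry` and `fill_bar_entries` are hypothetical names for illustration, not the kernel's `struct scatterlist` API, and the base address is whatever the caller passes in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatter-gather entry
 * (NOT the kernel's struct scatterlist). */
struct sg_entry {
    uint64_t dma_address;
    uint32_t dma_length;
};

/* Fill npages entries so that entry i covers
 * [bar_base + i * page_size, bar_base + (i + 1) * page_size).
 * The loop header is the only place where the index advances. */
static void fill_bar_entries(struct sg_entry *entries, size_t npages,
                             uint64_t bar_base, uint32_t page_size)
{
    for (size_t i = 0; i < npages; i++) {
        entries[i].dma_address = bar_base + (uint64_t)i * page_size;
        entries[i].dma_length = page_size;
    }
}
```

If the index were advanced a second time inside the loop body, only about half of the entries would be written and their addresses would skip every other page.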

If I replace the FPGA card’s BAR2 with a kmalloc buffer in the driver, everything works fine. So I assume the driver in general is good, and something is wrong with the PCIe BAR address mapping in the driver?

SEND is different from regular RDMA read/write: it needs a memory copy to the HCA buffer. If you are developing a driver, the better way is to work with the kernel community and get the IB HCA PRM (Programmer’s Reference Manual) from NVIDIA sales.

Thanks for the information!

I’ve tested four APIs:

  1. ibv_post_recv
  2. ibv_post_send with opcode = IBV_WR_SEND
  3. ibv_post_send with opcode = IBV_WR_RDMA_READ
  4. ibv_post_send with opcode = IBV_WR_RDMA_WRITE

1 and 3 succeeded; 2 and 4 failed with a local protection error. So it seems the HCA can write to the FPGA card’s PCIe BAR address, but can’t read from it… Any thoughts?
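
To spell out the reasoning behind that conclusion: for each of the four cases, the question is whether the HCA has to read from or write to the local buffer (here, the FPGA BAR). A receive and an RDMA read both deposit incoming data into the local buffer, while a send and an RDMA write both fetch outgoing data from it. The sketch below just encodes that classification; the enum and function names are hypothetical labels for the four tests above, not libibverbs identifiers.

```c
#include <stdbool.h>

/* Hypothetical labels for the four tested cases
 * (NOT libibverbs identifiers). */
enum tested_op {
    OP_POST_RECV,   /* 1. ibv_post_recv */
    OP_SEND,        /* 2. IBV_WR_SEND */
    OP_RDMA_READ,   /* 3. IBV_WR_RDMA_READ */
    OP_RDMA_WRITE   /* 4. IBV_WR_RDMA_WRITE */
};

/* Returns true when the HCA must read from the local buffer
 * (outgoing payload), false when it writes into the local buffer
 * (incoming payload). */
static bool hca_reads_local_buffer(enum tested_op op)
{
    switch (op) {
    case OP_SEND:
    case OP_RDMA_WRITE:
        return true;
    case OP_POST_RECV:
    case OP_RDMA_READ:
        return false;
    }
    return false;
}
```

The two cases that return true are exactly the two that failed, which matches the observation that only HCA reads from the BAR are affected.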
