Various ping programs segfaulting

I have a build of rdma-core on kernel 4.17, built with Yocto, for an Altera Arria10 with a dual-core A53 ARM processor. The system builds and rxe configures correctly, i.e. I can run rxe_cfg start and rxe_cfg add eth0, and ibv_devices looks good:

root@arria10:~# rxe_cfg status
  Name  Link  Driver   Speed  NMTU  IPv4_addr  RDEV  RMTU
  eth0  yes   st_gmac         1500  10.0.1.28  rxe0  1024 (3)

root@arria10:~# ibv_devices
    device              node GUID
    rxe0                085697fffec1059b

root@arria10:~# ibv_devinfo rxe0
hca_id: rxe0
        transport:                      InfiniBand (0)
        fw_ver:                         0.0.0
        node_guid:                      0856:97ff:fec1:059b
        sys_image_guid:                 0000:0000:0000:0000
        vendor_id:                      0x0000
        vendor_part_id:                 0
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

This all looks good. However, when I run the ping tests between this machine and a PC running rdma-core, I get some strange errors, including a segfault when the Arria10 runs udaddy as the client:

root@arria10:~# udaddy -s 10.0.1.16
udaddy: starting client
[ 1883.526301] rdma_rxe: null vaddr
udaddy: connecting
failed to reg MR
udaddy: failed to create messages: -1
test complete
Segmentation fault

I traced the first error, "rdma_rxe: null vaddr", to rxe_mem_init_user() in /drivers/infiniband/sw/rxe/rxe_mr.c. It appears that a page address lookup, perhaps a virtual-to-physical translation, is failing. Any thoughts on how to solve this?

Thanks,

FM

This turned out to be a nasty little bug. It turns out there is a place where the rxe driver registers memory using an area of memory that is not available on the ARM processor we are using. Here's the patch that made it work:

 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 5c2684b..f2dc5a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -31,6 +31,7 @@
  * SOFTWARE.
  */
 
+#include <linux/highmem.h>
 #include "rxe.h"
 #include "rxe_loc.h"
 
@@ -94,7 +95,15 @@ static void rxe_mem_init(int access, struct rxe_mem *mem)
 void rxe_mem_cleanup(struct rxe_pool_entry *arg)
 {
 	struct rxe_mem *mem = container_of(arg, typeof(*mem), pelem);
-	int i;
+	int i, entry;
+	struct scatterlist *sg;
 
+	if (mem->kmap_occurred) {
+		for_each_sg(mem->umem->sg_head.sgl, sg,
+			    mem->umem->nmap, entry) {
+			kunmap(sg_page(sg));
+		}
+	}
+
 	if (mem->umem)
 		ib_umem_release(mem->umem);
 
@@ -200,12 +209,14 @@ int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
 		buf = map[0]->buf;
 
 		for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
-			vaddr = page_address(sg_page(sg));
+			// vaddr = page_address(sg_page(sg));
+			vaddr = kmap(sg_page(sg));
 			if (!vaddr) {
 				pr_warn("null vaddr\n");
 				err = -ENOMEM;
 				goto err1;
 			}
+			mem->kmap_occurred = 1;
 
 			buf->addr = (uintptr_t)vaddr;
 			buf->size = BIT(umem->page_shift);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index af1470d..9bd7eac 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -343,6 +343,8 @@ struct rxe_mem {
 	u32			num_map;
 
 	struct rxe_map		**map;
+
+	int kmap_occurred;
 };
 
 struct rxe_mc_grp {
-- 
2.7.4

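The idea is that you need to use kmap()/kunmap() rather than page_address() for these memory regions, which are used by both the kernel and user space, to make this work on the ARM. page_address() returns NULL for a page that has no kernel mapping at that moment (a highmem page), which is what triggers the "null vaddr" warning, whereas kmap() sets up a mapping on demand and kunmap() releases it again.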

Thanks,

FM