Here is my configuration (for both target and client):
OS: CentOS 7.4
Kernel: 4.14.93
Adapter: ConnectX-5
MOFED: 4.5-1.0.1.0 (installed with nvmf support)
I followed the exact steps listed in this article.
Setting up the offloaded subsystem works fine:
dmesg | grep "enabling port" shows the following output:
[ 80.840122] nvmet_rdma: enabling port 1 (10.3.1.66:4420)
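To cross-check the target-side setup beyond that one dmesg line, the nvmet settings can be read back from configfs. The following is only a minimal sketch: the function name dump_nvmet_config is made up for illustration, and it assumes configfs is mounted at the usual /sys/kernel/config location. It prints every attribute file and symlink it finds, without assuming any particular attribute names.

```shell
#!/bin/sh
# Hypothetical helper: print every nvmet configfs attribute as "path = value".
# Assumption: configfs is mounted at /sys/kernel/config (the usual location).
# The optional argument lets the function be pointed at a different tree.
dump_nvmet_config() {
    root="${1:-/sys/kernel/config/nvmet}"
    [ -d "$root" ] || { echo "no nvmet configfs at $root" >&2; return 1; }

    # Attributes are regular files; unreadable ones are printed with an empty value.
    find "$root" -type f | sort | while IFS= read -r f; do
        printf '%s = %s\n' "${f#"$root"/}" "$(cat "$f" 2>/dev/null)"
    done

    # Links (such as a port pointing at the subsystem it exports) are listed separately.
    find "$root" -type l | sort | while IFS= read -r l; do
        printf '%s -> %s\n' "${l#"$root"/}" "$(readlink "$l")"
    done
}
```

Running dump_nvmet_config as root on the target, with no argument, should list the port address, the service ID, and the subsystem and namespace attributes, which makes it easier to compare the setup against the steps in the article.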
However, trying to connect from the client fails with this message:
nvme connect -t rdma -n nvmf_ss -a 10.3.1.66 -s 4420
Failed to write to /dev/nvme-fabrics: Input/output error
On the target, I see the following messages in dmesg:
[ 80.818518] nvmet: adding nsid 1 to subsystem nvmf_ss
[ 80.840122] nvmet_rdma: enabling port 1 (10.3.1.66:4420)
[ 1092.516129] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN 2014-08.org.nvmexpress:uuid:0e455f8f-65cd-4822-9304-2768a722e81b.
[ 1092.550328] nvmet: creating controller 2 for subsystem nvmf_ss for NQN 2014-08.org.nvmexpress:uuid:0e455f8f-65cd-4822-9304-2768a722e81b.
[ 1092.579202] nvmet_rdma: using dynamic staging buffer ffff9b91b5ab7c00
[ 1092.736825] nvmet_rdma: received IB QP event: last WQE reached (16)
[ 1229.611790] INFO: task kworker/3:2:375 blocked for more than 120 seconds.
[ 1229.625404] Tainted: G OE 4.14.93 #1
[ 1229.635680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1229.651418] kworker/3:2 D 0 375 2 0x80000000
[ 1229.662450] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 1229.672376] Call Trace:
[ 1229.677320] ? __schedule+0x22f/0x810
[ 1229.684680] schedule+0x28/0x80
[ 1229.691007] schedule_timeout+0x26e/0x350
[ 1229.699036] wait_for_completion+0xfb/0x160
[ 1229.707445] ? wake_up_q+0x70/0x70
[ 1229.714279] flush_workqueue+0x13d/0x430
[ 1229.722143] ? nvmet_rdma_cm_reject+0x70/0x70 [nvmet_rdma]
[ 1229.733162] nvmet_rdma_queue_connect+0x634/0x9e0 [nvmet_rdma]
[ 1229.744868] ? cma_acquire_dev+0x10a/0x230 [rdma_cm]
[ 1229.754831] ? cma_req_handler+0x19d/0x880 [rdma_cm]
[ 1229.764812] cma_req_handler+0x31f/0x880 [rdma_cm]
[ 1229.774471] cm_process_work+0x22/0x130 [ib_cm]
[ 1229.783562] ? cm_get_bth_pkey.isra.67+0x36/0x90 [ib_cm]
[ 1229.794261] cm_req_handler+0x92e/0xe80 [ib_cm]
[ 1229.803367] ? __switch_to_asm+0x24/0x60
[ 1229.811248] ? __switch_to_asm+0x30/0x60
[ 1229.819116] cm_work_handler+0x113/0x13a9 [ib_cm]
[ 1229.828547] ? __switch_to_asm+0x30/0x60
[ 1229.836446] ? __switch_to_asm+0x24/0x60
[ 1229.844297] process_one_work+0x18a/0x3a0
[ 1229.852369] worker_thread+0x2e/0x390
[ 1229.859716] ? pwq_unbound_release_workfn+0xd0/0xd0
[ 1229.869509] kthread+0x119/0x130
[ 1229.876008] ? __kthread_parkme+0xa0/0xa0
[ 1229.884676] ret_from_fork+0x35/0x40
[ 1229.892389] INFO: task kworker/3:0:3055 blocked for more than 120 seconds.
Any idea what the problem could be?