Hi,
In one of our setups, after changing the number of jobs from 4 to 8, the host crashes with the error below. Any inputs on this?
[ 145.717020] 3cq completion failed with wr_id 0 status 13 opcode 1 vender_err 87
[ 145.717505] ERROR EXIT nvmeof_rdma_cq_event_handler
[ 145.718038] 3cq completion in ERROR state
Hi Rama,
So if you are not using iSER, SRP, nor NVMe-oF, then what protocol are you using to talk to your SSDs?
Thank you,
Sophie.
Hi Rama,
What OS, Kernel and driver version are you using? (modinfo mlx4_core | grep -i version).
Have you seen and followed these documents:
HowTo Compile Linux Kernel for NVMe over Fabrics https://community.mellanox.com/s/article/howto-compile-linux-kernel-for-nvme-over-fabrics
HowTo Configure NVMe over Fabrics https://community.mellanox.com/s/article/howto-configure-nvme-over-fabrics
What is the last trace generated in the messages file prior to crash?
Are you getting the same result with any number of jobs above 4? (i.e. 5, 6, 7)
vender_err 87 indicates that the number of RNR NAKs exceeded the retry limit, terminating the QP (receiver not ready (RNR) error). Completion status 13 corresponds to IBV_WC_RNR_RETRY_EXC_ERR in the verbs status codes, which is consistent with this.
Regards,
Sophie.
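To pull the last kernel trace before the crash on RHEL 7, something like the following should work (a sketch; `journalctl -b -1` only works if the journal is persistent, otherwise fall back to /var/log/messages):

```shell
# Kernel messages from the previous boot, last lines before the crash
# (requires persistent journald storage)
journalctl -k -b -1 | tail -n 50
# Or search the classic messages file for the driver's error strings
grep -i 'cq completion' /var/log/messages | tail -n 20
```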
We are using the mlx4 driver; the kernel version is 3.17 on RHEL.
Hi Rama,
I am a little confused: what type of HCA cards are installed on the initiator/target, and are you using the mlx* inbox driver from RHEL?
You posted Kernel version 3.17 but what OS version? (more /etc/issue or /etc/redhat-release).
This error correlates to buffer/memory allocation, which could possibly be a FW issue on the HCA cards.
Based on the HCA cards, what is the FW running on them?
Thank you,
Sophie.
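For reference, on a typical RHEL setup the HCA firmware version can be read with the standard InfiniBand tools or straight from sysfs (a sketch; `mlx4_0` is a placeholder device name, and the tools come from the libibverbs-utils / infiniband-diags packages):

```shell
# Print the firmware version reported by the HCA
ibv_devinfo | grep -i fw_ver
ibstat | grep -i firmware
# Or read it directly from sysfs (replace mlx4_0 with your device)
cat /sys/class/infiniband/mlx4_0/fw_ver
```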
Hi Rama,
You posted Kernel version 3.17 but what OS version? (more /etc/issue or /etc/redhat-release).
Are you then using iser or srp for your configuration?
Thank you,
Sophie.
Hi Sophie,
Please find my answers inline
What OS, Kernel and driver version are you using? (modinfo mlx4_core | grep -i version).
RHEL, kernel version 3.17
Have you seen and followed these documents:
HowTo Compile Linux Kernel for NVMe over Fabrics https://community.mellanox.com/s/article/howto-compile-linux-kernel-for-nvme-over-fabrics
HowTo Configure NVMe over Fabrics https://community.mellanox.com/s/article/howto-configure-nvme-over-fabrics
[We are not referring to these docs.] (We are not using the standard Linux NVMe-oF drivers.)
What is the last trace generated in the messages file prior to crash?
Are you getting the same result with any number of jobs above 4? (i.e. 5, 6, 7)
We are running 4 or more threads/jobs and getting into this situation.
vender_err 87 indicates that the number of RNR NAKs exceeded the retry limit, terminating the QP (receiver not ready (RNR) error).
In which situations would we expect the receiver to flag RNR?
Is there an OFED / mlx4 driver dependency on this?
Or does the receiver not have sufficient CPU cycles?
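If CPU starvation on the receiver is a suspect, a quick way to check (a sketch using standard tools; mpstat comes from the sysstat package) is to watch per-CPU load and the HCA's interrupt distribution on the target while the jobs run:

```shell
# Per-CPU utilization at 1-second intervals; look for cores pinned at 100%
mpstat -P ALL 1
# Interrupt counts for the mlx4 HCA, refreshed every second
watch -n1 "grep -i mlx4 /proc/interrupts"
```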
Hi Rama,
Please disregard my statement about the type of HCA’s as it is not related here.
Also, what did you mean by "we are not working of standard Linux drivers"?
Thank you,
Sophie.
Hi Rama,
Then can you please describe in detail your current configuration and which mlx* driver you are using?
Regards,
Sophie.
[root@xhdipsnvme1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 7.0 (Maipo)
Are you then using iser or srp for your configuration?
In our kernel configuration we built iser and srp as loadable modules, but we are not using them in our testing.
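As a sanity check that iser/srp are built but not in use, something like the following should show the modules on disk while lsmod returns nothing (a sketch; the module names assume the standard ib_iser/ib_srp naming):

```shell
# Should print nothing if the modules are not loaded
lsmod | grep -E 'ib_iser|ib_srp'
# Confirm the modules were built and are available on disk
modinfo ib_iser ib_srp | grep -i '^filename'
```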