Random write failing with 100G ConnectX-4 card

Hi,

I am getting the following dump with a 100G Mellanox card on the initiator side. The failure occurs when I run random writes; random reads work without any issues.

Could you provide inputs to resolve this issue?

[ 868.210565] mlx5_0:dump_cqe:263:(pid 0): dump error cqe

[ 868.210567] 00000000 00000000 00000000 00000000

[ 868.210568] 00000000 00000000 00000000 00000000

[ 868.210568] 00000000 00000000 00000000 00000000

[ 868.210569] 00000000 08007806 250000ea b13e11d3

[ 868.210576] nvme nvme0: MEMREG for CQE 0xffff88040f9181b8 failed with status memory management operation error (6)
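
For reference, the last row of the dump above can be decoded by hand. The following is a minimal Python sketch, assuming the field offsets of struct mlx5_err_cqe from the Linux kernel header include/linux/mlx5/device.h and assuming dump_cqe prints each 32-bit word in big-endian order. The function name decode_err_cqe_tail is hypothetical and only serves this illustration.

```python
# Decode the last 16 bytes (bytes 48..63) of the 64-byte error CQE.
# ASSUMPTION: layout follows struct mlx5_err_cqe in the Linux kernel
# (include/linux/mlx5/device.h); this is not taken from the thread itself.

LAST_ROW = "00000000 08007806 250000ea b13e11d3"  # copied from the dump above


def decode_err_cqe_tail(row):
    """Split the final dump row into the assumed error CQE fields."""
    words = [int(w, 16) for w in row.split()]
    if len(words) != 4:
        raise ValueError("expected four 32-bit hex words")
    _, w1, w2, w3 = words
    return {
        "vendor_err_synd": (w1 >> 8) & 0xFF,  # byte 54
        "syndrome": w1 & 0xFF,                # byte 55
        "wqe_opcode": (w2 >> 24) & 0xFF,      # top byte of s_wqe_opcode_qpn
        "qpn": w2 & 0xFFFFFF,                 # low 24 bits of s_wqe_opcode_qpn
        "wqe_counter": (w3 >> 16) & 0xFFFF,   # bytes 60..61
        "signature": (w3 >> 8) & 0xFF,        # byte 62
        "cqe_opcode": (w3 >> 4) & 0xF,        # high nibble of op_own
    }


if __name__ == "__main__":
    for name, value in decode_err_cqe_tail(LAST_ROW).items():
        print(f"{name}: {value:#x}")
```

Under these assumptions the syndrome field is 0x06, which agrees with the status (6) reported in the nvme log line, and the WQE opcode field is 0x25.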

Hi,

Which test are you using? fio?

Can you provide the command line?

Thanks

Marc

Hi Marc,

The patch below is not accessible. Which patch is this? Is it part of the MLNX OFED stack or the inbox driver?

https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=testing/queue-next&id=a40ac569f243db552661e6efad70080bb406823c

Hi,

I am not using the MLNX OFED stack. I am using the inbox driver; that is our main requirement. However, we also get the random write failure with MLNX OFED. The crash I sent you is from the inbox driver, not from MLNX OFED.

[root@xhd-ipsspdk1 ~]# lspci -xxxvvv | grep -i mellanox

09:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

Subsystem: Mellanox Technologies Device 0014

09:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

Subsystem: Mellanox Technologies Device 0014

[root@xhd-ipsspdk1 ~]# mst start

Starting MST (Mellanox Software Tools) driver set

Loading MST PCI module - Success

Loading MST PCI configuration module - Success

Create devices

Unloading MST PCI module (unused) - Success

[root@xhd-ipsspdk1 ~]#

[root@xhd-ipsspdk1 ~]# mst status -v

MST modules:


MST PCI module is not loaded

MST PCI configuration module loaded

PCI devices:


DEVICE_TYPE MST PCI RDMA NET NUMA

ConnectX4(rev:0) /dev/mst/mt4115_pciconf0.1 09:00.1 mlx5_1 net-enp9s0f1 0

ConnectX4(rev:0) /dev/mst/mt4115_pciconf0 09:00.0 mlx5_0 net-enp9s0f0 0

[root@xhd-ipsspdk1 ~]#

[root@xhd-ipsspdk1 ~]# ofed_info -s

MLNX_OFED_LINUX-3.4-2.0.0.0:

[root@xhd-ipsspdk1 ~]#

Hi,

Can you please tell me which kernel version you are using?

I need to check the status of this patch in the MOFED driver.

Can you send me the lspci output for the Mellanox adapter you are using, its PSID, and the driver version?

lspci -xxxvvv | grep -i mellanox

mst start

mst status -v

flint -d /dev/mst/ q (query to get PSID)

ofed_info -s

Thanks

Marc

Hi,

What is your kernel version?

Thanks

Marc

Dear Mr Rama Katta,

Please see the following thread for the original problem and the related patch:

https://www.spinics.net/lists/linux-rdma/msg49979.html

Regards

Marc

4.8.7