Random packet loss when using raw packet QPs and L2 flows

Hello there,

I want to stress test my CX3 cluster. To do so, I am using the example applications provided in "Raw Ethernet Programming: Basic Introduction - Code Example" (https://community.mellanox.com/s/article/raw-ethernet-programming--basic-introduction---code-example), except that instead of an ICMP packet the sender sends a raw Ethernet payload (EtherType == 8) with an incrementing counter. I also added an extra wait time at the end of each send.
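To make the frame layout concrete, here is a minimal sketch of how such a test frame could be built. It is not the exact code from the example application: the helper name build_frame, the 32-bit width of the counter, and its placement in network byte order right after the 14-byte Ethernet header are all assumptions made for illustration. The EtherType is passed in as a host-order value so the sketch does not commit to how "8" is encoded on the wire.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htons, htonl */

#define ETH_ALEN    6
#define ETH_HDR_LEN 14  /* dst MAC + src MAC + EtherType */

/*
 * Hypothetical helper: writes an Ethernet header followed by a 32-bit
 * sequence counter (big-endian) into buf. Returns the number of bytes
 * written, or 0 if buf is too small. No padding to the 60-byte Ethernet
 * minimum is added here; that is left to the caller.
 */
static size_t build_frame(uint8_t *buf, size_t buf_len,
                          const uint8_t dst[ETH_ALEN],
                          const uint8_t src[ETH_ALEN],
                          uint16_t ethertype, uint32_t seq)
{
    uint16_t et_be = htons(ethertype);
    uint32_t seq_be = htonl(seq);

    if (buf_len < ETH_HDR_LEN + sizeof(seq_be))
        return 0;

    memcpy(buf, dst, ETH_ALEN);
    memcpy(buf + ETH_ALEN, src, ETH_ALEN);
    memcpy(buf + 2 * ETH_ALEN, &et_be, sizeof(et_be));
    memcpy(buf + ETH_HDR_LEN, &seq_be, sizeof(seq_be));

    return ETH_HDR_LEN + sizeof(seq_be);
}
```

The sender would call this once per packet with an incremented seq before posting the send WR on the raw packet QP.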

With that setup I am noticing random packet loss (i.e. non-sequential counters) on the receiver side. The faster the rate (i.e. the smaller the extra wait time), the larger the gaps in sequence numbers. My QPs are large enough (thousands of WRs) and I do not run into any queue overrun. Besides, the problem also appears at very slow rates (~1 pkt/s).
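For clarity, the kind of receiver-side check meant by "non-sequential counters" can be sketched as follows. This is an illustrative sketch only: the struct seq_tracker and the function seq_track are hypothetical names, and it assumes a 32-bit big-endian counter located right after the 14-byte Ethernet header, with no reordering or duplication between the two hosts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohl */

#define ETH_HDR_LEN 14

/* Hypothetical bookkeeping for one flow. Zero-initialize before use. */
struct seq_tracker {
    uint32_t expected;  /* next sequence number we expect to see */
    uint64_t received;  /* frames accepted so far */
    uint64_t lost;      /* total missing sequence numbers */
    int      started;   /* 0 until the first frame has been seen */
};

/*
 * Feed one received frame to the tracker. Returns how many sequence
 * numbers were skipped immediately before this frame (0 if none).
 * Frames too short to hold the counter are ignored.
 */
static uint32_t seq_track(struct seq_tracker *t,
                          const uint8_t *frame, size_t len)
{
    uint32_t seq_be, seq, gap = 0;

    if (len < ETH_HDR_LEN + sizeof(seq_be))
        return 0;

    memcpy(&seq_be, frame + ETH_HDR_LEN, sizeof(seq_be));
    seq = ntohl(seq_be);

    /* Unsigned subtraction stays correct across counter wrap-around,
     * as long as frames are never reordered or duplicated. */
    if (t->started)
        gap = seq - t->expected;

    t->started = 1;
    t->lost += gap;
    t->received++;
    t->expected = seq + 1;
    return gap;
}
```

Calling seq_track on every completed receive WR and logging any non-zero return value shows both where each gap occurs and how large it is.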

I’ve tried updating all drivers and firmware, to no avail. Here is my configuration:

Machine A

Device Type: ConnectX3Pro

Part Number: MCX314A-BCC_Ax

Description: ConnectX-3 Pro EN network interface card; 40GigE; dual-port QSFP; PCIe3.0 x8 8GT/s; RoHS R6

PSID: MT_1090111023

PCI Device Name: 0000:81:00.0

Port1 MAC: 248a0772ca40

Port2 MAC: 248a0772ca41

Versions:    Current      Available

FW           2.42.5000    2.40.7000

PXE          3.4.0752     3.4.0746

Machine B

Device Type: ConnectX3

Part Number: MCX354A-FCB_A2-A5

Description: ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6

PSID: MT_1090120019

PCI Device Name: 0000:0b:00.0

Port1 MAC: 7cfe90bed011

Port2 MAC: 7cfe90bed012

Versions:    Current      Available

FW           2.42.5000    2.40.7000

PXE          3.4.0752     3.4.0746

Both machines use the Mellanox OFED drivers version 4.2- and run RHEL 7.3 with kernel 3.10.0-514.6.1.el7.x86_64.

Any help would be welcome.