XDP multi-buffer feature not working with RX Striding RQ on ConnectX-6

I want to run an XDP program that supports jumbo frames (9k MTU) on a ConnectX-6 card. The program is loaded into the kernel with frags (multi-buffer) support enabled (SEC("xdp.frags")), but when rx_striding_rq is enabled, every packet handed to the program has its data and data_end pointers pointing at the same memory address (i.e. a zero-length linear area), so the program cannot parse anything.

I'm using MLNX_OFED drivers, version 24.04-0.7.0. The release notes for this version explicitly state that this feature is supported:

  • Added XDP multi-buffer support to the default RQ type (Striding RQ)

Steps to reproduce:

  • Configure interface to use 9k MTU
  • Enable rx_striding_rq (ethtool --set-priv-flags enp3s0np0 rx_striding_rq on)
  • Load XDP program with multi-buffer support enabled (SEC("xdp.frags"))
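For reference, a minimal frags-enabled program of the kind I'm loading looks roughly like this (a simplified sketch, not my exact program; bpf_xdp_get_buff_len() requires kernel >= 5.18):

```c
/* Sketch of a minimal xdp.frags program; compile with:
   clang -O2 -g -target bpf -c xdp_frags.c -o xdp_frags.o */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int xdp_probe(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* With the bug described above, data == data_end, so even a
       14-byte Ethernet header is out of bounds here. */
    if (data + 14 > data_end)
        return XDP_PASS;

    /* Full frame length including fragments; with a working multi-buffer
       path this exceeds data_end - data for jumbo frames. */
    bpf_printk("linear=%ld total=%llu",
               (long)(data_end - data), bpf_xdp_get_buff_len(ctx));
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```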

If I disable rx_striding_rq, everything works fine. I'm using the Ubuntu-provided kernel 6.5.0-35; I also tried kernel 6.8.0-39 and the problem persists.

I also tried the in-tree mlx5 driver instead of the one provided by MLNX_OFED, but on both kernels tested the issue was the same: the program loads, but the pointers it receives are bogus.

I also tried varying the MTU: everything works fine up to MTU 3498, and the problem starts as soon as the MTU exceeds that value.

Given the performance benefits of using rx_striding_rq, I’d like to run my XDP program with jumbo frame support and rx_striding_rq enabled at the same time. Is it possible? Is this a known driver/kernel bug?

More information about my system:

# cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

# ethtool -i enp3s0np0 
driver: mlx5_core
version: 24.04-0.7.0
firmware-version: 20.32.2004 (DEL0000000013)
expansion-rom-version: 
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

# uname -a
Linux gr2-sao 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May  7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

# lspci -vvvv 
[...]
03:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
        Subsystem: Mellanox Technologies MT28908 Family [ConnectX-6]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 66
        NUMA node: 0
        Region 0: Memory at 90000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at 94200000 [disabled] [size=1M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (downgraded), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn+
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [48] Vital Product Data
                Product Name: Mellanox ConnectX-6 Single Port VPI HDR100 QSFP Adapter
                Read-only fields:
                        [PN] Part number: 07TKND
                        [EC] Engineering changes: A01
                        [MN] Manufacture ID: 1028
                        [SN] Serial number: TW07TKND7873506JT0DW
                        [VA] Vendor specific: DSV1028VPDR.VER2.1
                        [VB] Vendor specific: FFV20.32.20.04
                        [VC] Vendor specific: NPY1
                        [VD] Vendor specific: PMTD
                        [VE] Vendor specific: NMVMellanox Technologies, Inc.
                        [VG] Vendor specific: DCM1001FFFFFF
                        [VH] Vendor specific: L1D0
                        [VU] Vendor specific: TW07TKND7873506JT0DWMLNXS0D0F0 
                        [RV] Reserved: checksum good, 3 byte(s) reserved
                End
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
                AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [1c0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [320 v1] Lane Margining at the Receiver <?>
        Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [420 v1] Data Link Feature <?>
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

Hi Matheus,

Thank you for posting your query on NVIDIA Community.

Based on the information shared, the PSID of the HCA in use indicates an OEM (Dell) branded card. In such situations, the general procedure is to reach out to the OEM, and if needed the OEM will reach out to us. This is because OEM cards run OEM-specific firmware, and if the OEM has made customizations, the observed behavior can differ.

Thanks,
Namrata.