Cannot set CX6 DX to switchdev mode

I am building a development environment for DPDK on a CX6DX (ConnectX-6 Dx). After installing the OFED software, I found that I cannot set the CX6DX to switchdev mode.

output:

# echo switchdev > /sys/class/net/eth20/compat/devlink/mode 
bash: echo: write error: Invalid argument

dmesg output:

[ 3622.167092] mlx5_core 0000:45:00.0: E-Switch: Disable: mode(LEGACY), nvfs(2), necvfs(0), active vports(0)
[ 3622.167198] bus: 'auxiliary': remove device mlx5_core.rdma.2
[ 3622.167598] PM: Removing info for No Bus:issm2
[ 3622.167695] PM: Removing info for No Bus:umad2
[ 3622.167805] PM: Removing info for No Bus:uverbs2
[ 3622.171104] PM: Removing info for No Bus:mlx5_2
[ 3623.418766] PM: Removing info for auxiliary:mlx5_core.rdma.2
[ 3623.418803] device: 'mlx5_core.eth-rep.2': device_add
[ 3623.418820] bus: 'auxiliary': add device mlx5_core.eth-rep.2
[ 3623.418839] PM: Adding info for auxiliary:mlx5_core.eth-rep.2
[ 3623.418860] bus: 'auxiliary': __driver_probe_device: matched device mlx5_core.eth-rep.2 with driver mlx5_core.eth-rep
[ 3623.418867] bus: 'auxiliary': really_probe: probing driver mlx5_core.eth-rep with device mlx5_core.eth-rep.2
[ 3623.418876] mlx5_core.eth-rep mlx5_core.eth-rep.2: no default pinctrl state
[ 3623.418898] driver: 'mlx5_core.eth-rep': driver_bound: bound to device 'mlx5_core.eth-rep.2'
[ 3623.418915] bus: 'auxiliary': really_probe: bound device mlx5_core.eth-rep.2 to driver mlx5_core.eth-rep
[ 3623.418926] device: 'mlx5_core.rdma-rep.2': device_add
[ 3623.418933] bus: 'auxiliary': add device mlx5_core.rdma-rep.2
[ 3623.418982] PM: Adding info for auxiliary:mlx5_core.rdma-rep.2
[ 3623.418999] bus: 'auxiliary': __driver_probe_device: matched device mlx5_core.rdma-rep.2 with driver mlx5_ib.rep
[ 3623.419003] bus: 'auxiliary': really_probe: probing driver mlx5_ib.rep with device mlx5_core.rdma-rep.2
[ 3623.419008] mlx5_ib.rep mlx5_core.rdma-rep.2: no default pinctrl state
[ 3623.419022] driver: 'mlx5_ib.rep': driver_bound: bound to device 'mlx5_core.rdma-rep.2'
[ 3623.419035] bus: 'auxiliary': really_probe: bound device mlx5_core.rdma-rep.2 to driver mlx5_ib.rep
[ 3623.421954] mlx5_core 0000:45:00.0: mlx5_cmd_out_err:829:(pid 7722): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x201c1c), err(-22)
[ 3623.422218] mlx5_core 0000:45:00.0: mlx5_rdma_enable_roce_steering:71:(pid 7722): Failed to create RDMA RX flow group err(-22)
[ 3623.423116] mlx5_core 0000:45:00.0: mlx5_rdma_enable_roce:164:(pid 7722): Failed to enable RoCE steering: -22
[ 3623.424549] bus: 'auxiliary': remove device mlx5_core.rdma-rep.2
[ 3623.424601] PM: Removing info for auxiliary:mlx5_core.rdma-rep.2
[ 3623.424637] bus: 'auxiliary': remove device mlx5_core.eth-rep.2
[ 3623.424662] PM: Removing info for auxiliary:mlx5_core.eth-rep.2
[ 3623.424685] device: 'mlx5_core.rdma.2': device_add
[ 3623.424699] bus: 'auxiliary': add device mlx5_core.rdma.2
[ 3623.424714] PM: Adding info for auxiliary:mlx5_core.rdma.2
[ 3623.424737] bus: 'auxiliary': __driver_probe_device: matched device mlx5_core.rdma.2 with driver mlx5_ib.rdma
[ 3623.424743] bus: 'auxiliary': really_probe: probing driver mlx5_ib.rdma with device mlx5_core.rdma.2
[ 3623.424751] mlx5_ib.rdma mlx5_core.rdma.2: no default pinctrl state
[ 3623.428885] device: 'mlx5_2': device_add
[ 3623.428938] PM: Adding info for No Bus:mlx5_2
[ 3623.437372] device: 'uverbs2': device_add
[ 3623.437402] PM: Adding info for No Bus:uverbs2
[ 3623.437733] device: 'umad2': device_add
[ 3623.437759] PM: Adding info for No Bus:umad2
[ 3623.437806] device: 'issm2': device_add
[ 3623.437829] PM: Adding info for No Bus:issm2
[ 3623.439516] driver: 'mlx5_ib.rdma': driver_bound: bound to device 'mlx5_core.rdma.2'
[ 3623.439543] bus: 'auxiliary': really_probe: bound device mlx5_core.rdma.2 to driver mlx5_ib.rdma
[ 3623.439562] mlx5_core 0000:45:00.0: esw_compat_write:353:(pid 7722): mlx5_core: Failed setting eswitch to offloads
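Most of this log is driver-core and PM debug noise. The lines that matter are the CREATE_FLOW_GROUP firmware command error, the two RoCE steering failures, and the final "Failed setting eswitch to offloads". A minimal sketch of a filter that keeps only those lines, assuming the log text arrives on stdin (for example from dmesg); the function name mlx5_failures is made up for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only mlx5 firmware command errors and "Failed ..." lines,
# dropping the auxiliary-bus and PM debug messages around them.
mlx5_failures() {
    # grep exits 1 when nothing matches; treat "no failures" as success.
    grep -E 'mlx5_cmd_out_err|Failed' || true
}

# Usage (reading the kernel log may need root):
#   dmesg | mlx5_failures
```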

ENV:

$ uname -r
6.1.67
# OFED VERSION
MLNX_OFED_SRC-23.10-1.1.9.0
# compiler version
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-11/root/usr/libexec/gcc/x86_64-redhat-linux/11/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-11/root/usr --mandir=/opt/rh/devtoolset-11/root/usr/share/man --infodir=/opt/rh/devtoolset-11/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --with-default-libstdcxx-abi=gcc4-compatible --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-11.2.1-20220127/obj-x86_64-redhat-linux/isl-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.2.1 20220127 (Red Hat 11.2.1-9) (GCC) 

ethtool info

# ethtool -i eth20
driver: mlx5_core
version: 23.10-1.1.9
firmware-version: 22.31.2912 (ALI0000000017)
expansion-rom-version: 
bus-info: 0000:45:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
# ethtool -i eth21
driver: mlx5_core
version: 23.10-1.1.9
firmware-version: 22.31.2912 (ALI0000000017)
expansion-rom-version: 
bus-info: 0000:45:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Setup done before trying to set switchdev mode:

# 1. set steering_mode to dmfs
# cat /sys/class/net/eth20/compat/devlink/steering_mode 
dmfs
# cat /sys/class/net/eth21/compat/devlink/steering_mode 
dmfs
# 2. create VFs
# cat /sys/class/net/eth20/device/sriov_numvfs 
2
# 3. unbind VFs
# ibdev2netdev 
...
mlx5_2 port 1 ==> eth20 (Up)
mlx5_3 port 1 ==> eth21 (Up)
# The VFs do not show up in ibdev2netdev, but they exist:
# lspci | grep "Virtual Function"
45:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
45:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
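For reference, the steps above usually follow this order: create the VFs, unbind them from mlx5_core, then change the eswitch mode through devlink. A minimal sketch of that sequence, assuming the PF, VF addresses and VF count from this post (eth20, 0000:45:00.0, 0000:45:00.2 and 0000:45:00.3). The function names and the SYSFS override, which only exists so the sysfs writes can be tried against a scratch directory, are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch of the usual pre-switchdev sequence on a ConnectX PF.
# Function names and the SYSFS override are illustrative, not part of any NVIDIA tool.
set -euo pipefail

SYSFS=${SYSFS:-/sys}   # point at a scratch tree to dry-run the sysfs writes

# 1. Create VFs through the PF's sriov_numvfs attribute.
#    If VFs already exist, the kernel rejects a direct change; write 0 first.
create_vfs() {       # $1 = PF netdev (e.g. eth20), $2 = number of VFs
    echo "$2" > "$SYSFS/class/net/$1/device/sriov_numvfs"
}

# 2. Unbind a VF from mlx5_core; the usual procedure does this before the mode change.
unbind_vf() {        # $1 = VF PCI address (e.g. 0000:45:00.2)
    echo "$1" > "$SYSFS/bus/pci/drivers/mlx5_core/unbind"
}

# 3. Move the PF eswitch to switchdev (talks to the real device; SYSFS does not apply).
set_switchdev() {    # $1 = PF PCI address (e.g. 0000:45:00.0)
    devlink dev eswitch set "pci/$1" mode switchdev
}

# Example run on the host from this post (as root):
#   create_vfs eth20 2
#   unbind_vf 0000:45:00.2
#   unbind_vf 0000:45:00.3
#   set_switchdev 0000:45:00.0
```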

Please make sure SRIOV_EN is set to true in the NIC firmware configuration.

Find your device:

mst status -vv

Check the config, and enable SR-IOV if it is off:

mlxconfig -d mt4125_pciconf0 query | grep SRIOV_EN
mlxconfig -d mt4125_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=16
reboot
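To check the setting without reading through the whole query, a small sketch that tests only the SRIOV_EN line. It assumes mlxconfig prints each parameter as "NAME   VALUE" with an enabled boolean shown as True(1), which is how recent versions format a plain query; please verify on your version. The helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if mlxconfig query output on stdin
# shows SRIOV_EN as True(1). Assumes "NAME  VALUE" columns (plain query, no -e).
sriov_enabled() {
    awk '$1 == "SRIOV_EN" { found = 1; ok = ($2 == "True(1)") }
         END { exit !(found && ok) }'
}

# Usage (device name as in the reply above):
#   mlxconfig -d mt4125_pciconf0 query | sriov_enabled && echo "SR-IOV enabled"
```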

Hi, I am running into this problem too:

mlx5_core 0000:98:00.0: mlx5_rdma_enable_roce_steering:71:(pid 3537): Failed to create RDMA RX flow group err(-22)
mlx5_rdma_enable_roce:164:(pid 3537): Failed to enable RoCE steering: -22
mlx5_core 0000:98:00.0: esw_compat_write:353:(pid 3537): mlx5_core: Failed setting eswitch to offloads

Server: R750
Ubuntu: 22.04.4
Kernel: 5.15.0-97-generic
OFED: 23.10-2.1.3.1
NIC: MCX623106AC-CDAT (Mellanox ConnectX-6 Dx Crypto enabled network adapter)

sudo ethtool -i enp152s0f0np0
driver: mlx5_core
version: 23.10-2.1.3
firmware-version: 22.32.2004 (DEL0000000027)
expansion-rom-version:
bus-info: 0000:98:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

I have already set ‘SR-IOV Global Enable’ to ‘Enabled’ in the BIOS, and set SRIOV_EN to true on the NIC.

I can’t figure out why this happens.

Or check lspci -vv and see whether “Total VFs” is listed, as in:

08:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Subsystem: Mellanox Technologies Device 0083

Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
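A quick way to pull just these counters out of the long lspci -vv dump is a small sed filter. A sketch with a hypothetical helper name; note that lspci generally shows extended capabilities such as SR-IOV only when run as root:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the numeric value of an SR-IOV field
# ("Initial VFs", "Total VFs" or "Number of VFs") from lspci -vv output on stdin.
# Prints nothing if the SR-IOV capability is missing.
sriov_field() {    # $1 = field name
    sed -n "s/.*$1: \([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Usage (as root, so lspci can read extended capabilities):
#   lspci -s 98:00.0 -vv | sriov_field "Total VFs"
```

The same total is also exposed by the kernel as /sys/bus/pci/devices/<PF address>/sriov_totalvfs.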

Below is my lspci -vv:
98:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Subsystem: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 18
NUMA node: 1
IOMMU group: 104
Region 0: Memory at d4000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at d1000000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn+
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [48] Vital Product Data
Product Name: ConnectX-6 DX 2-port 100GbE QSFP56 PCIe Adapter
Read-only fields:
[PN] Part number: 0F6FXM
[EC] Engineering changes: A03
[MN] Manufacture ID: 1028
[SN] Serial number: TW0F6FXM7873528RT2Z2
[VA] Vendor specific: DSV1028VPDR.VER2.2
[VB] Vendor specific: FFV22.32.20.04
[VC] Vendor specific: NPY2
[VD] Vendor specific: PMTD
[VE] Vendor specific: NMVMellanox Technologies, Inc.
[VF] Vendor specific: DTINIC
[VG] Vendor specific: DCM1001FFFFFF2101FFFFFF
[VH] Vendor specific: L1D0
[VU] Vendor specific: TW0F6FXM7873528RT2Z2MLNXS0D0F0
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 2, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: 101e
Supported Page Size: 000007ff, System Page Size: 00000001
Region 0: Memory at 00000000d6800000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1c0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [230 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [320 v1] Lane Margining at the Receiver <?>
Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [420 v1] Data Link Feature <?>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core

SRIOV related:
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 2, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: 101e
Supported Page Size: 000007ff, System Page Size: 00000001
Region 0: Memory at 00000000d6800000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0

Your lspci output looks correct to me.
By the way, have you tried enabling switchdev mode using devlink?
For example:
devlink dev show
devlink dev eswitch set pci/0000:08:00.0 mode switchdev
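It can also help to read back the current mode with devlink dev eswitch show before and after the attempt. A sketch that extracts the mode word, assuming the usual one-line output of the form "pci/0000:08:00.0: mode legacy inline-mode none encap-mode basic" (the trailing fields vary with iproute2 and driver versions); the helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the eswitch mode ("legacy" or "switchdev")
# from `devlink dev eswitch show` output on stdin.
eswitch_mode() {
    # Match the standalone "mode" token only, not "inline-mode" or "encap-mode".
    awk '{ for (i = 1; i < NF; i++) if ($i == "mode") { print $(i + 1); exit } }'
}

# Usage:
#   devlink dev eswitch show pci/0000:08:00.0 | eswitch_mode
```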

I have tried.
The error is below:
Error: mlx5_core: Failed setting eswitch to offloads.
kernel answers: Invalid argument