Cannot set CX6 DX to switchdev mode

I was building a development environment for DPDK on a CX6 DX. After installing the OFED software, I found that I cannot set the CX6 DX to switchdev mode.

output:

# echo switchdev > /sys/class/net/eth20/compat/devlink/mode 
bash: echo: write error: Invalid argument

dmesg output:

[ 3622.167092] mlx5_core 0000:45:00.0: E-Switch: Disable: mode(LEGACY), nvfs(2), necvfs(0), active vports(0)
[ 3622.167198] bus: 'auxiliary': remove device mlx5_core.rdma.2
[ 3622.167598] PM: Removing info for No Bus:issm2
[ 3622.167695] PM: Removing info for No Bus:umad2
[ 3622.167805] PM: Removing info for No Bus:uverbs2
[ 3622.171104] PM: Removing info for No Bus:mlx5_2
[ 3623.418766] PM: Removing info for auxiliary:mlx5_core.rdma.2
[ 3623.418803] device: 'mlx5_core.eth-rep.2': device_add
[ 3623.418820] bus: 'auxiliary': add device mlx5_core.eth-rep.2
[ 3623.418839] PM: Adding info for auxiliary:mlx5_core.eth-rep.2
[ 3623.418860] bus: 'auxiliary': __driver_probe_device: matched device mlx5_core.eth-rep.2 with driver mlx5_core.eth-rep
[ 3623.418867] bus: 'auxiliary': really_probe: probing driver mlx5_core.eth-rep with device mlx5_core.eth-rep.2
[ 3623.418876] mlx5_core.eth-rep mlx5_core.eth-rep.2: no default pinctrl state
[ 3623.418898] driver: 'mlx5_core.eth-rep': driver_bound: bound to device 'mlx5_core.eth-rep.2'
[ 3623.418915] bus: 'auxiliary': really_probe: bound device mlx5_core.eth-rep.2 to driver mlx5_core.eth-rep
[ 3623.418926] device: 'mlx5_core.rdma-rep.2': device_add
[ 3623.418933] bus: 'auxiliary': add device mlx5_core.rdma-rep.2
[ 3623.418982] PM: Adding info for auxiliary:mlx5_core.rdma-rep.2
[ 3623.418999] bus: 'auxiliary': __driver_probe_device: matched device mlx5_core.rdma-rep.2 with driver mlx5_ib.rep
[ 3623.419003] bus: 'auxiliary': really_probe: probing driver mlx5_ib.rep with device mlx5_core.rdma-rep.2
[ 3623.419008] mlx5_ib.rep mlx5_core.rdma-rep.2: no default pinctrl state
[ 3623.419022] driver: 'mlx5_ib.rep': driver_bound: bound to device 'mlx5_core.rdma-rep.2'
[ 3623.419035] bus: 'auxiliary': really_probe: bound device mlx5_core.rdma-rep.2 to driver mlx5_ib.rep
[ 3623.421954] mlx5_core 0000:45:00.0: mlx5_cmd_out_err:829:(pid 7722): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x201c1c), err(-22)
[ 3623.422218] mlx5_core 0000:45:00.0: mlx5_rdma_enable_roce_steering:71:(pid 7722): Failed to create RDMA RX flow group err(-22)
[ 3623.423116] mlx5_core 0000:45:00.0: mlx5_rdma_enable_roce:164:(pid 7722): Failed to enable RoCE steering: -22
[ 3623.424549] bus: 'auxiliary': remove device mlx5_core.rdma-rep.2
[ 3623.424601] PM: Removing info for auxiliary:mlx5_core.rdma-rep.2
[ 3623.424637] bus: 'auxiliary': remove device mlx5_core.eth-rep.2
[ 3623.424662] PM: Removing info for auxiliary:mlx5_core.eth-rep.2
[ 3623.424685] device: 'mlx5_core.rdma.2': device_add
[ 3623.424699] bus: 'auxiliary': add device mlx5_core.rdma.2
[ 3623.424714] PM: Adding info for auxiliary:mlx5_core.rdma.2
[ 3623.424737] bus: 'auxiliary': __driver_probe_device: matched device mlx5_core.rdma.2 with driver mlx5_ib.rdma
[ 3623.424743] bus: 'auxiliary': really_probe: probing driver mlx5_ib.rdma with device mlx5_core.rdma.2
[ 3623.424751] mlx5_ib.rdma mlx5_core.rdma.2: no default pinctrl state
[ 3623.428885] device: 'mlx5_2': device_add
[ 3623.428938] PM: Adding info for No Bus:mlx5_2
[ 3623.437372] device: 'uverbs2': device_add
[ 3623.437402] PM: Adding info for No Bus:uverbs2
[ 3623.437733] device: 'umad2': device_add
[ 3623.437759] PM: Adding info for No Bus:umad2
[ 3623.437806] device: 'issm2': device_add
[ 3623.437829] PM: Adding info for No Bus:issm2
[ 3623.439516] driver: 'mlx5_ib.rdma': driver_bound: bound to device 'mlx5_core.rdma.2'
[ 3623.439543] bus: 'auxiliary': really_probe: bound device mlx5_core.rdma.2 to driver mlx5_ib.rdma
[ 3623.439562] mlx5_core 0000:45:00.0: esw_compat_write:353:(pid 7722): mlx5_core: Failed setting eswitch to offloads
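
Most of the log above is driver-core chatter (device_add, PM: Adding/Removing info); the lines that matter are the mlx5_core failures. A minimal sketch for pulling only those out of a kernel log. The function name mlx5_failures is hypothetical, and the pattern is only tuned to the messages quoted in this thread:

```shell
# Hypothetical helper: keep only mlx5_core lines that report a failure.
# Reads kernel log text on stdin, so it works with dmesg or a saved log.
mlx5_failures() {
    grep -E 'mlx5_core [0-9a-f:.]+: .*[Ff]ailed'
}

# Usage (dmesg usually needs root):
#   dmesg | mlx5_failures
```

On the log above this should leave the CREATE_FLOW_GROUP error, the two RoCE steering errors, and the final "Failed setting eswitch to offloads" line.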

ENV:

$ uname -r
6.1.67
# OFED VERSION
MLNX_OFED_SRC-23.10-1.1.9.0
# compiler version
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-11/root/usr/libexec/gcc/x86_64-redhat-linux/11/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-11/root/usr --mandir=/opt/rh/devtoolset-11/root/usr/share/man --infodir=/opt/rh/devtoolset-11/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --with-default-libstdcxx-abi=gcc4-compatible --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-11.2.1-20220127/obj-x86_64-redhat-linux/isl-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.2.1 20220127 (Red Hat 11.2.1-9) (GCC) 

ethtool info

# ethtool -i eth20
driver: mlx5_core
version: 23.10-1.1.9
firmware-version: 22.31.2912 (ALI0000000017)
expansion-rom-version: 
bus-info: 0000:45:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
# ethtool -i eth21
driver: mlx5_core
version: 23.10-1.1.9
firmware-version: 22.31.2912 (ALI0000000017)
expansion-rom-version: 
bus-info: 0000:45:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Setup before trying to set switchdev:

# 1. set steering_mode to dmfs
# cat /sys/class/net/eth20/compat/devlink/steering_mode 
dmfs
# cat /sys/class/net/eth21/compat/devlink/steering_mode 
dmfs
# 2. set VF
# cat /sys/class/net/eth20/device/sriov_numvfs 
2
# 3. unbind VFs
# ibdev2netdev 
...
mlx5_2 port 1 ==> eth20 (Up)
mlx5_3 port 1 ==> eth21 (Up)
# VFs do not show up, but exist.
# lspci | grep "Virtual Function"
45:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
45:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
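
The three preparation steps above (create VFs, unbind them, then switch mode) can be sketched as a dry-run helper that only prints the commands instead of running them. This is an illustration, not a verified procedure: switchdev_plan is a hypothetical name, the unbind path assumes the VFs are bound to mlx5_core, and the last step uses devlink rather than the compat sysfs file used in this post.

```shell
# Hypothetical dry-run helper: print the command sequence, do not execute it.
# $1 = PF PCI address, $2 = number of VFs, remaining args = VF PCI addresses.
switchdev_plan() (
    pf=$1
    nvfs=$2
    shift 2
    # Step 2 in the post: create the VFs on the PF.
    echo "echo $nvfs > /sys/bus/pci/devices/$pf/sriov_numvfs"
    # Step 3 in the post: unbind each VF from the mlx5_core driver.
    for vf in "$@"; do
        echo "echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind"
    done
    # Finally, request switchdev mode on the PF.
    echo "devlink dev eswitch set pci/$pf mode switchdev"
)

# Usage:
#   switchdev_plan 0000:45:00.0 2 0000:45:00.2 0000:45:00.3
```

Printing the commands first makes it easy to compare the PCI addresses against the lspci output before touching the device.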

Please make sure SRIOV_EN is true.

Find your device:

mst status -vv

Check the config, enable SR-IOV if needed, then reboot:

mlxconfig -d mt4125_pciconf0 query | grep SRIOV_EN
mlxconfig -d mt4125_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=16
reboot
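
For scripting the SRIOV_EN check, a minimal sketch follows. It assumes the mlxconfig query output lists one parameter per line in the form "SRIOV_EN   True(1)"; that layout is an assumption, and sriov_enabled is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the mlxconfig query text on stdin
# reports SRIOV_EN as True. Assumes lines shaped like "SRIOV_EN   True(1)".
sriov_enabled() {
    awk '$1 == "SRIOV_EN" { found = ($2 ~ /^True/) } END { exit !found }'
}

# Usage (device name taken from the post above):
#   mlxconfig -d mt4125_pciconf0 query | sriov_enabled && echo "SRIOV_EN is set"
```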

Hi, I have run into this problem too:

mlx5_core 0000:98:00.0: mlx5_rdma_enable_roce_steering:71:(pid 3537): Failed to create RDMA RX flow group err(-22)
mlx5_rdma_enable_roce:164:(pid 3537): Failed to enable RoCE steering: -22
mlx5_core 0000:98:00.0: esw_compat_write:353:(pid 3537): mlx5_core: Failed setting eswitch to offloads

Server: R750
Ubuntu: 22.04.4
Kernel: 5.15.0-97-generic
OFED: 23.10-2.1.3.1
NIC: MCX623106AC-CDAT (Mellanox ConnectX-6 Dx Crypto enabled network adapter)

sudo ethtool -i enp152s0f0np0
driver: mlx5_core
version: 23.10-2.1.3
firmware-version: 22.32.2004 (DEL0000000027)
expansion-rom-version:
bus-info: 0000:98:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

I have already set ‘SR-IOV Global Enable’ to ‘Enabled’ in the BIOS and set SRIOV_EN to true on the NIC.

I can’t figure out why this happens.

Or check lspci -vv and see whether it reports “Total VFs”, as in the example below:

08:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Subsystem: Mellanox Technologies Device 0083

Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00

Below is my lspci -vv:
98:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Subsystem: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 18
NUMA node: 1
IOMMU group: 104
Region 0: Memory at d4000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at d1000000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn+
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [48] Vital Product Data
Product Name: ConnectX-6 DX 2-port 100GbE QSFP56 PCIe Adapter
Read-only fields:
[PN] Part number: 0F6FXM
[EC] Engineering changes: A03
[MN] Manufacture ID: 1028
[SN] Serial number: TW0F6FXM7873528RT2Z2
[VA] Vendor specific: DSV1028VPDR.VER2.2
[VB] Vendor specific: FFV22.32.20.04
[VC] Vendor specific: NPY2
[VD] Vendor specific: PMTD
[VE] Vendor specific: NMVMellanox Technologies, Inc.
[VF] Vendor specific: DTINIC
[VG] Vendor specific: DCM1001FFFFFF2101FFFFFF
[VH] Vendor specific: L1D0
[VU] Vendor specific: TW0F6FXM7873528RT2Z2MLNXS0D0F0
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 2, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: 101e
Supported Page Size: 000007ff, System Page Size: 00000001
Region 0: Memory at 00000000d6800000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1c0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [230 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [320 v1] Lane Margining at the Receiver <?>
Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [420 v1] Data Link Feature <?>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core

SRIOV related:
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 2, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: 101e
Supported Page Size: 000007ff, System Page Size: 00000001
Region 0: Memory at 00000000d6800000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
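
The two numbers worth checking in this SR-IOV capability block are "Total VFs" (what the firmware exposes) and "Number of VFs" (what is currently enabled). A small sketch for extracting them from lspci -vv text; sriov_vf_counts is a hypothetical helper, and the pattern assumes the line layout shown above:

```shell
# Hypothetical helper: print "total=<n> active=<n>" from lspci -vv text on stdin.
# Relies on the "Total VFs: N, Number of VFs: M," layout seen in this thread.
sriov_vf_counts() {
    sed -n 's/.*Total VFs: \([0-9]*\), Number of VFs: \([0-9]*\),.*/total=\1 active=\2/p'
}

# Usage (run as root so lspci can read the capability list):
#   lspci -vv -s 98:00.0 | sriov_vf_counts
```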

Your lspci output looks correct to me.
By the way, have you tried enabling switchdev mode using devlink?
For example:
devlink dev show
devlink dev eswitch set pci/0000:08:00.0 mode switchdev
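
devlink addresses the device as pci/<PCI address> rather than by interface name. A minimal sketch for deriving that handle from a netdev through sysfs; devlink_handle is a hypothetical helper, and its optional second argument (an alternative sysfs root) exists only to make it testable:

```shell
# Hypothetical helper: map a netdev name to its devlink handle (pci/<BDF>).
# $1 = interface name, $2 = optional sysfs root (defaults to /sys).
devlink_handle() (
    root=${2:-/sys}
    link="$root/class/net/$1/device"
    # Virtual interfaces have no backing device entry.
    [ -e "$link" ] || exit 1
    target=$(readlink -f "$link") || exit 1
    printf 'pci/%s\n' "$(basename "$target")"
)

# Usage:
#   devlink dev eswitch show "$(devlink_handle eth20)"
#   devlink dev eswitch set  "$(devlink_handle eth20)" mode switchdev
```

Running the show form first confirms the current mode before attempting the switch.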

I have tried.
The error is below:
Error: mlx5_core: Failed setting eswitch to offloads.
Kernel answers: Invalid argument

Did you solve your problem?
I have almost the same problem. When trying to enable the eswitch (cmd: devlink dev eswitch set pci/0000:02:00.0 mode switchdev), dmesg shows this error:

mlx5_core 0000:02:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:02:00.0: mlx5_cmd_out_err:829:(pid 1021051): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x201c1c), err(-22)
mlx5_core 0000:02:00.0: mlx5_rdma_enable_roce_steering:71:(pid 1021051): Failed to create RDMA RX flow group err(-22)
mlx5_core 0000:02:00.0: mlx5_rdma_enable_roce:164:(pid 1021051): Failed to enable RoCE steering: -22
mlx5_core 0000:02:00.0: esw_compat_write:353:(pid 1021051): mlx5_core: Failed setting eswitch to offloads

I have not changed any settings on the NIC before. SRIOV_EN is enabled and NUM_OF_VFS is 8.

— My server —

Ubuntu: 22.04
Kernel: 6.2.16-060216-generic (upgraded kernel)
NIC: MCX623106AN-CDAT

# ethtool -i enp2s0f0np0
driver: mlx5_core
version: 24.01-0.3.3
firmware-version: 22.32.2004 (MT_0000000359)
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

No. When I found it hard to enable switchdev on this CX6 DX, I turned to other machines. This machine has been restored to its original state, with a 5.19.0 Linux kernel built from source. That 5.19.0 kernel has mlx5_core built in, so I cannot install the mlx5 kernel driver from the OFED software. The built-in driver does not even provide the sysfs interface, let alone enable switchdev mode.

The 6.1.67 Linux kernel was built from source. Maybe its build config is incompatible in some way with the mlx5 driver in OFED.

I updated to the newest firmware from Dell (Network_Firmware_J6D71_LN_22.38.10.02_01.BIN) and that solved the problem.
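
Since a firmware update resolved it, a quick comparison of the running firmware against the version that worked may save time. A minimal sketch, with two assumptions: fw_older_than is a hypothetical helper, and 22.38.1002 is a reading of the "22.38.10.02" in the Dell package name, not a confirmed version string. The thread only shows that 22.31.2912 and 22.32.2004 failed and that the Dell 22.38 package worked; it does not establish an exact cutoff.

```shell
# Hypothetical helper: succeed if firmware version $1 is older than $2.
# Uses sort -V (version sort) to order the two dotted version strings.
fw_older_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage:
#   cur=$(ethtool -i eth20 | awk '/^firmware-version:/ { print $2 }')
#   fw_older_than "$cur" 22.38.1002 && echo "firmware $cur predates 22.38.1002"
```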