Hello,
I am using MCX515A-CCAT NIC in an AMD-based system with X570-A PRO motherboard.
I want to use 64 VFs and enable them by writing that number to a system file:
echo 64 > /sys/class/net/eth1/device/mlx5_num_vfs
The card is plugged into 0a:00.0 PCIe slot.
In dmsg, I found that only 7 VFs were successfully enabled, and they were assigned slots from 0a:00.1 to 0a:00.7. When mlx5_core module tries to enable VF number 8, it fails with the next error:
mlx5_core 0000:0a:01.0: firmware version: 16.26.1040
mlx5_core 0000:0a:01.0: wait_func:1033:(pid 2261): ENABLE_HCA(0x104) timeout. Will > cause a leak of a command resource
mlx5_core 0000:0a:01.0: mlx5_function_setup:1266:(pid 2261): enable hca failed
mlx5_core 0000:0a:01.0: init_one:2116:(pid 2261): mlx5_load_one failed with error code -110
mlx5_core: probe of 0000:0a:01.0 failed with error -110
This happens both when I ran with pci=assign-busses kernel boot option and without it.
System information:
CPU
AMD Ryzen 9 3900X
Kernel
5.0.0-37-generic
OS
Ubuntu 18.04.3 LTS
Kernel Module
mlx5_core, version 4.7-1.0.0
BIOS
Vendor: American Megatrends Inc.
Version: H.60
BIOS Revision: 5.14
lspci output:
0a:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 61
Region 0: Memory at 1020000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at fce00000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [48] Vital Product Data
Product Name: CX515A - ConnectX-5 QSFP28
Read-only fields:
[PN] Part number: MCX515A-CCAT
[EC] Engineering changes: AA
[V2] Vendor specific: MCX515A-CCAT
[SN] Serial number: MT1936J02431
[V3] Vendor specific: f207701f11cfe9118000b8599fcc3578
[VA] Vendor specific: MLX:MODL=CX515A:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0
[V0] Vendor specific: PCIeGen3 x16
[RV] Reserved: checksum good, 2 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 04, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy-
IOVSta: Migration-
Initial VFs: 127, Total VFs: 127, Number of VFs: 1, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: 1018
Supported Page Size: 000007ff, System Page Size: 00000001
Region 0: Memory at 0000001022000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1c0 v1] #19
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
Would it possible to use more than 7 VFs on my system?