SR-IOV VF Creation Inside VM After Passing ConnectX-7 PF via VFIO-PCI

Environment

NIC: NVIDIA ConnectX-7 (MT2910 Family)
Mode: Baremetal host running KVM
Use case: NUMA isolation – KVM VM will act as Kubernetes worker node
K8s components: SR-IOV Network Device Plugin / NVIDIA Network Operator

Current Working Scenario (Baremetal)

On baremetal Kubernetes worker nodes:

  • SR-IOV Network Device Plugin works correctly.
  • VFs created on ConnectX-7 PFs are visible.
  • PF–VF association is intact.
  • Dynamic VF allocation works as expected.

Baremetal PFs:

pci@0000:a1:00.0  enp161s0f0np0  network  MT2910 Family [ConnectX-7]
pci@0000:a1:00.1  enp161s0f1np1  network  MT2910 Family [ConnectX-7]

New Requirement

For NUMA isolation:

  • KVM VM should act as Kubernetes worker node.
  • PF (enp161s0f0np0) is passed to VM using PCI passthrough (vfio-pci).
  • Inside VM, SR-IOV plugin should dynamically create and manage VFs.

So effectively:


BM PF → passed via VFIO-PCI → appears as PF inside VM → create VFs inside VM → use SR-IOV plugin inside VM.
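The host-side half of this flow can be sketched as follows (a sketch, not the exact commands used; the PCI address is the PF's baremetal address from above, and manual sysfs binding is only one of several equivalent methods):

```shell
# Host side (sketch): detach the PF from mlx5_core and hand it to vfio-pci.
modprobe vfio-pci
echo 0000:a1:00.0 > /sys/bus/pci/drivers/mlx5_core/unbind
echo vfio-pci     > /sys/bus/pci/devices/0000:a1:00.0/driver_override
echo 0000:a1:00.0 > /sys/bus/pci/drivers_probe
# The device is then attached to the VM (e.g. a <hostdev> entry in libvirt).
```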

VM log: PF enp161s0f0np0 now appears as enp3s0np0 inside the VM

mlxconfig -d /dev/mst/mt4129_pciconf0.1 q | grep -i -E 'NUM_OF_VFS'
NUM_OF_VFS 12

mlxconfig -d /dev/mst/mt4129_pciconf0.1 q | grep -i -E 'SRIOV'
SRIOV_EN True(1)

cat /sys/class/net/enp3s0np0/device/mlx_num_vfs
0

However, /sys/class/net/enp3s0np0/device/sriov_numvfs is not present.

echo 12 > /sys/class/net/enp3s0np0/device/sriov_numvfs
Error: -bash: /sys/class/net/ens11f0np1/device/sriov_numvfs: Permission denied.

I understand that once the PF is moved from BM to the VM, the PF's association with BM is lost. Is this configuration supported now?

Questions

  • Is SR-IOV VF creation supported inside a VM when the PF is passed via VFIO-PCI?
  • Is additional firmware configuration required for ConnectX-7 to allow nested SR-IOV?
  • Is this a limitation of mlx5 driver behavior under PCI passthrough, or of KVM IOMMU configuration?

What is the supported method to run the SR-IOV Network Operator inside a VM using a passed-through PF?

Thank you for your support!

Reference:

Hello~

I hope this is helpful.

  1. Is VF creation inside the VM supported when the PF is passed via VFIO-PCI?

No. VF creation is only supported on the host, with the PF bound to the native driver (e.g. mlx5_core). When the PF is bound to vfio-pci for passthrough, the kernel does not expose SR-IOV configuration via sysfs (e.g. “Driver does not support SRIOV configuration via sysfs”), so creating VFs in the VM is not supported.
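A quick way to confirm this from inside the VM (a sketch; the interface name is taken from the log above, and sriov_totalvfs / sriov_numvfs are the standard kernel sysfs attributes that advertise SR-IOV capability):

```shell
# Inside the VM: the guest mlx5_core driver binds the PF, but QEMU/vfio does
# not expose the device's SR-IOV capability, so no sriov_* attributes appear.
ls /sys/class/net/enp3s0np0/device/ | grep sriov \
    || echo "no SR-IOV sysfs attributes exposed to the guest"
```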

  2. Firmware / nested SR-IOV?

No ConnectX-7 firmware setting enables this: nested SR-IOV (creating VFs inside a VM from a passed-through PF) is not a supported configuration.

  3. mlx5 vs KVM/IOMMU?

This is a driver/passthrough model limitation (vfio-pci does not support SR-IOV sysfs), not an mlx5 bug or a KVM/IOMMU misconfiguration.

  4. Supported way to run the SR-IOV Network Operator with a VM?
  • Create VFs on the bare-metal host (PF on host with mlx5/MLNX_OFED, use sriov_numvfs on the host).

  • Pass VFs (not the PF) into the VM as PCI passthrough devices.

  • For “dynamic” behavior, run the SR-IOV/Network Operator on the host and assign the created VFs to VMs.

  • For NUMA isolation: pin the VM to a NUMA node and assign VFs from the PF on that same node.
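Put together, the supported flow might look like this on the host (a sketch; the VF count, the VF PCI address 0000:a1:00.2, and the NUMA steps are illustrative assumptions, not taken from the thread):

```shell
# 1. Create VFs on the host, with the PF still bound to mlx5_core.
echo 4 > /sys/class/net/enp161s0f0np0/device/sriov_numvfs

# 2. Detach one resulting VF (address assumed) and bind it to vfio-pci
#    so it can be passed into the VM.
echo 0000:a1:00.2 > /sys/bus/pci/devices/0000:a1:00.2/driver/unbind
echo vfio-pci     > /sys/bus/pci/devices/0000:a1:00.2/driver_override
echo 0000:a1:00.2 > /sys/bus/pci/drivers_probe

# 3. NUMA isolation: find the PF's NUMA node and pin the VM to it
#    (e.g. with virsh numatune / vcpupin).
cat /sys/bus/pci/devices/0000:a1:00.0/numa_node
```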

/HyungKwang

Hi, thank you for the detailed explanation. Much appreciated! I will keep VF creation and the SR-IOV plugin implementation at the BM level.