A2 vGPU setup fails - Mode Selector required?

Hello,

I have an HPE DL325 Gen10 Plus with an EPYC CPU and an A2 card.
I am running Proxmox 8.2 with kernel 6.8 and managed to install NVIDIA-Linux-x86_64-550.54.16-vgpu-kvm. nvidia-smi gives the following output:

Tue Jun  4 16:24:38 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.16              Driver Version: 550.54.16      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A2                      On  |   00000000:86:00.0 Off |                    0 |
|  0%   30C    P8              8W /   60W |       0MiB /  15356MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

I have SR-IOV enabled in the BIOS, along with all other IOMMU-related settings:

xxxx:~# dmesg | grep -e IOMMU -e AMD-Vi
[    0.385675] AMD-Vi: Unknown option - 'on'
[    1.397545] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
[    1.397547] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
[    1.397548] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
[    1.397549] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
[    1.397717] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
[    1.397718] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
[    1.397719] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
[    1.397720] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
[    1.397889] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
[    1.397890] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
[    1.397891] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
[    1.397892] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
[    1.398066] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
[    1.398069] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
[    1.398070] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
[    1.398071] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
[    1.398072] AMD-Vi: Using global IVHD EFR:0x59f77efa2094ade, EFR2:0x0
[    1.746362] pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
[    1.747777] pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
[    1.748910] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[    1.750182] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    1.752281] AMD-Vi: Extended features (0x59f77efa2094ade, 0x0): PPR X2APIC NX GT IA GA PC
[    1.752292] AMD-Vi: Interrupt remapping enabled
[    1.752293] AMD-Vi: X2APIC enabled
[    1.758017] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    1.758028] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[    1.758037] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[    1.758050] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank)
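
The "Unknown option - 'on'" line at the top presumably refers to a parameter on the kernel command line, so it seems worth checking what the kernel was actually booted with. A minimal sketch of how that could be listed (the function name iommu_params is made up for this post, and that the message comes from the command line is my assumption):

```shell
#!/bin/sh
# Print every IOMMU-related token from a kernel command line.
# The file argument is optional and defaults to /proc/cmdline.
iommu_params() {
    f="${1:-/proc/cmdline}"
    # Stay quiet if the file is not readable (e.g. not on Linux).
    [ -r "$f" ] || return 0
    # One token per line, then keep only those mentioning "iommu".
    tr ' ' '\n' < "$f" | grep -i 'iommu' || true
}

iommu_params
```

Each printed token can then be compared against the kernel-parameters documentation for the running kernel.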

I followed the PCIe passthrough guides from Proxmox and NVIDIA (the latter for KVM), disabled nouveau, and loaded all required kernel modules:

xxxxxxx:~# lsmod | grep nvidia
nvidia_vgpu_vfio       98304  0
nvidia              54112256  5
mdev                   24576  1 nvidia_vgpu_vfio
kvm                  1372160  2 kvm_amd,nvidia_vgpu_vfio
vfio_pci_core          86016  2 nvidia_vgpu_vfio,vfio_pci
irqbypass              12288  3 vfio_pci_core,nvidia_vgpu_vfio,kvm
vfio                   69632  4 vfio_pci_core,nvidia_vgpu_vfio,vfio_iommu_type1,vfio_pci

nvidia-smi -q gives me the following output. The BAR1 size looks OK, I think, but Display Mode is reported as Enabled (while Display Active is Disabled). Is that correct?

xxxxxxxxx:~# nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Tue Jun  4 16:28:20 2024
Driver Version                            : 550.54.16
CUDA Version                              : Not Found
vGPU Driver Capability
        Heterogenous Multi-vGPU           : Supported

Attached GPUs                             : 1
GPU 00000000:86:00.0
    Product Name                          : NVIDIA A2
    Product Brand                         : NVIDIA
    Product Architecture                  : Ampere
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    Addressing Mode                       : N/A
    vGPU Device Capability
        Fractional Multi-vGPU             : Supported
        Heterogeneous Time-Slice Profiles : Supported
        Heterogeneous Time-Slice Sizes    : Supported
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Enabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1321522057169
    GPU UUID                              : GPU-b2e30577-8aa3-9bc8-aab9-938471934f24
    Minor Number                          : 0
    VBIOS Version                         : 94.07.5B.00.92
    MultiGPU Board                        : No
    Board ID                              : 0x8600
    Board Part Number                     : 900-2G179-0320-100
    GPU Part Number                       : 25B6-890-A1
    FRU Part Number                       : N/A
    Module ID                             : 1
    Inforom Version
        Image Version                     : G179.0220.00.01
        OEM Object                        : 2.0
        ECC Object                        : 6.16
        Power Management Object           : N/A
    Inforom BBX Object Flush
        Latest Timestamp                  : N/A
        Latest Duration                   : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU C2C Mode                          : N/A
    GPU Virtualization Mode
        Virtualization Mode               : Host VGPU
        Host VGPU Mode                    : SR-IOV
        vGPU Heterogeneous Mode           : Disabled
    GPU Reset Status
        Reset Required                    : No
        Drain and Reset Recommended       : N/A
    GSP Firmware Version                  : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x86
        Device                            : 0x00
        Domain                            : 0x0000
        Base Classcode                    : 0x3
        Sub Classcode                     : 0x2
        Device Id                         : 0x25B610DE
        Bus Id                            : 00000000:86:00.0
        Sub System Id                     : 0x157E10DE
        GPU Link Info
            PCIe Generation
                Max                       : 4
                Current                   : 1
                Device Current            : 1
                Device Max                : 4
                Host Max                  : N/A
            Link Width
                Max                       : 16x
                Current                   : 8x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : 0 %
    Performance State                     : P8
    Clocks Event Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    Sparse Operation Mode                 : N/A
    FB Memory Usage
        Total                             : 15356 MiB
        Reserved                          : 265 MiB
        Used                              : 0 MiB
        Free                              : 15090 MiB
    BAR1 Memory Usage
        Total                             : 16384 MiB
        Used                              : 1 MiB
        Free                              : 16383 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : 0 %
        OFA                               : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            SRAM Correctable              : 0
            SRAM Uncorrectable Parity     : 0
            SRAM Uncorrectable SEC-DED    : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
        Aggregate
            SRAM Correctable              : 0
            SRAM Uncorrectable Parity     : 0
            SRAM Uncorrectable SEC-DED    : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
            SRAM Threshold Exceeded       : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2                       : 0
            SRAM SM                       : 0
            SRAM Microcontroller          : 0
            SRAM PCIE                     : 0
            SRAM Other                    : 0
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 64 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 31 C
        GPU T.Limit Temp                  : N/A
        GPU Shutdown Temp                 : 96 C
        GPU Slowdown Temp                 : 93 C
        GPU Max Operating Temp            : 86 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    GPU Power Readings
        Power Draw                        : 8.58 W
        Current Power Limit               : 60.00 W
        Requested Power Limit             : 60.00 W
        Default Power Limit               : 60.00 W
        Min Power Limit                   : 35.00 W
        Max Power Limit                   : 60.00 W
    GPU Memory Power Readings 
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 795 MHz
    Applications Clocks
        Graphics                          : 1770 MHz
        Memory                            : 6251 MHz
    Default Applications Clocks
        Graphics                          : 1770 MHz
        Memory                            : 6251 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1770 MHz
        SM                                : 1770 MHz
        Memory                            : 6251 MHz
        Video                             : 1650 MHz
    Max Customer Boost Clocks
        Graphics                          : 1770 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 656.250 mV
    Fabric
        State                             : N/A
        Status                            : N/A
        CliqueId                          : N/A
        ClusterUUID                       : N/A
        Health
            Bandwidth                     : N/A
    Processes                             : None
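
Since the full dump is long, the fields relevant to my question can be pulled out with a small filter. This is only a sketch: the helper name vgpu_mode_fields is made up for this post, and the field names are copied from the output above.

```shell
#!/bin/sh
# Keep only the mode-related fields of `nvidia-smi -q` output read on stdin.
vgpu_mode_fields() {
    grep -E '^[[:space:]]*(Display Mode|Display Active|Virtualization Mode|Host VGPU Mode)[[:space:]]*:' || true
}

# Run against the real tool only if it is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -q | vgpu_mode_fields
fi
```

For the output above, this leaves the four lines Display Mode, Display Active, Virtualization Mode and Host VGPU Mode.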

When I now run /usr/lib/nvidia/sriov-manage -e 0000:86:00.0, I get the following error:

Enabling VFs on 0000:86:00.0
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted
/usr/lib/nvidia/sriov-manage: line 134: echo: write error: Operation not permitted

As a result, mdevctl types is empty and no mdev folders have been created under /sys/bus/pci/devices/0000:86:00.0/:

xxxxxxx:/sys/bus/pci/devices/0000:86:00.0# ls -la
total 0
drwxr-xr-x  7 root root           0 Jun  4 16:23 .
drwxr-xr-x 26 root root           0 Jun  4 16:23 ..
-r--r--r--  1 root root        4096 Jun  4 16:33 aer_dev_correctable
-r--r--r--  1 root root        4096 Jun  4 16:33 aer_dev_fatal
-r--r--r--  1 root root        4096 Jun  4 16:33 aer_dev_nonfatal
-r--r--r--  1 root root        4096 Jun  4 16:33 ari_enabled
-rw-r--r--  1 root root        4096 Jun  4 16:33 broken_parity_status
-r--r--r--  1 root root        4096 Jun  4 16:28 class
-rw-r--r--  1 root root        4096 Jun  4 16:23 config
-r--r--r--  1 root root        4096 Jun  4 16:33 consistent_dma_mask_bits
-r--r--r--  1 root root        4096 Jun  4 16:33 current_link_speed
-r--r--r--  1 root root        4096 Jun  4 16:33 current_link_width
-rw-r--r--  1 root root        4096 Jun  4 16:33 d3cold_allowed
-r--r--r--  1 root root        4096 Jun  4 16:28 device
-r--r--r--  1 root root        4096 Jun  4 16:33 dma_mask_bits
lrwxrwxrwx  1 root root           0 Jun  4 16:33 driver -> ../../../../bus/pci/drivers/nvidia
-rw-r--r--  1 root root        4096 Jun  4 16:33 driver_override
-rw-r--r--  1 root root        4096 Jun  4 16:33 enable
lrwxrwxrwx  1 root root           0 Jun  4 16:33 firmware_node -> ../../../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01/device:28/device:29
drwxr-xr-x  4 root root           0 Jun  4 16:31 i2c-4
drwxr-xr-x  4 root root           0 Jun  4 16:31 i2c-5
lrwxrwxrwx  1 root root           0 Jun  4 16:33 iommu -> ../../0000:80:00.2/iommu/ivhd1
lrwxrwxrwx  1 root root           0 Jun  4 16:33 iommu_group -> ../../../../kernel/iommu_groups/26
-r--r--r--  1 root root        4096 Jun  4 16:33 irq
drwxr-xr-x  2 root root           0 Jun  4 16:33 link
-r--r--r--  1 root root        4096 Jun  4 16:33 local_cpulist
-r--r--r--  1 root root        4096 Jun  4 16:33 local_cpus
-r--r--r--  1 root root        4096 Jun  4 16:33 max_link_speed
-r--r--r--  1 root root        4096 Jun  4 16:33 max_link_width
-r--r--r--  1 root root        4096 Jun  4 16:33 modalias
-rw-r--r--  1 root root        4096 Jun  4 16:33 msi_bus
drwxr-xr-x  2 root root           0 Jun  4 16:33 msi_irqs
-rw-r--r--  1 root root        4096 Jun  4 16:33 numa_node
drwxr-xr-x  2 root root           0 Jun  4 16:33 power
-r--r--r--  1 root root        4096 Jun  4 16:33 power_state
--w--w----  1 root root        4096 Jun  4 16:33 remove
--w-------  1 root root        4096 Jun  4 16:33 rescan
--w-------  1 root root        4096 Jun  4 16:33 reset
-rw-r--r--  1 root root        4096 Jun  4 16:33 reset_method
-r--r--r--  1 root root        4096 Jun  4 16:28 resource
-rw-------  1 root root    16777216 Jun  4 16:33 resource0
-rw-r--r--  1 root root        4096 Jun  4 16:33 resource0_resize
-rw-------  1 root root 17179869184 Jun  4 16:33 resource1
-rw-r--r--  1 root root        4096 Jun  4 16:33 resource1_resize
-rw-------  1 root root 17179869184 Jun  4 16:33 resource1_wc
-rw-------  1 root root    33554432 Jun  4 16:33 resource3
-rw-r--r--  1 root root        4096 Jun  4 16:33 resource3_resize
-rw-------  1 root root    33554432 Jun  4 16:33 resource3_wc
-r--r--r--  1 root root        4096 Jun  4 16:28 revision
-rw-r--r--  1 root root        4096 Jun  4 16:33 sriov_drivers_autoprobe
-rw-r--r--  1 root root        4096 Jun  4 16:31 sriov_numvfs
-r--r--r--  1 root root        4096 Jun  4 16:33 sriov_offset
-r--r--r--  1 root root        4096 Jun  4 16:33 sriov_stride
-r--r--r--  1 root root        4096 Jun  4 16:31 sriov_totalvfs
-r--r--r--  1 root root        4096 Jun  4 16:33 sriov_vf_device
-r--r--r--  1 root root        4096 Jun  4 16:33 sriov_vf_total_msix
lrwxrwxrwx  1 root root           0 Jun  4 16:23 subsystem -> ../../../../bus/pci
-r--r--r--  1 root root        4096 Jun  4 16:28 subsystem_device
-r--r--r--  1 root root        4096 Jun  4 16:28 subsystem_vendor
-rw-r--r--  1 root root        4096 Jun  4 16:23 uevent
-r--r--r--  1 root root        4096 Jun  4 16:23 vendor
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn0 -> ../0000:86:00.4
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn1 -> ../0000:86:00.5
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn10 -> ../0000:86:01.6
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn11 -> ../0000:86:01.7
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn12 -> ../0000:86:02.0
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn13 -> ../0000:86:02.1
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn14 -> ../0000:86:02.2
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn15 -> ../0000:86:02.3
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn2 -> ../0000:86:00.6
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn3 -> ../0000:86:00.7
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn4 -> ../0000:86:01.0
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn5 -> ../0000:86:01.1
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn6 -> ../0000:86:01.2
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn7 -> ../0000:86:01.3
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn8 -> ../0000:86:01.4
lrwxrwxrwx  1 root root           0 Jun  4 16:33 virtfn9 -> ../0000:86:01.5
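
The virtfn links exist, so the VFs were created, but the listing does not show which driver (if any) each VF is bound to. A rough sketch for checking that from sysfs follows; the function name sriov_status is made up for this post, and the default path is simply my card's address:

```shell
#!/bin/sh
# Summarise the SR-IOV state of a PCI physical function from sysfs.
# $1 is the PF's sysfs directory; the default is the A2 from this post.
sriov_status() {
    pf="${1:-/sys/bus/pci/devices/0000:86:00.0}"
    [ -d "$pf" ] || { echo "no such device: $pf"; return 0; }
    echo "numvfs=$(cat "$pf/sriov_numvfs" 2>/dev/null) totalvfs=$(cat "$pf/sriov_totalvfs" 2>/dev/null)"
    for vf in "$pf"/virtfn*; do
        # Skip the unexpanded glob when there are no VFs.
        [ -e "$vf" ] || continue
        if [ -e "$vf/driver" ]; then
            drv=$(basename "$(readlink -f "$vf/driver")")
        else
            drv="(none)"
        fi
        echo "$(basename "$vf") -> $(basename "$(readlink -f "$vf")") driver=$drv"
    done
}

sriov_status
```

A VF reported with driver=(none) was enumerated on the bus but not claimed by any driver.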

dmesg also shows the following output after running sriov-manage:

[  456.916640] NVRM: GPU 0000:86:00.0: UnbindLock acquired
[  457.055662] pci-pf-stub 0000:86:00.0: claimed by pci-pf-stub
[  457.664526] pci 0000:86:00.4: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.664544] pci 0000:86:00.4: enabling Extended Tags
[  457.664581] pci 0000:86:00.4: Enabling HDA controller
[  457.664899] pci 0000:86:00.4: Adding to iommu group 67
[  457.665159] NVRM: Aborting probe for VF 0000:86:00.4 since PF is not bound to nvidia driver.
[  457.665200] nvidia: probe of 0000:86:00.4 failed with error -1
[  457.665238] pci-pf-stub 0000:86:00.4: claimed by pci-pf-stub
[  457.665296] pci 0000:86:00.5: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.665311] pci 0000:86:00.5: enabling Extended Tags
[  457.665336] pci 0000:86:00.5: Enabling HDA controller
[  457.665653] pci 0000:86:00.5: Adding to iommu group 68
[  457.665785] NVRM: Aborting probe for VF 0000:86:00.5 since PF is not bound to nvidia driver.
[  457.665806] nvidia: probe of 0000:86:00.5 failed with error -1
[  457.665833] pci-pf-stub 0000:86:00.5: claimed by pci-pf-stub
[  457.665880] pci 0000:86:00.6: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.665893] pci 0000:86:00.6: enabling Extended Tags
[  457.665916] pci 0000:86:00.6: Enabling HDA controller
[  457.666186] pci 0000:86:00.6: Adding to iommu group 69
[  457.666341] NVRM: Aborting probe for VF 0000:86:00.6 since PF is not bound to nvidia driver.
[  457.666363] nvidia: probe of 0000:86:00.6 failed with error -1
[  457.666389] pci-pf-stub 0000:86:00.6: claimed by pci-pf-stub
[  457.666431] pci 0000:86:00.7: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.666444] pci 0000:86:00.7: enabling Extended Tags
[  457.666476] pci 0000:86:00.7: Enabling HDA controller
[  457.666728] pci 0000:86:00.7: Adding to iommu group 70
[  457.667198] NVRM: Aborting probe for VF 0000:86:00.7 since PF is not bound to nvidia driver.
[  457.667215] nvidia: probe of 0000:86:00.7 failed with error -1
[  457.667237] pci-pf-stub 0000:86:00.7: claimed by pci-pf-stub
[  457.667273] pci 0000:86:01.0: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.667284] pci 0000:86:01.0: enabling Extended Tags
[  457.667302] pci 0000:86:01.0: Enabling HDA controller
[  457.667511] pci 0000:86:01.0: Adding to iommu group 71
[  457.667939] NVRM: Aborting probe for VF 0000:86:01.0 since PF is not bound to nvidia driver.
[  457.667955] nvidia: probe of 0000:86:01.0 failed with error -1
[  457.667977] pci-pf-stub 0000:86:01.0: claimed by pci-pf-stub
[  457.668010] pci 0000:86:01.1: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.668022] pci 0000:86:01.1: enabling Extended Tags
[  457.668039] pci 0000:86:01.1: Enabling HDA controller
[  457.668211] pci 0000:86:01.1: Adding to iommu group 72
[  457.668271] NVRM: Aborting probe for VF 0000:86:01.1 since PF is not bound to nvidia driver.
[  457.668280] nvidia: probe of 0000:86:01.1 failed with error -1
[  457.668314] pci-pf-stub 0000:86:01.1: claimed by pci-pf-stub
[  457.668346] pci 0000:86:01.2: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.668357] pci 0000:86:01.2: enabling Extended Tags
[  457.668374] pci 0000:86:01.2: Enabling HDA controller
[  457.668556] pci 0000:86:01.2: Adding to iommu group 73
[  457.668630] NVRM: Aborting probe for VF 0000:86:01.2 since PF is not bound to nvidia driver.
[  457.668642] nvidia: probe of 0000:86:01.2 failed with error -1
[  457.668659] pci-pf-stub 0000:86:01.2: claimed by pci-pf-stub
[  457.668691] pci 0000:86:01.3: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.668702] pci 0000:86:01.3: enabling Extended Tags
[  457.668719] pci 0000:86:01.3: Enabling HDA controller
[  457.668916] pci 0000:86:01.3: Adding to iommu group 74
[  457.668980] NVRM: Aborting probe for VF 0000:86:01.3 since PF is not bound to nvidia driver.
[  457.668995] nvidia: probe of 0000:86:01.3 failed with error -1
[  457.669012] pci-pf-stub 0000:86:01.3: claimed by pci-pf-stub
[  457.669040] pci 0000:86:01.4: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.669051] pci 0000:86:01.4: enabling Extended Tags
[  457.669068] pci 0000:86:01.4: Enabling HDA controller
[  457.669209] pci 0000:86:01.4: Adding to iommu group 75
[  457.669267] NVRM: Aborting probe for VF 0000:86:01.4 since PF is not bound to nvidia driver.
[  457.669277] nvidia: probe of 0000:86:01.4 failed with error -1
[  457.669295] pci-pf-stub 0000:86:01.4: claimed by pci-pf-stub
[  457.669323] pci 0000:86:01.5: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.669334] pci 0000:86:01.5: enabling Extended Tags
[  457.669351] pci 0000:86:01.5: Enabling HDA controller
[  457.669523] pci 0000:86:01.5: Adding to iommu group 76
[  457.669583] NVRM: Aborting probe for VF 0000:86:01.5 since PF is not bound to nvidia driver.
[  457.669594] nvidia: probe of 0000:86:01.5 failed with error -1
[  457.669611] pci-pf-stub 0000:86:01.5: claimed by pci-pf-stub
[  457.669641] pci 0000:86:01.6: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.669652] pci 0000:86:01.6: enabling Extended Tags
[  457.669669] pci 0000:86:01.6: Enabling HDA controller
[  457.669882] pci 0000:86:01.6: Adding to iommu group 77
[  457.669941] NVRM: Aborting probe for VF 0000:86:01.6 since PF is not bound to nvidia driver.
[  457.669952] nvidia: probe of 0000:86:01.6 failed with error -1
[  457.669978] pci-pf-stub 0000:86:01.6: claimed by pci-pf-stub
[  457.670007] pci 0000:86:01.7: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.670018] pci 0000:86:01.7: enabling Extended Tags
[  457.670035] pci 0000:86:01.7: Enabling HDA controller
[  457.670179] pci 0000:86:01.7: Adding to iommu group 78
[  457.670249] NVRM: Aborting probe for VF 0000:86:01.7 since PF is not bound to nvidia driver.
[  457.670259] nvidia: probe of 0000:86:01.7 failed with error -1
[  457.670279] pci-pf-stub 0000:86:01.7: claimed by pci-pf-stub
[  457.670308] pci 0000:86:02.0: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.670319] pci 0000:86:02.0: enabling Extended Tags
[  457.670336] pci 0000:86:02.0: Enabling HDA controller
[  457.670518] pci 0000:86:02.0: Adding to iommu group 79
[  457.670583] NVRM: Aborting probe for VF 0000:86:02.0 since PF is not bound to nvidia driver.
[  457.670596] nvidia: probe of 0000:86:02.0 failed with error -1
[  457.670642] pci-pf-stub 0000:86:02.0: claimed by pci-pf-stub
[  457.670671] pci 0000:86:02.1: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.670683] pci 0000:86:02.1: enabling Extended Tags
[  457.670700] pci 0000:86:02.1: Enabling HDA controller
[  457.670843] pci 0000:86:02.1: Adding to iommu group 80
[  457.670904] NVRM: Aborting probe for VF 0000:86:02.1 since PF is not bound to nvidia driver.
[  457.670914] nvidia: probe of 0000:86:02.1 failed with error -1
[  457.670952] pci-pf-stub 0000:86:02.1: claimed by pci-pf-stub
[  457.670984] pci 0000:86:02.2: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.670996] pci 0000:86:02.2: enabling Extended Tags
[  457.671013] pci 0000:86:02.2: Enabling HDA controller
[  457.671171] pci 0000:86:02.2: Adding to iommu group 81
[  457.671237] NVRM: Aborting probe for VF 0000:86:02.2 since PF is not bound to nvidia driver.
[  457.671247] nvidia: probe of 0000:86:02.2 failed with error -1
[  457.671264] pci-pf-stub 0000:86:02.2: claimed by pci-pf-stub
[  457.671295] pci 0000:86:02.3: [10de:25b6] type 00 class 0x030200 PCIe Endpoint
[  457.671306] pci 0000:86:02.3: enabling Extended Tags
[  457.671323] pci 0000:86:02.3: Enabling HDA controller
[  457.671551] pci 0000:86:02.3: Adding to iommu group 82
[  457.671613] NVRM: Aborting probe for VF 0000:86:02.3 since PF is not bound to nvidia driver.
[  457.671624] nvidia: probe of 0000:86:02.3 failed with error -1
[  457.671653] pci-pf-stub 0000:86:02.3: claimed by pci-pf-stub
[  457.671756] pci-pf-stub 0000:86:00.0: driver left SR-IOV enabled after remove
[  457.725644] NVRM: GPU at 0000:86:00.0 has software scheduler ENABLED with policy BEST_EFFORT.
[  458.770640] NVRM: Aborting probe for VF 0000:86:00.4 since IOMMU is not present on the system.
[  458.770664] nvidia: probe of 0000:86:00.4 failed with error -1
[  458.771211] NVRM: Aborting probe for VF 0000:86:00.5 since IOMMU is not present on the system.
[  458.771234] nvidia: probe of 0000:86:00.5 failed with error -1
[  458.771559] NVRM: Aborting probe for VF 0000:86:00.6 since IOMMU is not present on the system.
[  458.771584] nvidia: probe of 0000:86:00.6 failed with error -1
[  458.771916] NVRM: Aborting probe for VF 0000:86:00.7 since IOMMU is not present on the system.
[  458.771936] nvidia: probe of 0000:86:00.7 failed with error -1
[  458.772270] NVRM: Aborting probe for VF 0000:86:01.0 since IOMMU is not present on the system.
[  458.772290] nvidia: probe of 0000:86:01.0 failed with error -1
[  458.772623] NVRM: Aborting probe for VF 0000:86:01.1 since IOMMU is not present on the system.
[  458.772643] nvidia: probe of 0000:86:01.1 failed with error -1
[  458.772955] NVRM: Aborting probe for VF 0000:86:01.2 since IOMMU is not present on the system.
[  458.772972] nvidia: probe of 0000:86:01.2 failed with error -1
[  458.773257] NVRM: Aborting probe for VF 0000:86:01.3 since IOMMU is not present on the system.
[  458.773272] nvidia: probe of 0000:86:01.3 failed with error -1
[  458.773574] NVRM: Aborting probe for VF 0000:86:01.4 since IOMMU is not present on the system.
[  458.773588] nvidia: probe of 0000:86:01.4 failed with error -1
[  458.773877] NVRM: Aborting probe for VF 0000:86:01.5 since IOMMU is not present on the system.
[  458.773891] nvidia: probe of 0000:86:01.5 failed with error -1
[  458.774174] NVRM: Aborting probe for VF 0000:86:01.6 since IOMMU is not present on the system.
[  458.774188] nvidia: probe of 0000:86:01.6 failed with error -1
[  458.774486] NVRM: Aborting probe for VF 0000:86:01.7 since IOMMU is not present on the system.
[  458.774512] nvidia: probe of 0000:86:01.7 failed with error -1
[  458.774799] NVRM: Aborting probe for VF 0000:86:02.0 since IOMMU is not present on the system.
[  458.774812] nvidia: probe of 0000:86:02.0 failed with error -1
[  458.775107] NVRM: Aborting probe for VF 0000:86:02.1 since IOMMU is not present on the system.
[  458.775121] nvidia: probe of 0000:86:02.1 failed with error -1
[  458.775400] NVRM: Aborting probe for VF 0000:86:02.2 since IOMMU is not present on the system.
[  458.775413] nvidia: probe of 0000:86:02.2 failed with error -1
[  458.775718] NVRM: Aborting probe for VF 0000:86:02.3 since IOMMU is not present on the system.
[  458.775732] nvidia: probe of 0000:86:02.3 failed with error -1

I also ran the initramfs update and added iommu=pt to the GRUB command line. In principle IOMMU should be working, since all the IOMMU groups are created:

xxxxxxxx:~# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 c0:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 0 c0:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 0 c0:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 10 c6:00.0 Ethernet controller [0200]: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller [1077:8070] (rev 02)
IOMMU group 10 c6:00.1 Ethernet controller [0200]: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller [1077:8070] (rev 02)
IOMMU group 11 c2:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 12 c2:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 13 c3:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 14 c3:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 15 80:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 15 80:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 15 80:01.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 16 80:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 17 80:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 18 80:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 19 80:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 1 c0:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 20 80:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 21 80:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 22 80:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 23 80:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 24 80:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 25 80:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 26 86:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 27 82:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 28 82:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 29 81:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 2 c0:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 30 81:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 31 83:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 32 40:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 33 40:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 34 40:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 35 40:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 35 40:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 35 40:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 35 40:03.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 35 40:03.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 36 40:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 37 40:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 38 40:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 39 40:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 3 c0:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 3 c0:03.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 3 c0:03.5 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 40 40:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 41 40:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 42 41:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 43 41:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 44 42:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 45 42:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
IOMMU group 46 42:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 47 42:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller [1022:148c]
IOMMU group 48 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 48 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 48 00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 49 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 4 c0:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 50 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 51 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 52 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 53 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 54 00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 55 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 56 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 57 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 58 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 59 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
IOMMU group 59 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU group 5 c0:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 5 c0:05.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship PCIe GPP Bridge [1:0] [1022:149a]
IOMMU group 5 c1:00.0 System peripheral [0880]: Hewlett-Packard Company Integrated Lights-Out Standard Slave Instrumentation & System Support [103c:3306] (rev 07)
IOMMU group 5 c1:00.1 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. MGA G200eH3 [102b:0538] (rev 02)
IOMMU group 5 c1:00.2 System peripheral [0880]: Hewlett-Packard Company Integrated Lights-Out Standard Management Processor Support and Messaging [103c:3307] (rev 07)
IOMMU group 5 c1:00.4 USB controller [0c03]: Hewlett-Packard Company iLO5 Virtual USB Controller [103c:22f6]
IOMMU group 60 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 0 [1022:1650]
IOMMU group 60 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 1 [1022:1651]
IOMMU group 60 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 2 [1022:1652]
IOMMU group 60 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 3 [1022:1653]
IOMMU group 60 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 4 [1022:1654]
IOMMU group 60 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 5 [1022:1655]
IOMMU group 60 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 6 [1022:1656]
IOMMU group 60 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 7 [1022:1657]
IOMMU group 61 06:00.0 Serial Attached SCSI controller [0107]: Adaptec Smart Storage PQI SAS [9005:028f] (rev 01)
IOMMU group 62 01:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 63 01:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 64 02:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 65 02:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA [1022:1498]
IOMMU group 66 02:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller [1022:148c]
IOMMU group 67 86:00.4 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 68 86:00.5 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 69 86:00.6 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 6 c0:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 70 86:00.7 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 71 86:01.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 72 86:01.1 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 73 86:01.2 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 74 86:01.3 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 75 86:01.4 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 76 86:01.5 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 77 86:01.6 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 78 86:01.7 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 79 86:02.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 7 c0:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 80 86:02.1 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 81 86:02.2 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 82 86:02.3 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
IOMMU group 8 c0:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 9 c0:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
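
Since the driver insists that no IOMMU is present even though the groups exist, it may be worth double-checking that the parameter really reached the running kernel and was not only written to the GRUB config. A minimal sketch of such a check follows; `has_param` is a hypothetical helper name, not part of any tool, and the only thing it reads is /proc/cmdline:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line string ($1)
# contains the exact parameter ($2) as a whole space-separated word.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Command line of the kernel that is actually running (empty if unreadable).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

if has_param "$cmdline" "iommu=pt"; then
    echo "iommu=pt is present on the running kernel command line"
else
    echo "iommu=pt is NOT present on the running kernel command line"
fi
```

If the parameter is missing there, the edit to the boot configuration did not take effect for the booted kernel. Running `dmesg | grep -i -e AMD-Vi -e IOMMU` alongside this should show whether the kernel itself initialized the IOMMU.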

What am I doing wrong that prevents me from getting the mdevs? Do I need the mode selector tool for the A2?

Any support is very much appreciated. Thanks.