Hi,
When I try to start a VM with a vGPU, I only get the following error (it is currently the only VM on this system):
Enabling VFs on 0000:81:00.0\x0A ]; stderr = [ /usr/lib/nvidia/sriov-manage: line 205: echo: write error: Cannot allocate memory\x0A
The hardware configuration is:
Dell R7525 (BIOS 2.14.1, IOMMU enabled)
2x AMD EPYC 7443
1 TB Memory
2x NVIDIA L40 (vGPU profile: L40-6Q; GPU mode: Compute)
Citrix XenServer 8.2 with the latest patches
VM:
4 cores
32 GB memory
200 GB Disk
1 NVIDIA L40-6Q
nvidia-smi output:
==============NVSMI LOG==============
Timestamp : Thu Jun 6 12:02:09 2024
Driver Version : 550.54.16
CUDA Version : Not Found
vGPU Driver Capability
Heterogenous Multi-vGPU : Not supported
Attached GPUs : 2
GPU 00000000:21:00.0
Product Name : NVIDIA L40
Product Brand : NVIDIA
Product Architecture : Ada Lovelace
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : N/A
vGPU Device Capability
Fractional Multi-vGPU : Supported
Heterogeneous Time-Slice Profiles : Not Supported
Heterogeneous Time-Slice Sizes : Not Supported
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1655023051404
GPU UUID : GPU-21fc5b65-accc-b6bd-8c86-16225ecdf91d
Minor Number : 0
VBIOS Version : 95.02.5D.00.01
MultiGPU Board : No
Board ID : 0x2100
Board Part Number : 900-2G133-0110-031
GPU Part Number : 26B5-895-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G133.0250.00.01
OEM Object : 2.1
ECC Object : 6.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU C2C Mode : N/A
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : SR-IOV
vGPU Heterogeneous Mode : Disabled
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
GSP Firmware Version : 550.54.16
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x21
Device : 0x00
Domain : 0x0000
Base Classcode : 0x3
Sub Classcode : 0x2
Device Id : 0x26B510DE
Bus Id : 00000000:21:00.0
Sub System Id : 0x169D10DE
GPU Link Info
PCIe Generation
Max : 4
Current : 1
Device Current : 1
Device Max : 4
Host Max : N/A
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P8
Clocks Event Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
Sparse Operation Mode : N/A
FB Memory Usage
Total : 46068 MiB
Reserved : 1130 MiB
Used : 0 MiB
Free : 44937 MiB
BAR1 Memory Usage
Total : 65536 MiB
Used : 1 MiB
Free : 65535 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable Parity : 0
SRAM Uncorrectable SEC-DED : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable Parity : 0
SRAM Uncorrectable SEC-DED : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
SRAM Threshold Exceeded : No
Aggregate Uncorrectable SRAM Sources
SRAM L2 : 0
SRAM SM : 0
SRAM Microcontroller : 0
SRAM PCIE : 0
SRAM Other : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 192 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 29 C
GPU T.Limit Temp : 58 C
GPU Shutdown T.Limit Temp : -5 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating T.Limit Temp : N/A
GPU Power Readings
Power Draw : 41.17 W
Current Power Limit : 300.00 W
Requested Power Limit : 300.00 W
Default Power Limit : 300.00 W
Min Power Limit : 100.00 W
Max Power Limit : 300.00 W
GPU Memory Power Readings
Power Draw : N/A
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 210 MHz
SM : 210 MHz
Memory : 405 MHz
Video : 1185 MHz
Applications Clocks
Graphics : 2490 MHz
Memory : 9001 MHz
Default Applications Clocks
Graphics : 2490 MHz
Memory : 9001 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2490 MHz
SM : 2490 MHz
Memory : 9001 MHz
Video : 1935 MHz
Max Customer Boost Clocks
Graphics : 2490 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 930.000 mV
Fabric
State : N/A
Status : N/A
CliqueId : N/A
ClusterUUID : N/A
Health
Bandwidth : N/A
Processes : None
Thank you all for any ideas on this topic.