Hi folks.
I have an AWS machine with a Tesla T4 card and I want to enable vGPUs on it. I followed the steps Amazon recommends to install the NVIDIA drivers and disable nouveau, but when I run displaymodeselector, it fails:
sudo ./display_mode_change_tool/linux/x64/displaymodeselector --gpumode physical_display_disabled
...
Specified GPU mode not supported on this device 0x1EB8.
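For what it's worth, the 0x1EB8 in that message appears to be the card's PCI device ID. nvidia-smi reports the same ID packed together with the vendor ID as one 32-bit value (Device Id : 0x1EB810DE in the nvidia-smi output in this post), with the device ID in the upper 16 bits and the vendor ID in the lower 16. A minimal shell sketch of the split; split_pci_id is a hypothetical helper name, not part of any NVIDIA tool:

```shell
#!/usr/bin/env bash
# Split a combined 32-bit PCI ID, as printed by nvidia-smi -q, into its
# device and vendor halves. split_pci_id is a hypothetical helper name.
split_pci_id() {
    local combined=$(( $1 ))                        # accepts hex such as 0x1EB810DE
    local device=$(( (combined >> 16) & 0xFFFF ))   # upper 16 bits
    local vendor=$(( combined & 0xFFFF ))           # lower 16 bits
    printf 'device=0x%04X vendor=0x%04X\n' "$device" "$vendor"
}

split_pci_id 0x1EB810DE   # prints: device=0x1EB8 vendor=0x10DE
```

So the tool and nvidia-smi seem to be looking at the same device.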
The output of nvidia-smi -q is as follows:
==============NVSMI LOG==============
Timestamp : Wed Mar 6 06:50:55 2024
Driver Version : 550.54.14
CUDA Version : 12.4
Attached GPUs : 1
GPU 00000000:00:1E.0
Product Name : Tesla T4
Product Brand : NVIDIA
Product Architecture : Turing
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
Addressing Mode : None
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1561620020480
GPU UUID : GPU-35a3ab88-ce3f-5092-6676-10992ae344b8
Minor Number : 0
VBIOS Version : 90.04.96.00.02
MultiGPU Board : No
Board ID : 0x1e
Board Part Number : 900-2G183-0000-001
GPU Part Number : 1EB8-895-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G183.0200.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU C2C Mode : N/A
GPU Virtualization Mode
Virtualization Mode : Pass-Through
Host VGPU Mode : N/A
vGPU Heterogeneous Mode : N/A
vGPU Software Licensed Product
Product Name : NVIDIA Virtual Applications
License Status : Licensed (Expiry: N/A)
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
GSP Firmware Version : 550.54.14
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x00
Device : 0x1E
Domain : 0x0000
Base Classcode : 0x3
Sub Classcode : 0x2
Device Id : 0x1EB810DE
Bus Id : 00000000:00:1E.0
Sub System Id : 0x12A210DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Device Current : 3
Device Max : 3
Host Max : N/A
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
Sparse Operation Mode : N/A
FB Memory Usage
Total : 15360 MiB
Reserved : 442 MiB
Used : 0 MiB
Free : 14917 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 31 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 85 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
GPU Power Readings
Power Draw : 27.33 W
Current Power Limit : 70.00 W
Requested Power Limit : 70.00 W
Default Power Limit : 70.00 W
Min Power Limit : 60.00 W
Max Power Limit : 70.00 W
GPU Memory Power Readings
Power Draw : N/A
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 585 MHz
SM : 585 MHz
Memory : 5000 MHz
Video : 840 MHz
Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Default Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1590 MHz
SM : 1590 MHz
Memory : 5001 MHz
Video : 1470 MHz
Max Customer Boost Clocks
Graphics : 1590 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
CliqueId : N/A
ClusterUUID : N/A
Health
Bandwidth : N/A
Processes : None
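Since the full dump is long, the fields that look most relevant to this question can be pulled out with a small filter. This is only a sketch: it assumes the "Label : value" layout shown above, the field labels may differ between driver versions, and gpu_mode_summary is a hypothetical helper name. "Total" appears in several sections, so the filter tracks the section header to pick out the BAR1 one.

```shell
#!/usr/bin/env bash
# Print a few mode-related fields from `nvidia-smi -q` text read on stdin.
# gpu_mode_summary is a hypothetical helper name; the labels are taken from
# the output above and are an assumption for other driver versions.
gpu_mode_summary() {
    awk -F' *: *' '
        # Strip leading and trailing whitespace so labels compare exactly.
        { gsub(/^[ \t]+|[ \t\r]+$/, "") }
        # Section headers carry no "label : value" separator.
        NF == 1 { section = $1; next }
        # Only the first Product Name is the GPU; a later one is the license product.
        $1 == "Product Name" && !seen_name++ { print "Product Name: " $2 }
        $1 == "Display Mode"        { print "Display Mode: " $2 }
        $1 == "Virtualization Mode" { print "Virtualization Mode: " $2 }
        section == "BAR1 Memory Usage" && $1 == "Total" { print "BAR1 Total: " $2 }
    '
}

# Usage on the GPU host: nvidia-smi -q | gpu_mode_summary
```

On the output above this should report Tesla T4, Display Mode Enabled, Virtualization Mode Pass-Through, and a BAR1 total of 256 MiB.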
Please suggest how to proceed.
Edit: FYI, the tool does list physical_display_disabled as an available mode when I run it interactively:
$ sudo ./display_mode_change_tool/linux/x64/displaymodeselector --gpumode
NVIDIA Display Mode Selector Utility (Version 1.61.0)
Copyright (C) 2015-2021, NVIDIA Corporation. All Rights Reserved.
WARNING: This operation updates the firmware on the board and could make
the device unusable if your host system lacks the necessary support.
Are you sure you want to continue?
Press 'y' to confirm (any other key to abort):
y
Select a number:
<0> physical_display_enabled_256MB_bar1
<1> physical_display_disabled
<2> physical_display_enabled_8GB_bar1
Select a number (ESC to quit):
^[
ERROR: User aborted