Hi Team,
I’m trying to use an NVIDIA RTX A6000 with VMware ESXi 7.0 U2. I’ve run into an issue: a VM with a vGPU profile attached fails to power on. Can anyone help, please? Thanks!
Error when starting VM with vGPU profile
Key: haTask-4-vim.VirtualMachine.powerOn-191
Description: Power On this virtual machine
State: Failed - Could not initialize plugin 'libnvidia-vgx.so' for vGPU 'nvidia_rtxa6000-4q'.
Errors:
* Could not initialize plugin 'libnvidia-vgx.so' for vGPU 'nvidia_rtxa6000-4q'.
* Module 'DevicePowerOn' power on failed.
* Failed to start the virtual machine.
Environment
Server: Supermicro SYS-7049-GP-TRT
ESXi: 7.0u2, build 17867351
vCenter: 7.0.2, build 18356314
nvidia-smi output
Timestamp : Fri Aug 27 14:22:13 2021
Driver Version : 470.63
CUDA Version : Not Found
Attached GPUs : 1
GPU 00000000:18:00.0
Product Name : NVIDIA RTX A6000
Product Brand : NVIDIA RTX
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : xxxxxxxxx
GPU UUID : GPU-78fea16a-3767-9d77-64db-f6424ee3a417
Minor Number : 0
VBIOS Version : 94.02.5C.00.02
MultiGPU Board : No
Board ID : 0x1800
GPU Part Number : 900-5G133-2200-000
Module ID : 0
Inforom Version
Image Version : G133.0500.00.05
OEM Object : 2.0
ECC Object : 6.16
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : SR-IOV
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x18
Device : 0x00
Domain : 0x0000
Device Id : 0x223010DE
Bus Id : 00000000:18:00.0
Sub System Id : 0x145910DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 30 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 48685 MiB
Used : 0 MiB
Free : 48685 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 1 MiB
Free : 255 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 192 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 26 C
GPU Shutdown Temp : 98 C
GPU Slowdown Temp : 95 C
GPU Max Operating Temp : 93 C
GPU Target Temperature : 84 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 40.72 W
Power Limit : 300.00 W
Default Power Limit : 300.00 W
Enforced Power Limit : 300.00 W
Min Power Limit : 100.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 210 MHz
SM : 210 MHz
Memory : 405 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1800 MHz
Memory : 8001 MHz
Default Applications Clocks
Graphics : 1800 MHz
Memory : 8001 MHz
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 8001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 756.250 mV
Processes : None
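In case it helps anyone compare against their own host, here is a rough Python sketch that pulls a few fields out of pasted query output like the above. The function name `parse_flat_query` is my own, and it assumes the indentation has been lost (as in this post), so it only uses the most recent header line as context and cannot recover deeper nesting.

```python
def parse_flat_query(text):
    """Map 'Header/Key' -> value, using the latest header line as context.

    Assumption: a line containing ' : ' is a key/value pair, and any other
    non-empty line is a section header. Nesting deeper than one level is
    not reconstructed.
    """
    fields = {}
    header = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(" : ")
        if sep:
            name = f"{header}/{key.strip()}" if header else key.strip()
            # Keep the first occurrence if the same name shows up twice.
            fields.setdefault(name, value.strip())
        else:
            header = line
    return fields


sample = """\
Driver Version : 470.63
GPU 00000000:18:00.0
Product Name : NVIDIA RTX A6000
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : SR-IOV
Ecc Mode
Current : Disabled
Pending : Disabled
"""

info = parse_flat_query(sample)
for name in ("Driver Version",
             "GPU Virtualization Mode/Host VGPU Mode",
             "Ecc Mode/Current"):
    print(name, "=", info.get(name))
```

With the values from my host this reports driver 470.63, Host VGPU Mode SR-IOV, and ECC currently disabled.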
vmware.log file
2021-08-27T14:16:46.992Z| vmx| | I005: VMIOP: config /usr/share/nvidia/vgx/nvidia_rtxa6000-4q.conf
2021-08-27T14:16:46.994Z| vmx| | I005: VMIOP: Using VER_4_0 symbol /usr/lib64/vmware/plugin/libnvidia-vgx.so:vmiop_display_vmiop_plugin
2021-08-27T14:16:47.013Z| vmx| | I005: VMIOP: Registered device 0000:18:00.0
2021-08-27T14:16:47.022Z| vmx| | A000: ConfigDB: Setting pciPassthru0.pgpu = "223014590606060606060600000002"
2021-08-27T14:16:47.022Z| vmx| | I005: VMIOP: Enabling checkpoint support
2021-08-27T14:16:47.022Z| vmx| | I005: VMIOP: Initializing plugin vmiop-display
2021-08-27T14:16:47.023Z| vmx| | E002: vmiop_log: NVOS status 0x17
2021-08-27T14:16:47.023Z| vmx| | E002: vmiop_log: Assertion Failed at 0xc22f44b3:97
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: 16 frames returned by backtrace
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /usr/lib64/vmware/plugin/libnvidia-vgx.so(_nv005327vgpu+0x35) [0x98c233c615]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /usr/lib64/vmware/plugin/libnvidia-vgx.so(+0x7f6f8) [0x98c22f86f8]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /usr/lib64/vmware/plugin/libnvidia-vgx.so(+0x7b4b3) [0x98c22f44b3]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /usr/lib64/vmware/plugin/libnvidia-vgx.so(+0x99b97) [0x98c2312b97]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /usr/lib64/vmware/plugin/libnvidia-vgx.so(+0x9caeb) [0x98c2315aeb]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /usr/lib64/vmware/plugin/libvmx-vmiop.so(+0x91f4) [0x98c1e701f4]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x3adf98) [0x987a11df98]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x2dc924) [0x987a04c924]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x2dc45c) [0x987a04c45c]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x2dd557) [0x987a04d557]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x2e82bb) [0x987a0582bb]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x25a0c5) [0x9879fca0c5]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x25a8e2) [0x9879fca8e2]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x24e741) [0x9879fbe741]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /lib64/libc.so.6(__libc_start_main+0xed) [0x98bd420b2d]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x24f115) [0x9879fbf115]
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: (0x0): Initialization: Failed to alloc host vgpu device handle error 1
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: (0x0): init_device_instance failed for inst 0 with error 1 (unable to setup host connection state)
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: (0x0): Initialization: init_device_instance failed error 1
2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: display_init failed for inst: 0
2021-08-27T14:16:47.024Z| vmx| | E002: VMIOP: Plugin vmiop-display initialization failed: 1
2021-08-27T14:16:47.024Z| vmx| | I005: [msg.vmx.plugin.vmiop.vgpu.failed] Could not initialize plugin 'libnvidia-vgx.so' for vGPU 'nvidia_rtxa6000-4q'.
2021-08-27T14:16:47.024Z| vmx| | I005: Module 'DevicePowerOn' power on failed.
2021-08-27T14:16:47.024Z| vmx| | I005: VMX_PowerOn: ModuleTable_PowerOn = 0
2021-08-27T14:16:47.024Z| vmx| | I005: Device Interface (pciPassthru0) powering off.
2021-08-27T14:16:47.058Z| vmx| | I005: DeviceIfPowerOff: indicating asyncIOThread to exit.
2021-08-27T14:16:47.059Z| svga| | I005: SVGA thread is exiting the main loop
2021-08-27T14:16:47.059Z| vmx| | I005: Destroying virtual dev for scsi0:0 vscsi=8209
2021-08-27T14:16:47.059Z| vmx| | I005: VMMon_VSCSIStopVports: No such target on adapter
2021-08-27T14:16:47.059Z| vmx| | I005: SVMotion_PowerOff: Not running Storage vMotion. Nothing to do
2021-08-27T14:16:47.059Z| vmx| | I005: MKS/SVGA threads are stopped
2021-08-27T14:16:47.059Z| mks| | I005: MKS-RenderMain: Stopped MKSBasicOps
2021-08-27T14:16:47.060Z| mks| | I005: MKS PowerOff
2021-08-27T14:16:47.060Z| svga| | I005: SVGA thread is exiting
2021-08-27T14:16:47.060Z| mks| | I005: MKS thread is exiting
2021-08-27T14:16:47.060Z| vmx| | W003:
2021-08-27T14:16:47.060Z| vmx| | I005: scsi0:0: numIOs = 0 numMergedIOs = 0 numSplitIOs = 0 ( 0.0%)
2021-08-27T14:16:47.060Z| vmx| | I005: Closing disk 'scsi0:0'
2021-08-27T14:16:47.060Z| vmx| | I005: DISKLIB-VMFS : "/vmfs/volumes/61284f28-b5f5a1e8-ea19-ac1f6ba1bf98/vgpu-01/vgpu-01-flat.vmdk" : closed.
2021-08-27T14:16:47.061Z| vmx| | I005: Vix: [mainDispatch.c:1164]: VMAutomationPowerOff: Powering off.
2021-08-27T14:16:47.062Z| vmx| | W003: /vmfs/volumes/61284f28-b5f5a1e8-ea19-ac1f6ba1bf98/vgpu-01/vgpu-01.vmx: Cannot remove symlink /var/run/vmware/0/517531320_2101621/configFile: No such file or directory
2021-08-27T14:16:47.062Z| vmx| | I005: WORKER: asyncOps=1 maxActiveOps=1 maxPending=1 maxCompleted=0
2021-08-27T14:16:47.122Z| vmx| | I005: Vix: [mainDispatch.c:4205]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
2021-08-27T14:16:47.122Z| vmx| | I005:
2021-08-27T14:16:47.122Z| vmx| | I005+ Power on failure messages: Could not initialize plugin 'libnvidia-vgx.so' for vGPU 'nvidia_rtxa6000-4q'.
2021-08-27T14:16:47.122Z| vmx| | I005+ Module 'DevicePowerOn' power on failed.
2021-08-27T14:16:47.122Z| vmx| | I005+ Failed to start the virtual machine.
2021-08-27T14:16:47.122Z| vmx| | I005+
2021-08-27T14:16:47.122Z| vmx| | I005: Vix: [mainDispatch.c:4205]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2021-08-27T14:16:47.122Z| vmx| | I005: Transitioned vmx/execState/val to poweredOff
2021-08-27T14:16:47.122Z| vmx| | I005: Vix: [mainDispatch.c:4205]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0
2021-08-27T14:16:47.122Z| vmx| | I005: Vix: [mainDispatch.c:4243]: Error VIX_E_FAIL in VMAutomation_ReportPowerOpFinished(): Unknown error
2021-08-27T14:16:47.122Z| vmx| | I005: Vix: [mainDispatch.c:4205]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2021-08-27T14:16:47.122Z| vmx| | I005: Transitioned vmx/execState/val to poweredOff
2021-08-27T14:16:47.127Z| vmx| | I005: Vix: [mainDispatch.c:815]: VMAutomation_LateShutdown()
2021-08-27T14:16:47.127Z| vmx| | I005: Vix: [mainDispatch.c:770]: VMAutomationCloseListenerSocket. Closing listener socket.
2021-08-27T14:16:47.128Z| vmx| | I005: Flushing VMX VMDB connections
2021-08-27T14:16:47.128Z| vmx| | I005: VigorTransport_ServerCloseClient: Closing transport 987B73F5F0 (err = 0)
2021-08-27T14:16:47.128Z| vmx| | I005: VigorTransport_ServerDestroy: server destroyed.
2021-08-27T14:16:47.128Z| vmx| | I005: VMX exit (0).
2021-08-27T14:16:47.129Z| vmx| | I005: OBJLIB-LIB: ObjLib cleanup done.
2021-08-27T14:16:47.129Z| vmx| | I005: AIOMGR-S : stat o=3 r=16 w=0 i=0 br=98416 bw=0
2021-08-27T14:16:47.129Z| vmx| | W003: VMX has left the building: 0.
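To make the log above easier to scan, here is a small Python sketch that keeps only the error-level (E...) entries and drops the backtrace frames. The regular expression is my own guess at the line layout based on the lines pasted here, not an official vmware.log grammar, and `error_lines` is just a name I picked.

```python
import re

# Assumed layout, based on the lines in this post:
#   <timestamp>| <thread>| | <level>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>[^|\s]+)\|\s*(?P<thread>[^|\s]+)\|\s*\|\s*"
    r"(?P<level>[A-Z]\d{3})[:+]\s?(?P<msg>.*)$"
)
# Backtrace frames end with an address in brackets, e.g. "[0x98c22f86f8]".
FRAME_RE = re.compile(r"\[0x[0-9a-fA-F]+\]$")


def error_lines(lines):
    """Yield (timestamp, message) for E-level entries, skipping backtrace frames."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or not m.group("level").startswith("E"):
            continue
        msg = m.group("msg")
        if FRAME_RE.search(msg):
            continue
        yield m.group("ts"), msg


sample = [
    "2021-08-27T14:16:47.022Z| vmx| | I005: VMIOP: Initializing plugin vmiop-display",
    "2021-08-27T14:16:47.023Z| vmx| | E002: vmiop_log: NVOS status 0x17",
    "2021-08-27T14:16:47.024Z| vmx| | E002: vmiop_log: /bin/vmx(+0x3adf98) [0x987a11df98]",
    "2021-08-27T14:16:47.024Z| vmx| | E002: VMIOP: Plugin vmiop-display initialization failed: 1",
]

for ts, msg in error_lines(sample):
    print(ts, msg)
```

On the full log this leaves the NVOS status 0x17 line, the assertion, the "Failed to alloc host vgpu device handle" messages, and the final plugin initialization failure.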
If you need more information, please let me know.
Best regards,
Kaka