Unable to Detect NVIDIA GPU with VirtualGL in EKS Cluster

mtst159 · May 1, 2024, 8:52am

I am running an EKS (Amazon Elastic Kubernetes Service) cluster with g4dn.xlarge instances, each of which includes a Tesla T4 GPU. In my container environment, I have installed the NVIDIA GPU operator and confirmed with nvidia-smi that the GPU is recognized and functional. However, when I use VirtualGL (vglrun) to run graphical applications such as Firefox with GPU acceleration, I get the following error:

vglrun -d "/dev/nvidia0" firefox
[GFX1-]: glxtest: ManageChildProcess failed
[GFX1-]: No GPUs detected via PCI

Details:

EKS Instance Type: g4dn.xlarge
GPU: Tesla T4
NVIDIA gpu-operator helm chart: v23.9.2
Container Environment: Kubernetes with NVIDIA GPU operator
Command Used: vglrun -d "/dev/nvidia0" firefox

Issue:

VirtualGL (vglrun) fails to detect GPUs via PCI when launching applications like Firefox, preventing GPU acceleration.

Questions:

How can I troubleshoot and resolve the issue of VirtualGL not detecting GPUs within my container environment?
Are there additional configurations or dependencies required to enable GPU acceleration with VirtualGL on EKS using the NVIDIA GPU operator?

Additional Information:

Output of nvidia-smi within the container confirms GPU presence and functionality.
@ubuntu-fk5a8-a3b8006etmtlb:~$ nvidia-smi
Wed May 1 08:45:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:1E.0 Off |                    0 |
| N/A   25C    P8             14W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
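
A minimal sketch of a further check that could be run in the same container. The NVIDIA Container Toolkit only exposes the OpenGL/graphics driver libraries when the NVIDIA_DRIVER_CAPABILITIES environment variable includes graphics (or all). Whether that is the cause of the error above is an assumption, not something the output confirms. The helper name has_graphics_cap is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints "yes" if a comma-separated capability list
# contains "graphics" or "all", otherwise "no".
has_graphics_cap() {
    case ",$1," in
        *,graphics,*|*,all,*) echo yes ;;
        *) echo no ;;
    esac
}

# Report the capability list the container was started with.
echo "NVIDIA_DRIVER_CAPABILITIES=${NVIDIA_DRIVER_CAPABILITIES:-<unset>}"
echo "graphics capability requested: $(has_graphics_cap "${NVIDIA_DRIVER_CAPABILITIES:-}")"

# List the GPU-related device nodes visible inside the container, if any.
for node in /dev/nvidia* /dev/dri/*; do
    if [ -e "$node" ]; then
        echo "found device node: $node"
    fi
done
```

If the second line prints "no", the container was probably started with compute-only capabilities, so that would be worth ruling out before looking at the VirtualGL configuration itself.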

Any insights or recommendations for setting up VirtualGL with Kubernetes and NVIDIA GPU operator would be greatly appreciated.

Thank you for your assistance!
