Hello NVIDIA Community,
I successfully got Vulkan working with an NVIDIA L4 GPU in my test environment, but I need guidance on the proper setup for production use on GitHub Actions runners.
Working Test Environment
I managed to get Vulkan + GPU working in a container on an EC2 GPU instance using this approach:
Host setup (EC2):
- Installed NVIDIA driver (580.95.05)
- Installed NVIDIA Container Toolkit
- Followed official NVIDIA docs:
Container setup:
FROM ubuntu:22.04
# Install nvidia-utils-580-server
# Install nvidia-container-toolkit 1.18.0-1
# Install Vulkan SDK 1.4.328.1
# Install Vulkan dependencies (libxcb, libwayland, etc.)
Result: vulkaninfo successfully detects the NVIDIA L4 GPU with Vulkan 1.4.312 support ✅
Current Blocker
I need to reproduce this setup on AWS self-hosted GitHub Actions runners with these specs:
- Pattern: github-actions-runner-xlarge-gpu-*
- Labels: self-hosted, x64, stable, infra-eks-general, us-west-2, dind, xlarge, gpu
- GPU: NVIDIA L4 (driver 535.230.02)
- Platform: EKS with Bottlerocket OS
- Use case: CI workflow tests that stress the GPU via Vulkan backend
Current issue: Vulkan cannot see the GPU device in the runner, even though nvidia-smi works fine.
Questions
- Driver version mismatch? My test env uses driver 580.95.05, but the runner has 535.230.02. Could this cause Vulkan detection issues?
- Missing libraries in container? Are there specific NVIDIA Vulkan libraries that need to be mounted from the host that I might be missing?
- Bottlerocket OS specifics? Does Bottlerocket require special configuration for NVIDIA Container Toolkit or Vulkan ICD mounting?
- Best practices for EKS runners? What’s the recommended way to enable Vulkan GPU access in containerized GitHub Actions runners on EKS?
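On the driver version question: my understanding is that the NVIDIA user-space libraries inside a container need to match the host's kernel driver version, so the 580-series packages I installed in the image may not line up with the runner's 535 driver. Below is a minimal sketch of the comparison I have in mind. The function names are hypothetical, and the version strings are the two mentioned in this post.

```python
def parse_driver_version(version):
    """Split a driver version string such as '535.230.02' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def versions_match(host_version, container_version):
    """Return True only when both version strings denote the same driver release."""
    return parse_driver_version(host_version) == parse_driver_version(container_version)


if __name__ == "__main__":
    host = "535.230.02"      # driver reported on the GitHub Actions runner
    container = "580.95.05"  # version of the packages installed in my image
    print(versions_match(host, container))  # prints False
```

In practice the host value would come from nvidia-smi on the runner rather than being hard-coded.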
What I’ve Tried
- Installing nvidia-utils-580-server in the container
- Installing NVIDIA Container Toolkit inside the container
- Installing Vulkan SDK 1.4.328.1
- Setting proper environment variables (VULKAN_SDK, VK_ADD_LAYER_PATH, etc.)
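To make the checks above easier to reproduce, here is a minimal sketch of a diagnostic I can run inside the runner container. It reports the Vulkan and NVIDIA Container Toolkit environment variables and lists any ICD manifests it finds. The two directories are assumptions based on common Linux locations for ICD manifests and have not been confirmed on the Bottlerocket runner image. The function names are hypothetical.

```python
import json
import os
from pathlib import Path

# Assumed common locations for Vulkan ICD manifests; a given image may differ.
ICD_DIRS = ["/etc/vulkan/icd.d", "/usr/share/vulkan/icd.d"]

# Environment variables read by the Vulkan loader or the NVIDIA Container Toolkit.
ENV_VARS = [
    "NVIDIA_DRIVER_CAPABILITIES",
    "NVIDIA_VISIBLE_DEVICES",
    "VK_ICD_FILENAMES",
    "VK_DRIVER_FILES",
    "VULKAN_SDK",
    "VK_ADD_LAYER_PATH",
]


def find_icd_manifests(icd_dirs=ICD_DIRS):
    """Return (manifest_path, library_path) pairs for each ICD manifest found."""
    found = []
    for name in icd_dirs:
        directory = Path(name)
        if not directory.is_dir():
            continue
        for manifest in sorted(directory.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
                library = data.get("ICD", {}).get("library_path")
            except (OSError, ValueError, AttributeError):
                library = None  # unreadable or malformed manifest
            found.append((str(manifest), library))
    return found


def collect_report(env=os.environ, icd_dirs=ICD_DIRS):
    """Gather environment variables and ICD manifests into one dictionary."""
    return {
        "env": {name: env.get(name) for name in ENV_VARS},
        "icd_manifests": find_icd_manifests(icd_dirs),
    }


if __name__ == "__main__":
    print(json.dumps(collect_report(), indent=2))
```

An empty icd_manifests list, or a NVIDIA_DRIVER_CAPABILITIES value that does not include graphics, would be the first things I would look at.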
Request
Could someone provide guidance on:
- Proper host/container setup for Vulkan on EKS Bottlerocket runners
- Whether the driver version mismatch could be the root cause
- Any missing libraries or configuration needed for Vulkan ICD detection
I’m happy to provide additional logs, vulkaninfo output, or Docker/runner configuration details if needed.
Thanks in advance for your help!