Vulkan GPU Access in AWS GitHub Actions Runner with NVIDIA L4 - Setup Guidance Needed

Hello NVIDIA Community,

I got Vulkan working with an NVIDIA L4 GPU in a test environment, but I need guidance on the proper setup for production use on self-hosted GitHub Actions runners.

Working Test Environment

I managed to get Vulkan + GPU working in a container on an EC2 GPU instance using this approach:

Host setup (EC2):

Container setup:

FROM ubuntu:22.04

# Install nvidia-utils-580-server
# Install nvidia-container-toolkit 1.18.0-1
# Install Vulkan SDK 1.4.328.1
# Install Vulkan dependencies (libxcb, libwayland, etc.)

Result: vulkaninfo successfully detects the NVIDIA L4 GPU with Vulkan 1.4.312 support ✅

Current Blocker

I need to reproduce this setup on AWS self-hosted GitHub Actions runners with these specs:

  • Pattern: github-actions-runner-xlarge-gpu-*

  • Labels: self-hosted, x64, stable, infra-eks-general, us-west-2, dind, xlarge, gpu

  • GPU: NVIDIA L4 (driver 535.230.02)

  • Platform: EKS with Bottlerocket OS

  • Use case: CI workflow tests that stress the GPU via Vulkan backend

Current issue: Vulkan cannot see the GPU device in the runner, even though nvidia-smi works fine.
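
To help narrow this down, here is a minimal Python sketch of a check that could be run inside the runner container. It lists the Vulkan ICD manifests found in the directories the loader commonly searches on Linux and reports the library each one points to. The directory list and the function name find_icd_manifests are my own assumptions, not taken from the runner configuration. If no NVIDIA manifest shows up, that would suggest the driver's Vulkan files are not being exposed to the container.

```python
import json
from pathlib import Path

# Assumed search directories; the Vulkan loader may look elsewhere
# depending on how it was built and on XDG_* environment variables.
ICD_DIRS = [
    "/etc/vulkan/icd.d",
    "/usr/share/vulkan/icd.d",
    "/usr/local/share/vulkan/icd.d",
]


def find_icd_manifests(dirs=ICD_DIRS):
    """Return {manifest_path: library_path} for each parseable ICD JSON file."""
    found = {}
    for d in dirs:
        path = Path(d)
        if not path.is_dir():
            continue  # directory absent in this container
        for manifest in sorted(path.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
            except (OSError, ValueError):
                continue  # unreadable or malformed manifest, skip it
            icd = data.get("ICD", {}) if isinstance(data, dict) else {}
            lib = icd.get("library_path") if isinstance(icd, dict) else None
            found[str(manifest)] = lib
    return found


if __name__ == "__main__":
    manifests = find_icd_manifests()
    if not manifests:
        print("No Vulkan ICD manifests found in the assumed directories")
    for manifest, lib in manifests.items():
        print(f"{manifest} -> {lib}")
```

On a typical NVIDIA Linux driver install I would expect a manifest named nvidia_icd.json whose library_path refers to libGLX_nvidia.so.0, but I have not confirmed what the Bottlerocket-based runner provides.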

Questions

  1. Driver version mismatch? My test environment uses driver 580.95.05, but the runner has 535.230.02. Could this cause Vulkan detection issues?

  2. Missing libraries in container? Are there specific NVIDIA Vulkan libraries that need to be mounted from the host that I might be missing?

  3. Bottlerocket OS specifics? Does Bottlerocket require special configuration for NVIDIA Container Toolkit or Vulkan ICD mounting?

  4. Best practices for EKS runners? What’s the recommended way to enable Vulkan GPU access in containerized GitHub Actions runners on EKS?
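
Related to question 2: as far as I understand, the NVIDIA Container Toolkit chooses which driver libraries to inject based on the NVIDIA_DRIVER_CAPABILITIES environment variable, and the Vulkan libraries are only injected when it includes graphics (or all). The sketch below checks that value. The helper names are hypothetical, and the fallback to utility,compute for an unset value is my assumption about the toolkit default; I do not know whether the Bottlerocket setup honors this variable in the same way.

```python
import os


def parse_capabilities(value):
    """Split a NVIDIA_DRIVER_CAPABILITIES value into a set of capability names."""
    if not value:
        # Assumption: an unset or empty value falls back to utility,compute.
        return {"utility", "compute"}
    return {item.strip() for item in value.split(",") if item.strip()}


def graphics_enabled(value):
    """True if the value would request the graphics (Vulkan/OpenGL) libraries."""
    caps = parse_capabilities(value)
    return "all" in caps or "graphics" in caps


if __name__ == "__main__":
    value = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
    print(f"NVIDIA_DRIVER_CAPABILITIES={value!r}")
    print(f"graphics capability requested: {graphics_enabled(value)}")
```

If this reports False in the runner, it would be consistent with nvidia-smi working (utility) while Vulkan finds no device, though I have not verified that this is the cause.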

What I’ve Tried

  • Installing nvidia-utils-580-server in the container

  • Installing NVIDIA Container Toolkit inside the container

  • Installing Vulkan SDK 1.4.328.1

  • Setting proper environment variables (VULKAN_SDK, VK_ADD_LAYER_PATH, etc.)
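
For completeness, here is a small sketch that reports which Vulkan loader variables are set, separating the ones that affect driver (ICD) discovery from the ones that only affect layers. My understanding is that VK_ADD_LAYER_PATH influences layer lookup only and does not help the loader find the NVIDIA driver. The function name loader_env_report is hypothetical.

```python
import os

# Variables the Vulkan loader reads for driver (ICD) discovery.
DRIVER_VARS = ("VK_DRIVER_FILES", "VK_ICD_FILENAMES", "VK_ADD_DRIVER_FILES")
# Variables that only affect where layers are searched for.
LAYER_VARS = ("VK_LAYER_PATH", "VK_ADD_LAYER_PATH")


def loader_env_report(env):
    """Group the loader-related variables present in env by what they affect."""
    return {
        "driver": {name: env[name] for name in DRIVER_VARS if name in env},
        "layer": {name: env[name] for name in LAYER_VARS if name in env},
    }


if __name__ == "__main__":
    report = loader_env_report(os.environ)
    for kind, values in report.items():
        if not values:
            print(f"{kind}: (none set)")
        for name, value in values.items():
            print(f"{kind}: {name}={value}")
```

Running vulkaninfo with VK_LOADER_DEBUG=all set should also show which manifests the loader actually tries, which I can attach if useful.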

Request

Could someone provide guidance on:

  • Proper host/container setup for Vulkan on EKS Bottlerocket runners

  • Whether the driver version mismatch could be the root cause

  • Any missing libraries or configuration needed for Vulkan ICD detection

I’m happy to provide additional logs, vulkaninfo output, or Docker/runner configuration details if needed.

Thanks in advance for your help!

vulkan aws #eks #githubaction #container
