Need help with Kubernetes and GPU scheduling

user143747 · February 1, 2022, 8:14am

Hello,
is this the correct place to ask a technical question?

We previously used the “Nvidia-device-plugin”, which adds GPUs as a resource to Kubernetes. Our Kubernetes Jobs are scheduled based on the percentage of GPU required (just as is possible for CPUs and memory).

With the Kubernetes upgrade beyond 1.20 and Docker being removed, I found that the preferred installation now uses the “GPU Operator” according to (1), which seems to work very well. However, I have not yet been able to get GPUs to show up as schedulable resources (2). As a result, our Jobs currently cannot execute.

Is there a part I am missing? Can someone please point me to the instructions I need?

Thank you very much.

(1) Getting Started — NVIDIA Cloud Native Technologies documentation
(2) Excerpt from kubectl describe node:
Allocated resources:
  Resource           Requests    Limits
  cpu                100m (0%)   100m (0%)
  memory             50Mi (0%)   50Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  nvidia.com/gpu     0           0          ← missing from GPU-operator setup
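
One way to check whether the nodes advertise the GPU resource at all is to look at each node's allocatable resources. The following is only a minimal Python sketch, assuming kubectl is on the PATH and configured for the cluster; the function names gpu_allocatable and fetch_nodes are made up for illustration. It reads the output of kubectl get nodes -o json and reports the nvidia.com/gpu count per node, treating a missing entry as 0.

```python
import json
import subprocess

# Resource name as it appears in the excerpt above.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(node_list: dict) -> dict:
    """Map each node name to its allocatable GPU count (0 if not advertised)."""
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports quantities as strings, e.g. "2".
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts


def fetch_nodes() -> dict:
    """Return the parsed node list from `kubectl get nodes -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Usage on a machine with cluster access (not run here):
#   for name, count in gpu_allocatable(fetch_nodes()).items():
#       print(f"{name}: {count} GPU(s) allocatable")
```

If every node reports 0, the scheduler has no GPUs to hand out, which would match Jobs that request nvidia.com/gpu staying unscheduled.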
