Is this the correct place to ask a technical question?
We previously used the “Nvidia-device-plugin”, which adds GPUs as a resource to Kubernetes. Our Kubernetes Jobs are scheduled based on the percentage of GPU required (just as is possible for CPUs and memory).
With the Kubernetes upgrade beyond 1.20 and Docker being removed, I found that the preferred installation now uses the “GPU Operator” according to (1), which seems to work very well. However, I have not yet been able to get GPUs to show up as schedulable resources (2). As a result, our Jobs currently cannot run.
Is there a part I am missing? Can someone please point me to the instructions I need?
Thank you very much.
(1) Getting Started — NVIDIA Cloud Native Technologies documentation
(2) Excerpt from kubectl describe node:
Resource            Requests    Limits
cpu                 100m (0%)   100m (0%)
memory              50Mi (0%)   50Mi (0%)
ephemeral-storage   0 (0%)      0 (0%)
hugepages-1Gi       0 (0%)      0 (0%)
hugepages-2Mi       0 (0%)      0 (0%)
nvidia.com/gpu      0           0           ← missing from GPU-operator setup
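
For anyone who wants to check the same thing from a script, here is a minimal sketch in Python. It assumes the node list has been exported as JSON (for example with kubectl get nodes -o json) and that GPUs are advertised under the resource name nvidia.com/gpu, as in the excerpt above. The function name gpu_allocatable is my own and purely illustrative; it is not part of the GPU Operator or of any Kubernetes client library.

```python
import json

# Resource name taken from the excerpt above; an assumption for other setups.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(nodes_json: str) -> dict:
    """Map each node name to its allocatable GPU count.

    `nodes_json` is expected to be the text of a Kubernetes node list in
    JSON form. A node that does not advertise the GPU resource is
    reported with a count of 0.
    """
    node_list = json.loads(nodes_json)
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resources such as GPUs are reported as integer strings.
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts
```

A node that maps to 0 here has no GPUs registered with the scheduler, which would match the symptom described above.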