Following “GitHub - NVIDIA/k8s-device-plugin: NVIDIA device plugin for Kubernetes”,
we have deployed the NVIDIA device plugin for Kubernetes and are trying out time-slicing,
but we are running into a problem. The node reports a GPU capacity of only “1”,
whereas we expected “4” because of replicas: 4 in the YAML below.
What could be the reason why “Capacity” is not increasing?
Thanks
# kubectl describe node test-server
Capacity:
  nvidia.com/gpu: 1
Allocatable:
  nvidia.com/gpu: 1
times.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-plugin-config
data:
  time-sliced: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
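
To make the expectation concrete, here is a minimal Python sketch of the arithmetic behind it and of a check against the kubectl describe output. It assumes the node has a single physical A100 and that time-slicing advertises physical GPUs multiplied by replicas. The function names are hypothetical helpers for illustration, not part of the device plugin or kubectl.

```python
def expected_gpu_capacity(physical_gpus: int, replicas: int) -> int:
    """Capacity we would expect if every physical GPU is time-sliced.

    Assumption: advertised nvidia.com/gpu = physical GPUs * replicas.
    """
    if physical_gpus < 0 or replicas < 1:
        raise ValueError("physical_gpus must be >= 0 and replicas >= 1")
    return physical_gpus * replicas


def parse_gpu_counts(describe_output: str, resource: str = "nvidia.com/gpu") -> dict:
    """Pull the resource count out of the Capacity and Allocatable sections
    of `kubectl describe node` text (hypothetical helper)."""
    counts = {}
    section = None
    for raw in describe_output.splitlines():
        line = raw.strip()
        if line in ("Capacity:", "Allocatable:"):
            section = line[:-1]  # drop the trailing colon
        elif section and line.startswith(resource + ":"):
            counts[section] = int(line.split(":", 1)[1])
    return counts


if __name__ == "__main__":
    # Output as pasted in this post.
    observed = parse_gpu_counts(
        "Capacity:\n  nvidia.com/gpu: 1\nAllocatable:\n  nvidia.com/gpu: 1\n"
    )
    wanted = expected_gpu_capacity(physical_gpus=1, replicas=4)  # assumed: 1 GPU
    for section, count in observed.items():
        status = "OK" if count == wanted else "MISMATCH"
        print(f"{section}: observed {count}, expected {wanted} -> {status}")
```

With the values from this post, both sections report a mismatch (1 observed against 4 expected), which suggests the time-slicing configuration is not being applied to the plugin at all rather than being applied with a wrong value.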
Hardware Information:
Server: PowerEdge R750 (SKU=090E, ModelName=PowerEdge R750)
CPU: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
GPGPU Information:
GPGPU: A100 80GB
CUDA Version: 12.2
Driver Version: 535.54.03
nvidia-container-runtime: runc version 1.0.2, spec: 1.0.2-dev, go: go1.16.7, libseccomp: 2.5.1
Linux Information:
OS: CentOS Linux release 8.5.2111
k8s environment:
kubectl version:
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-02-22T13:27:46Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
crio version: 1.23.5
NVIDIA device plugin for Kubernetes version used: v0.16.1