I was looking for a GPU sharing strategy in Kubernetes and finally found a great approach: CUDA Time-Slicing [Time-Slicing GPUs in Kubernetes — NVIDIA Cloud Native Technologies documentation].
I read the document and have a few questions.
Suppose I have a server with 8 GPUs.
If I set replicas to 10 in the time-slicing ConfigMap, then the total available nvidia.com/gpu becomes 80 (8 GPUs × 10 replicas each), right?
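To be concrete, the ConfigMap I have in mind looks roughly like this (a sketch based on the sharing.timeSlicing format from that document; the ConfigMap name and the data key are just placeholders from my setup):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            # advertise each physical GPU as 10 nvidia.com/gpu resources
            replicas: 10
```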
If I create a pod whose resource limit is nvidia.com/gpu: 4, does my pod get 40% of the execution time of 1 GPU, or 10% of the execution time of each of 4 GPUs?
If I create a pod whose resource limit is nvidia.com/gpu: 40, does my pod get 4 full GPUs?
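Here is the kind of pod spec I mean for these two questions (a sketch; the pod name and image are just placeholders, and only the nvidia.com/gpu limit changes between the two cases):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-slice-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda-test
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      resources:
        limits:
          # 4 time-sliced replicas (or 40 for the second case) --
          # I am unsure how these map onto physical GPUs
          nvidia.com/gpu: 4
```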
The reason I ask is that I want to know whether a k8s pod can use multiple physical GPUs when CUDA time-slicing is enabled.