I was looking for a GPU sharing strategy in Kubernetes and finally found a great approach: CUDA Time-Slicing [Time-Slicing GPUs in Kubernetes — NVIDIA Cloud Native Technologies documentation].
I read the document and have a few questions.
Suppose I have a server with 8 GPUs.
If I set replicas to 10 in the time-slicing ConfigMap, then the total available nvidia.com/gpu becomes 80 (8 GPUs × 10 replicas each), right?
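To be concrete, the ConfigMap I have in mind looks roughly like this (a sketch based on the sharing.timeSlicing format from that document; the ConfigMap name and the data key are just placeholders from my setup):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            # advertise each physical GPU as 10 nvidia.com/gpu resources
            replicas: 10
```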
If I create a pod whose resource limit is nvidia.com/gpu: 4, does my pod get 40% of the execution time of 1 GPU, or 10% of the execution time of each of 4 GPUs?
If I create a pod whose resource limit is nvidia.com/gpu: 40, does my pod get 4 full GPUs?
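Here is the kind of pod spec I mean for these two questions (a sketch; the pod name and image are just placeholders, and only the nvidia.com/gpu limit changes between the two cases):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-slice-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda-test
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      resources:
        limits:
          # 4 time-sliced replicas (or 40 for the second case) --
          # I am unsure how these map onto physical GPUs
          nvidia.com/gpu: 4
```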
The reason I ask is that I want to know whether a k8s pod can use multiple physical GPUs when CUDA time-slicing is enabled.