We have one A100-SXM4-80GB GPU server integrated in a Kubernetes cluster.
After reading the documentation, I am wondering how to set a “max amount of GPU” for a k8s namespace that workloads of different sizes could then share.
For instance, let's say we want to allocate one full GPU card (out of the 8 available) to a project/namespace, but we want to give users the possibility to run workloads using either 1g.10gb, 2g.20gb or even 4g.40gb (depending on their use cases), with the total of all their workloads having to fit in that single GPU (for instance, they could simultaneously start 7 workloads using 1g.10gb, or 3 workloads using 2g.20gb + 1 workload using 1g.10gb, and so on…)
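To make the arithmetic I have in mind concrete, here is a minimal sketch in Python. It assumes that the number before the "g" in each MIG profile name is the number of compute slices it uses and that one full A100 exposes 7 of them. The names `SLICES_PER_PROFILE`, `total_slices` and `fits_in_one_gpu` are made up for illustration, and the check only sums slices: it ignores any hardware placement rules, so it is a necessary condition rather than a guarantee.

```python
# Assumption: "Ng" in the profile name = N compute slices; one full GPU = 7 slices.
SLICES_PER_PROFILE = {"1g.10gb": 1, "2g.20gb": 2, "4g.40gb": 4}
SLICES_PER_GPU = 7


def total_slices(workloads):
    """Sum the compute slices used by a {profile: count} mapping."""
    return sum(SLICES_PER_PROFILE[profile] * count
               for profile, count in workloads.items())


def fits_in_one_gpu(workloads):
    """True if the summed slices do not exceed one full GPU (placement rules ignored)."""
    return total_slices(workloads) <= SLICES_PER_GPU


print(fits_in_one_gpu({"1g.10gb": 7}))                # 7 slices
print(fits_in_one_gpu({"2g.20gb": 3, "1g.10gb": 1}))  # 6 + 1 = 7 slices
print(fits_in_one_gpu({"4g.40gb": 2}))                # 8 slices, over the limit
```

This is the kind of "sum across profiles" check I would like the namespace quota to perform.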
I couldn’t figure out how to set ResourceQuota on the project namespace to reach such a goal:
From what I understood, I can set the following in the “hard” section of the ResourceQuota to assign a full GPU to the namespace:
Or the following to restrict the assignment to 4 * 2g.20gb:
But will k8s be able to “understand” that 7 * 1g.10gb workloads fit in a full GPU card?
Will it be able to sum “1g.10gb” and “2g.20gb” GPU slices and infer that the sum is under the limit we set?
I would not want to have to set:
requests.nvidia.com/mig-1g.10gb: '7'
requests.nvidia.com/mig-2g.20gb: '4'
requests.nvidia.com/mig-4g.40gb: '2'
Because I guess that in that case the three limits would be cumulative, i.e. each one would be counted separately on top of the others, right?
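To illustrate the concern, here is a rough Python sketch of what that quota would allow if each entry is enforced independently of the others (that independence is my assumption, not something I have confirmed). It again assumes the number before the "g" is the number of compute slices and that a full GPU has 7; `PROFILE_SLICES` and `max_slices_allowed` are hypothetical names used only for this example.

```python
# Assumption: "Ng" in the profile name = N compute slices; one full GPU = 7 slices.
PROFILE_SLICES = {"1g.10gb": 1, "2g.20gb": 2, "4g.40gb": 4}

# The per-profile limits from the quota above.
quota = {"1g.10gb": 7, "2g.20gb": 4, "4g.40gb": 2}


def max_slices_allowed(limits):
    """Slices usable if every per-profile limit is reached at the same time."""
    return sum(PROFILE_SLICES[profile] * count
               for profile, count in limits.items())


print(max_slices_allowed(quota))  # 7*1 + 4*2 + 2*4 = 23
```

Under those assumptions the namespace could consume up to 23 slices, i.e. more than three full GPUs, instead of the single GPU I want to cap it at.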
Thanks for any help