Is there a way to make Kubernetes schedule several specific containers (known in advance) to share the same GPU using the NVIDIA GPU Operator scheduling plugin?
Our workload consists of several Python-based services, each of which uses a GPU for its work. All services need to be preloaded so that they respond without startup latency, and our worker node has several GPUs to make this possible. Each service runs in short bursts and needs only a small amount of GPU memory. To maximise GPU utilisation, we would like several services to share the same GPU. Not all services are equal, though: some require much more resources, so we would like to dedicate a separate GPU to each of them, while others can safely coexist on one GPU. The desired scheduling pattern is fixed and known in advance.
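To make the requirement concrete, here is a minimal Python sketch of the kind of static plan we have in mind. The service names and GPU indices are hypothetical placeholders, and the code only describes and sanity-checks the desired mapping; it is not something the GPU Operator consumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    exclusive: bool  # True if the service must have a GPU to itself


# Hypothetical plan: GPU index -> services pinned to that GPU.
PLAN: dict[int, list[Service]] = {
    0: [Service("heavy-model", exclusive=True)],
    1: [Service("light-a", exclusive=False), Service("light-b", exclusive=False)],
    2: [Service("light-c", exclusive=False)],
}


def validate(plan: dict[int, list[Service]]) -> None:
    """Raise ValueError if a service appears twice or an exclusive service shares its GPU."""
    seen: set[str] = set()
    for gpu, services in plan.items():
        for s in services:
            if s.name in seen:
                raise ValueError(f"{s.name} is assigned more than once")
            seen.add(s.name)
            if s.exclusive and len(services) > 1:
                raise ValueError(f"{s.name} must have GPU {gpu} to itself")


def gpu_for(plan: dict[int, list[Service]], name: str) -> int:
    """Return the GPU index a service is pinned to."""
    for gpu, services in plan.items():
        if any(s.name == name for s in services):
            return gpu
    raise KeyError(name)


if __name__ == "__main__":
    validate(PLAN)
    for gpu, services in PLAN.items():
        print(gpu, [s.name for s in services])
```

In other words, the grouping (which services share which GPU) is the part we need Kubernetes to respect.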
In the documentation we saw an option to over-provision the available GPUs (it looks like a single GPU is then advertised to Kubernetes as X GPUs), but this doesn't let us control which services share a GPU.
Our question: is there a way to achieve such a static allocation of GPUs in Kubernetes using the NVIDIA GPU Operator?