At my company we face a problem with sharing GPUs in a cluster. We have 10-15 servers, each with an NVIDIA Tesla V100 GPU, and on each server data scientists work independently.
They pick a server based on monitoring and luck (close to a random pick) and train their ML models there.
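To make the current workflow concrete, here is a minimal sketch of the "pick the least-loaded server" heuristic we use today. The server names and utilization numbers are hypothetical; in practice the numbers come from monitoring (e.g. nvidia-smi output), not from a hard-coded dict.

```python
# Hypothetical monitoring snapshot: GPU utilization per server, in percent.
# In reality these values are read from our monitoring stack, not hard-coded.
servers = {
    "gpu-node-01": 85,
    "gpu-node-02": 40,
    "gpu-node-03": 10,
}

def pick_server(utilization):
    """Return the server with the lowest reported GPU utilization."""
    return min(utilization, key=utilization.get)

print(pick_server(servers))  # gpu-node-03
```

This is exactly the weakness: utilization is a point-in-time reading, so two people can "win" the same idle server at once, and a job that looks small now can grow later.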
We started R&D to change this situation. We can check GPU workload, but to schedule a queue of requests you must either anticipate in some way the resources each job will use, or guarantee each job some resources on the machine.
I am looking for a solution based on the second approach. Does the NVIDIA architecture, or any other technology, support partitioning and provisioning GPU resources (for example, as a percentage of GPU power) at the Docker, VM, or single-process level?