How to prevent CUDA apps from hogging all GPU memory

I need to run multiple CUDA apps on the same system, sharing the same GPU. It’s a large GPU, so there should be plenty of memory for everyone.

But the first TensorFlow app I launch grabs almost all of the GPU RAM. The second app barely gets the memory it needs, and the third fails to launch for lack of GPU RAM.

With TensorFlow I can do something like this:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at start-up
physical_devices = tf.config.list_physical_devices('GPU')
for gpu_instance in physical_devices:
    tf.config.experimental.set_memory_growth(gpu_instance, True)

And that works fine. Each app only takes the memory it needs, and I can run lots of them.
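
(Side note: if a hard cap is preferable to grow-on-demand, recent TF 2.x versions also seem to support a fixed per-process limit; the 4096 MB below is just an example value.)

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    # Cap this process at a fixed amount per GPU (hypothetical 4 GB here)
    tf.config.set_logical_device_configuration(
        gpu,
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])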

But these approaches rely on everyone “playing nice”. They’re also framework-dependent: PyTorch, for example, does it differently (sketch below). And if I can’t fix the code, I have no guarantee that some app won’t grab the whole GPU for itself. In some cases the code is in containers that I can’t or shouldn’t mess with.
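
For comparison, my understanding is that the closest PyTorch equivalent is a per-process fraction cap rather than grow-on-demand (the 0.25 below is just an example value):

import torch

# Limit this process to (for example) 25% of each visible GPU's memory;
# the cap is enforced by PyTorch's caching allocator, not by the driver.
if torch.cuda.is_available():
    for device_id in range(torch.cuda.device_count()):
        torch.cuda.set_per_process_memory_fraction(0.25, device=device_id)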

Is there a lower-level way to fix this? Can I enforce it at the driver level, something like “each app should only get the memory it actually needs”?

FYI, the host runs CentOS with the latest NVIDIA drivers installed.