I have a multi-tenant environment composed of a cluster of servers equipped with NVIDIA Tesla GPUs (K80 and P100). They are all POWER8 systems running Ubuntu 16.04, NVIDIA driver version 384.81, CUDA v8.0, and cuDNN v6.0. At the moment, TensorFlow is the predominant workload (v1.2.1, built for compute capability 3.7).
The issue I’m running into is that, by default, TensorFlow sessions attach to all available GPUs and allocate all of their memory. As far as I understand (which isn’t far), it is entirely up to the user to limit memory consumption in the application code (via `gpu_options.allow_growth` or `gpu_options.per_process_gpu_memory_fraction`).
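For reference, this is the application-side mechanism I mean, sketched against the TF 1.x API (the 0.3 fraction is just an example value, not a recommendation):

```python
import tensorflow as tf

# Per-process limits in TF 1.x: each application has to opt in itself.
config = tf.ConfigProto()

# Option 1: grow allocations on demand instead of grabbing
# all GPU memory up front.
config.gpu_options.allow_growth = True

# Option 2: cap this process at a fraction of each visible
# GPU's memory (here ~30%, an arbitrary example).
config.gpu_options.per_process_gpu_memory_fraction = 0.3

sess = tf.Session(config=config)
```

The problem is that nothing enforces this cluster-wide; a tenant who omits it still gets the default allocate-everything behavior.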
Workload type aside, is there a way to limit the maximum amount of memory each GPU process can allocate, enforced outside the application code?
Thanks in advance!