Slower inference times when running multiple programs

I have a Python program that performs deep-network inference on images using TensorFlow. When a single program runs, the GPU is not fully utilized. However, when several programs run simultaneously, the per-image inference time is slower than with a single program. What could be the reason for this, and what can be done to improve inference time when running multiple programs? I am currently on Windows (I can move to Linux if needed) with a GTX 1070, and I have set TensorFlow's gpu_options.per_process_gpu_memory_fraction parameter.
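For reference, this is roughly how the memory fraction is being set in each process (TensorFlow 1.x API; the 0.3 value here is illustrative, not the exact fraction used):

```python
import tensorflow as tf

# Limit this process to a fraction of total GPU memory so that
# several processes can share the card (illustrative value).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)
```

Note that this setting only partitions GPU memory between processes; it does not partition compute, so the processes still contend for the same SMs and the same CUDA driver scheduler.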