I have a Python program that performs deep network inference on images using TensorFlow. When a single instance of the program runs, the GPU is not fully utilized. However, when I run several instances simultaneously, the inference time per image is slower than when running a single instance. What could be the reason for that, and what can be done to improve the inference time when running multiple programs? I am currently on Windows (I can move to Linux if needed) with a GTX 1070, and I have set TensorFlow's gpu_options.per_process_gpu_memory_fraction parameter.
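For context, here is a minimal sketch of how such a configuration typically looks, assuming the TensorFlow 1.x Session API; the 0.3 fraction is an illustrative value, not my exact setting:

```python
import tensorflow as tf

# Cap this process's share of GPU memory so several processes
# can coexist on the same GTX 1070.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... load the frozen graph and run inference here ...
    pass
```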