I have a Python program that performs deep network inference on images using TensorFlow. When a single instance of the program runs, the GPU is not fully utilized. However, when I run several instances simultaneously, the inference time per image is slower than when running a single instance. What could be the reason for that, and what can be done to improve the inference time when running multiple programs? I am currently on Windows (I can move to Linux if needed) with a GTX 1070, and I have set TensorFlow's gpu_options.per_process_gpu_memory_fraction parameter.
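For context, here is a minimal sketch of how such a configuration typically looks, assuming the TensorFlow 1.x Session API; the 0.3 fraction is an illustrative value, not my exact setting:

```python
import tensorflow as tf

# Cap this process's share of GPU memory so several processes
# can coexist on the same GTX 1070.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... load the frozen graph and run inference here ...
    pass
```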