Speedup by increasing # of streams vs. batch size

Description

I experimented with the speedup from increasing the number of CUDA streams versus increasing the batch size, and I expected a significant speedup in both cases. However, increasing the number of streams gives no significant speedup. Multi-stream execution is faster than sequential processing, and it even hides part of the input-image transfer time by pipelining the copies with compute, but the gain does not grow as more streams are added. Do you think this result is normal?
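
(For reference, by "pipelining" I mean the standard copy/compute overlap across streams with pinned host memory. Below is a minimal, self-contained CUDA sketch of that pattern; the kernel is just a placeholder standing in for the real inference work, and the sizes are illustrative, not my actual code.)

```cpp
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real inference work.
__global__ void process(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

int main()
{
    constexpr int kStreams = 4;
    constexpr int kN = 960 * 604 * 3;  // one 960x604 RGB image per stream

    cudaStream_t streams[kStreams];
    float* hIn[kStreams];
    float* dIn[kStreams];
    float* dOut[kStreams];

    for (int s = 0; s < kStreams; ++s)
    {
        cudaStreamCreateWithFlags(&streams[s], cudaStreamNonBlocking);
        cudaMallocHost((void**)&hIn[s], kN * sizeof(float));  // pinned host memory: needed for truly async copies
        cudaMalloc((void**)&dIn[s], kN * sizeof(float));
        cudaMalloc((void**)&dOut[s], kN * sizeof(float));
    }

    // Issued back to back on different streams, stream s+1's host-to-device
    // copy can overlap stream s's kernel, hiding most of the transfer time.
    for (int s = 0; s < kStreams; ++s)
    {
        cudaMemcpyAsync(dIn[s], hIn[s], kN * sizeof(float), cudaMemcpyHostToDevice, streams[s]);
        process<<<(kN + 255) / 256, 256, 0, streams[s]>>>(dIn[s], dOut[s], kN);
    }
    cudaDeviceSynchronize();
    return 0;
}
```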

Environment

TensorRT Version: 8.2.1.8
GPU Type: T4
Nvidia Driver Version: 470.63.01
CUDA Version: 10.2
CUDNN Version: 8.2.4.15
Operating System + Version: Ubuntu 18.04.6 LTS

Image Size: 960x604
Network Model: SSD

Hi,

The links below may be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#thread-safety

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html

For multi-threading/streaming, we suggest using DeepStream or Triton.

For more details, we recommend raising the query on the DeepStream forum, or in the issues section of the Triton Inference Server GitHub repository.

Thanks!

@NVES I tried it again with a separate context (nvinfer1::IExecutionContext) for each stream, but the execution time shows a similar pattern: still no significant speedup from increasing the number of streams. Do I need to create anything else separately?
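
For reference, this is roughly the shape of my per-stream setup (a simplified sketch, not the actual code; the binding order, buffer sizes, and error handling are placeholders):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <vector>

struct StreamWorker
{
    nvinfer1::IExecutionContext* context;  // one context per stream (contexts are not thread-safe)
    cudaStream_t stream;
    void* bindings[2];                     // [0] = input, [1] = output (order is engine-specific)
};

// Build one worker per stream; the engine itself is shared by all contexts.
std::vector<StreamWorker> makeWorkers(nvinfer1::ICudaEngine* engine, int numStreams,
                                      size_t inBytes, size_t outBytes)
{
    std::vector<StreamWorker> workers(numStreams);
    for (auto& w : workers)
    {
        w.context = engine->createExecutionContext();
        cudaStreamCreateWithFlags(&w.stream, cudaStreamNonBlocking);
        cudaMalloc(&w.bindings[0], inBytes);   // separate device buffers per stream
        cudaMalloc(&w.bindings[1], outBytes);
    }
    return workers;
}

// Launch all workers asynchronously, then wait for them all to finish.
void enqueueAll(std::vector<StreamWorker>& workers)
{
    for (auto& w : workers)
        w.context->enqueueV2(w.bindings, w.stream, nullptr);
    for (auto& w : workers)
        cudaStreamSynchronize(w.stream);
}
```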