TensorRT creates many internal CUDA streams

LiranBachar · May 2, 2019, 5:34am

Hi.
I noticed that deserializeCudaEngine and createExecutionContext each create 12 CUDA streams.
My application loads 12 DNN models, so I end up with 288 streams created by TensorRT alone, plus a few more that my application creates. This gives Nsight a hard time: it sometimes fails to capture when so many streams are in use.
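
To make the count concrete, here is a minimal sketch of the arithmetic, assuming each model goes through exactly one deserializeCudaEngine call and one createExecutionContext call. The constant and function names are illustrative only and are not part of the TensorRT API; the per-call figure of 12 is what I observed, not a documented value.

```python
# Stream counts as observed in this post; all names here are hypothetical.
STREAMS_PER_DESERIALIZE = 12  # streams seen after each deserializeCudaEngine call
STREAMS_PER_CONTEXT = 12      # streams seen after each createExecutionContext call
NUM_MODELS = 12               # DNN models loaded by the application


def tensorrt_stream_count(num_models, per_deserialize, per_context):
    """Total internal streams, assuming one engine and one context per model."""
    return num_models * (per_deserialize + per_context)


total = tensorrt_stream_count(NUM_MODELS, STREAMS_PER_DESERIALIZE, STREAMS_PER_CONTEXT)
print(total)  # prints 288
```

The streams my application creates itself come on top of this total.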

Does TensorRT really need this many streams? How can this be avoided?
Thanks,
Liran.
