Latency when running a TensorRT engine on two GPUs

Hi,

Enqueue is a CPU-side call; it only submits work to a GPU stream and does not wait for it. Execution for context 1 starts when the code calls its enqueue, and the same applies to context 2, so a single thread issues them one after the other. If you really want both contexts to start inference at the same time, launching a dedicated host thread per context is the usual approach.
Please refer to the links below in case they help:

Alternatively, you can use DeepStream to run multiple models.

Thanks