Batch inference parallelization on TensorRT

I want to execute batch inferences concurrently on the GPU.
I have read the "2.3. Streaming" section of the following documentation:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#optimize-performance
I tried to run two batch inferences concurrently from a single thread.
For each batch I created a CUDA stream with cudaStreamCreate and a separate IExecutionContext.
The problem is that only a few kernels actually execute concurrently.
Is there an obvious reason why?

NB: I do not use dynamic shapes.
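
Here is roughly what I do. This is a minimal sketch of my setup; `engine`, `bindings0`, and `bindings1` are placeholders for my deserialized engine and the pre-allocated device binding arrays of each batch:

```cpp
// Sketch: two batches enqueued from one host thread, each on its own
// CUDA stream with its own execution context (TensorRT 7.x API).
#include <NvInfer.h>
#include <cuda_runtime_api.h>

void runTwoBatches(nvinfer1::ICudaEngine* engine,
                   void** bindings0, void** bindings1)
{
    // One execution context per batch.
    nvinfer1::IExecutionContext* ctx0 = engine->createExecutionContext();
    nvinfer1::IExecutionContext* ctx1 = engine->createExecutionContext();

    // One CUDA stream per batch.
    cudaStream_t stream0, stream1;
    cudaStreamCreate(&stream0);
    cudaStreamCreate(&stream1);

    // Enqueue both inferences asynchronously from the same thread.
    ctx0->enqueueV2(bindings0, stream0, nullptr);
    ctx1->enqueueV2(bindings1, stream1, nullptr);

    // Wait for both batches to finish.
    cudaStreamSynchronize(stream0);
    cudaStreamSynchronize(stream1);

    cudaStreamDestroy(stream0);
    cudaStreamDestroy(stream1);
    ctx0->destroy();
    ctx1->destroy();
}
```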

Environment:
TensorRT Version: 7.2
CUDA Version: 11.2
cuDNN Version: 11.2

Hi, customer.
I think you need to create a topic under the TensorRT forum to ask for help.