Multiple threads running inference are causing a slowdown

Description

We get cv::Mat frames using this OpenCV GStreamer pipeline:
filesrc location="./video.mp4" ! qtdemux ! h264parse ! queue ! nvv4l2decoder ! queue ! nvvideoconvert ! video/x-raw,format=BGRx ! videorate max-rate=30 ! videoscale ! video/x-raw,format=BGRx,width=1920,height=1080 ! queue ! videoconvert ! video/x-raw,format=BGR ! appsink
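We read this pipeline through OpenCV's GStreamer backend, roughly like the minimal sketch below (variable names are illustrative, not our exact code):

// Sketch: reading BGR frames from the GStreamer pipeline via OpenCV's GStreamer backend.
#include <opencv2/opencv.hpp>
#include <string>

int main() {
    std::string pipeline =
        "filesrc location=\"./video.mp4\" ! qtdemux ! h264parse ! queue ! nvv4l2decoder ! queue ! "
        "nvvideoconvert ! video/x-raw,format=BGRx ! videorate max-rate=30 ! videoscale ! "
        "video/x-raw,format=BGRx,width=1920,height=1080 ! queue ! videoconvert ! "
        "video/x-raw,format=BGR ! appsink";
    cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);
    cv::Mat frame;
    while (cap.read(frame)) {
        // frame is a 1920x1080 BGR cv::Mat that gets handed to the detector
    }
    return 0;
}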

We construct the model with Yolov7::Yolov7, after which we pass the cv::Mat frames to Yolov7::preProcess and then run Yolov7::infer and Yolov7::PostProcess.
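In outline, each worker thread does something like the following. This is only a sketch; the header name, constructor arguments, and method signatures are assumptions on our side and may differ slightly from the library:

// Sketch of one worker thread; argument lists and return types are assumed, not the library's exact API.
#include <opencv2/opencv.hpp>
#include <vector>
// #include "Yolov7.h"   // assumed header name

void runWorker(cv::VideoCapture& cap) {
    Yolov7 detector("yolov7.engine");              // Yolov7::Yolov7 - engine path is illustrative
    cv::Mat frame;
    while (cap.read(frame)) {
        std::vector<cv::Mat> batch{frame};
        detector.preProcess(batch);                // Yolov7::preProcess
        detector.infer();                          // Yolov7::infer (eventually reaches the enqueueV2 call discussed below)
        auto detections = detector.PostProcess();  // Yolov7::PostProcess
    }
}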

The inference works and everything runs fine at this point (around 31 seconds to process a 31-second video).
When we then spin up a second thread that does the same thing in parallel, the combined process takes around 6 seconds longer than the single-threaded run.
Every additional thread after that adds another 20-25 seconds to the total processing time.

After further examination, the culprit appears to lie somewhere within the enqueueV2 method called in Yolov7.cpp.
I traced its origin via NvInfer.h and NvInferRuntime.h to NvInferImpl.h.
There, the class VExecutionContext : public VRoot declares the method
virtual bool enqueueV2(void* const* bindings, cudaStream_t stream, cudaEvent_t* inputConsumed) noexcept = 0;
Beyond that I can't find any further information or a definition of how it works, or why it would be slowing down the overall process.
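For reference, the generic TensorRT pattern around enqueueV2 looks roughly like this (a sketch of the usual API usage, not the Yolov7 sources; buffer names are illustrative), with each thread assumed to own its own IExecutionContext and cudaStream_t:

// Generic enqueueV2 usage sketch; each thread creates and reuses its own context and stream.
#include <NvInfer.h>
#include <cuda_runtime.h>

void inferenceThread(nvinfer1::ICudaEngine& engine, void* deviceInput, void* deviceOutput) {
    nvinfer1::IExecutionContext* context = engine.createExecutionContext(); // one context per thread
    cudaStream_t stream;
    cudaStreamCreate(&stream);                                              // one stream per thread

    void* bindings[] = { deviceInput, deviceOutput };   // device pointers, one per engine binding
    context->enqueueV2(bindings, stream, nullptr);      // enqueue inference asynchronously on `stream`
    cudaStreamSynchronize(stream);                      // block until this stream's work has finished

    cudaStreamDestroy(stream);
    delete context;
}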

Any idea why this is happening?

Environment

TensorRT Version: 8.4.1
GPU Type: Jetson AGX Orin
Nvidia Driver Version:
CUDA Version: 11.4.14
CUDNN Version: 8.4.1
Operating System + Version: Ubuntu 20.04.6 LTS - JetPack 5.0.2-b231

Relevant Files

We are using the Yolov7 library written by an NVIDIA employee.

Hi,

The link below might be useful for you.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
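If you keep your current multi-threaded setup, a minimal sketch of what the linked stream API offers (assuming you manage your own streams) is one non-blocking stream per inference thread, so the threads do not serialize on the legacy default stream:

// Sketch: give each inference thread its own non-blocking CUDA stream so it
// does not implicitly synchronize with the legacy default stream.
#include <cuda_runtime.h>

void perThreadStreamExample() {
    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
    // ... pass `stream` to enqueueV2 and to any cudaMemcpyAsync calls made by this thread ...
    cudaStreamDestroy(stream);
}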

For multi-threading/streaming, we suggest using DeepStream or Triton.

For more details, we recommend raising the query in the DeepStream forum,

or

in the Triton Inference Server GitHub repository's issues section.

Thanks!