Recently I ran into an issue: when I run trtexec with the option "--streams=14", only one CUDA stream executes all the inferences!
But when I add the option "--threads=14", then 14 host threads execute the inferences in parallel...
But why? I saw that even with a single thread the enqueueV2 API is used, which should let all the inferences run in parallel!
But to actually see parallel execution I need to add this "--threads" option...
Are there some limitations of enqueueV2? Maybe it executes in a single thread in some cases?
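For reference, this is roughly the pattern I expected trtexec to implement internally. It is only a sketch of my understanding, not the actual trtexec code (the function name and the way bindings are passed are made up): a single host thread can keep several streams busy, because enqueueV2 only enqueues the work and returns immediately, but each stream must get its own execution context.

#include <vector>
#include <cuda_runtime_api.h>
#include <NvInfer.h>

// Sketch only: one IExecutionContext per CUDA stream, all driven from a
// single host thread. enqueueV2() is asynchronous, so the middle loop just
// queues work; the GPU then runs the streams concurrently.
void runStreamsConcurrently(nvinfer1::ICudaEngine& engine,
                            std::vector<std::vector<void*>>& bindings, // one binding set per stream (made-up layout)
                            int nbStreams)
{
    std::vector<nvinfer1::IExecutionContext*> contexts(nbStreams);
    std::vector<cudaStream_t> streams(nbStreams);

    for (int s = 0; s < nbStreams; ++s)
    {
        contexts[s] = engine.createExecutionContext(); // per-stream context
        cudaStreamCreate(&streams[s]);
    }

    // Single host thread: each call returns as soon as the work is enqueued.
    for (int s = 0; s < nbStreams; ++s)
    {
        contexts[s]->enqueueV2(bindings[s].data(), streams[s], nullptr);
    }

    // Wait for all streams to finish, then clean up.
    for (int s = 0; s < nbStreams; ++s)
    {
        cudaStreamSynchronize(streams[s]);
        contexts[s]->destroy(); // deprecated in TensorRT 8+, use delete there
        cudaStreamDestroy(streams[s]);
    }
}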
In the trtexec sources I found this loop:

IterationStreams iStreams;
for (int s = 0; s < streams; ++s)
{
    // note: iEnv.context[offset] and iEnv.bindings[offset] do not depend on s
    Iteration* iteration = new Iteration(offset + s, inference, *iEnv.context[offset], *iEnv.bindings[offset]);
    ...
}
The same context is used for multiple streams, and according to the documentation for enqueueV2 this is undefined behavior:
"Calling enqueueV2() from the same IExecutionContext object with different CUDA streams concurrently results in undefined behavior. To perform inference concurrently in multiple streams, use one execution context per stream."
Am I right ?
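If I am right, the fix in trtexec would presumably be to index the per-stream context and bindings instead of reusing entry offset for every stream (assuming iEnv.context and iEnv.bindings really do hold one entry per stream), something like:

IterationStreams iStreams;
for (int s = 0; s < streams; ++s)
{
    // presumed fix: each stream s gets its own context and binding set
    Iteration* iteration = new Iteration(offset + s, inference, *iEnv.context[offset + s], *iEnv.bindings[offset + s]);
    ...
}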
But then why do you ship such a bad example, with undefined behavior, in TensorRT?
It is not related to a custom model; it is related to the undefined behavior in your trtexec example, as I described above...
Calling enqueueV2 from the same execution context on different streams concurrently is undefined behavior, and you have exactly that undefined behavior in your code.