How to run TRT in multithreading?

380483397 · April 19, 2018, 6:45am

Recently I ran a TRT model in a single thread and it took 6 ms. Running the same model in 2 threads took 10 ms, so it looks like the threads interact with each other. When I put the model in 2 processes and ran them at the same time, it was fine and took 6 ms. I want to know why this happens, and what I can do if I need to run TRT several times at the same time.

AastaLLL · April 19, 2018, 8:59am

Hi,

TensorRT supports multiple threads so long as each is used with a separate execution context.

Thanks.
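
As a minimal sketch of the "one execution context per thread" pattern described above: the helper below creates the contexts up front on the calling thread and hands one to each worker. `run_with_separate_contexts` and `work` are hypothetical names; the only TensorRT call assumed is `engine.create_execution_context()`, and any CUDA-context handling that a binding such as PyCUDA may need inside each thread is left out.

```python
import threading


def run_with_separate_contexts(engine, num_threads, work):
    """Run work(context, index) in num_threads threads, one context each.

    engine is assumed to be a deserialized TensorRT ICudaEngine; work is a
    hypothetical user callback that performs inference with the given context.
    """
    # Create the contexts on the calling thread so that no two threads ever
    # share an execution context.
    contexts = [engine.create_execution_context() for _ in range(num_threads)]
    results = [None] * num_threads

    def worker(index):
        results[index] = work(contexts[index], index)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Here `work` would wrap whatever inference call the application already uses; the point is only that each thread receives a context no other thread touches.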

380483397 · April 20, 2018, 1:02am

I know what you said. What I mean is that multiple threads take longer than a single thread on 1 GPU. How can I solve this timing problem?

AastaLLL · April 20, 2018, 2:48am

Hi,

Sorry for the unclear explanation.

Have you launched the TensorRT models with separate execution contexts?
This is essential for running inference in parallel; otherwise, sharing GPU resources will add some latency.

Thanks.

380483397 · April 20, 2018, 3:10am

In my program, each thread is bound to its own TensorRT context.
In addition, I use the TensorRT API and a plugin (containing some kernel functions) to create my network.

I think the reason may be the sharing of GPU resources (it’s just my guess), but is there any solution?

AastaLLL · April 24, 2018, 6:54am

Hi,

It’s recommended to profile GPU utilization with nvprof first.

Another possible reason is limited memory bandwidth.
Have you used memory copies in your plugin implementation?

Thanks.

380483397 · April 26, 2018, 1:15am

“Have you used memcopy in your plugin implementation?”
Yes!
What’s wrong with that, and do you have any suggestions?

AastaLLL · April 26, 2018, 10:06am

Hi,

Due to a hardware issue, we don’t support asynchronous memory copy on Jetson.
This may have some impact on the parallelism of TensorRT if memory copy is used.

Thanks.

380483397 · April 26, 2018, 2:13pm

I run my program on an x86 machine. Does the problem still exist there?

AastaLLL · April 30, 2018, 7:21am

Hi,

Asynchronous memory copy works well on x86 machines.

Usually, concurrency requires executing with multiple CUDA streams.
Have you launched each TensorRT context with an independent CUDA stream?

Here is a tutorial for your reference:

Thanks.
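
The stream question above can be made concrete with a small sketch in which every thread owns both an execution context and a CUDA stream. `StreamWorker` is a hypothetical name; the `stream` object is assumed to be a PyCUDA `pycuda.driver.Stream` (providing `handle` and `synchronize()`), and `execute_async(batch_size=..., bindings=..., stream_handle=...)` is assumed to be the implicit-batch call offered by the TensorRT Python API version in use.

```python
import threading


class StreamWorker(threading.Thread):
    """Runs one inference on its own execution context and its own stream."""

    def __init__(self, context, stream, bindings, batch_size=1):
        super().__init__()
        self.context = context      # not shared with any other thread
        self.stream = stream        # not shared with any other thread
        self.bindings = bindings    # device pointers for inputs and outputs
        self.batch_size = batch_size
        self.ok = None

    def run(self):
        # Enqueue the work on this thread's stream, then wait for that
        # stream only, so other workers' streams are not blocked.
        self.ok = self.context.execute_async(
            batch_size=self.batch_size,
            bindings=self.bindings,
            stream_handle=self.stream.handle,
        )
        self.stream.synchronize()
```

Each worker would be given a distinct context and a distinct stream, then started and joined as usual. Whether the kernels actually overlap on the GPU still depends on available resources, so profiling remains the way to confirm it.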

J8oe · February 19, 2019, 2:19am

@AastaLLL Hi, how about using MPS? Can that achieve concurrency?

AastaLLL · February 21, 2019, 9:20am

Hi,

You can find some suggestions for using TensorRT with multiple threads here:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#thread-safety

Thanks.

380483397 · December 6, 2019, 9:28am

Two questions:

If I want to be more efficient, should I use batching or multithreading?
The TRT runtime can create multiple contexts, and one engine can also create multiple contexts. What’s the difference between contexts created in these two ways?

Chieh · May 13, 2020, 8:32am

Dear @AastaLLL,

I need your help!
I have read this document, but I still don’t know exactly how to do this in Python.

Currently I have a sample that runs successfully on TRT.
Now I just want to run TensorRT with multithreading, using really simple code.
(I have already generated the TensorRT engine, so I will load the engine and run TensorRT inference in multiple threads.)

Here is my code (without the TensorRT code):

import threading
import time
from my_tensorrt_code import TRTInference, trt

exitFlag = 0

class myThread(threading.Thread):
   def __init__(self, func, args):
      threading.Thread.__init__(self)
      self.func = func
      self.args = args
   def run(self):
      print ("Starting " + self.args[0])
      self.func(*self.args)
      print ("Exiting " + self.args[0])

if __name__ == '__main__':
    # Create new threads
    '''
    format thread:
        - func: function names, function that we wished to use
        - arguments: arguments that will be used for the func's arguments
    '''

    trt_engine_path = './tensorrt_engine.trt'

    max_batch_size = 1
    trt_inference_wrapper = TRTInference(trt_engine_path, 
        trt_engine_datatype=trt.DataType.FLOAT,
        batch_size=max_batch_size)

    # Get TensorRT SSD model output
    input_img_path = './testimage.png'

    thread1 = myThread(trt_inference_wrapper.infer, [input_img_path])

    # Start new Threads
    thread1.start()
    thread1.join()
    print ("Exiting Main Thread")

However, when I run this code, I always get the error messages below.

[TensorRT] ERROR: ../rtSafe/cuda/caskConvolutionRunner.cpp (290) - Cask Error in checkCaskExecError<false>: 7 (Cask Convolution execution)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception

I found that this error occurs while running the do_inference function.

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    stream.synchronize()
    return [out.host for out in outputs]

Could you share some suggestions on how to fix this error?
This error happens not only on desktop but also on Jetson devices…

Thank you so much!

Best regards,
Chieh
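
A possible direction, offered as an assumption rather than a diagnosis confirmed in this thread: with PyCUDA, a CUDA context created on the main thread is not automatically current in a worker thread, so one common pattern is to push the context at the start of the thread and pop it at the end. The sketch below shows only that pattern; `ContextThread` is a hypothetical name, and `cuda_ctx` is assumed to be a PyCUDA context object (for example, one returned by `pycuda.driver.Device(0).make_context()`) exposing `push()` and `pop()`.

```python
import threading


class ContextThread(threading.Thread):
    """Thread that makes a given CUDA context current while func runs."""

    def __init__(self, cuda_ctx, func, args=()):
        super().__init__()
        self.cuda_ctx = cuda_ctx
        self.func = func
        self.args = args
        self.result = None

    def run(self):
        self.cuda_ctx.push()          # make the context current in this thread
        try:
            self.result = self.func(*self.args)
        finally:
            self.cuda_ctx.pop()       # always release it, even on error
```

With the code above, the thread would be created as `ContextThread(cuda_ctx, trt_inference_wrapper.infer, [input_img_path])` in place of `myThread(...)`. Whether this removes the Cask error depends on how `my_tensorrt_code` sets up CUDA, which is not shown here.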

sonineha45 · July 13, 2021, 7:19am

Is there any example available for multithreaded use?
