Run multiple models (engines) with TensorRT without DeepStream


How can I configure multiple models (a back-to-back pipeline) with TensorRT, using multiple threads? In this configuration, how should I set up the execution context and allocate memory for each engine?
With my configuration I get the following memory error:

boxes_2, confs_2, clss_2 = trt_yolov3.detect(img, conf_th)
  File "/home/cv/Downloads/tensorrt_demos-master/tensorrt_demos-master/utils2/", line 479, in detect
  File "/home/cv/Downloads/tensorrt_demos-master/tensorrt_demos-master/utils2/", line 367, in do_inference
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
  File "/home/cv/Downloads/tensorrt_demos-master/tensorrt_demos-master/utils2/", line 367, in <listcomp>
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument

A clear and concise description of the bug or issue.


TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered


To run multiple models with TensorRT, I would recommend using either NVIDIA DeepStream or NVIDIA Triton Inference Server.
Please refer to the link below for more details:
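For the back-to-back case specifically, Triton's ensemble scheduler can chain two models server-side so you do not have to manage contexts and device buffers yourself. Below is a sketch of an ensemble config.pbtxt; the model names (detector, classifier), tensor names, and dims are hypothetical placeholders, not taken from your repo:

```protobuf
name: "pipeline"
platform: "ensemble"
max_batch_size: 1
input [
  { name: "IMAGE", data_type: TYPE_FP32, dims: [ 3, 608, 608 ] }
]
output [
  { name: "CLASS_SCORES", data_type: TYPE_FP32, dims: [ 80 ] }
]
ensemble_scheduling {
  step [
    {
      # First model in the back-to-back pipeline
      model_name: "detector"
      model_version: -1
      input_map { key: "input", value: "IMAGE" }
      output_map { key: "output", value: "DETECTIONS" }
    },
    {
      # Second model consumes the first model's output
      model_name: "classifier"
      model_version: -1
      input_map { key: "input", value: "DETECTIONS" }
      output_map { key: "output", value: "CLASS_SCORES" }
    }
  ]
}
```

Each TensorRT engine would live in its own model directory (platform: "tensorrt_plan"), and Triton handles per-model context creation and memory allocation.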

If you want to perform multi-threading with TensorRT yourself, please refer to the link below for best practices:
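The key rule when rolling your own multi-threaded setup is that each thread should create and own its execution context and its input/output buffers, rather than sharing them across threads; using a device pointer allocated under a CUDA context that is not current in the calling thread is a common way to hit "cuMemcpyHtoDAsync failed: invalid argument". Below is a GPU-free sketch of that ownership pattern only; DummyContext is a hypothetical stand-in for a per-thread TensorRT IExecutionContext plus its buffers, not a real API:

```python
import queue
import threading

class DummyContext:
    """Hypothetical stand-in for a per-thread execution context + buffers.

    In real code this would wrap engine.create_execution_context() and the
    pycuda device/host allocations for that context.
    """
    def __init__(self, engine_name):
        self.engine_name = engine_name

    def infer(self, item):
        # Real code: memcpy_htod_async -> execute_async -> memcpy_dtoh_async,
        # all on this context's own stream.
        return f"{self.engine_name}:{item}"

def worker(engine_name, jobs, results):
    # Key pattern: the context and its buffers are created *inside* the
    # thread that uses them, and are never touched by any other thread.
    ctx = DummyContext(engine_name)
    while True:
        item = jobs.get()
        if item is None:  # sentinel: shut this worker down
            break
        results.put(ctx.infer(item))

jobs1, jobs2, results = queue.Queue(), queue.Queue(), queue.Queue()
t1 = threading.Thread(target=worker, args=("detector", jobs1, results))
t2 = threading.Thread(target=worker, args=("classifier", jobs2, results))
t1.start(); t2.start()

# Feed the same frames to both engines (back-to-back would instead feed
# engine 1's output into engine 2's job queue).
for frame in ("frame0", "frame1"):
    jobs1.put(frame)
    jobs2.put(frame)
for q in (jobs1, jobs2):
    q.put(None)
t1.join(); t2.join()

outputs = sorted(results.get() for _ in range(4))
print(outputs)
```

With real engines you would also make each thread push/pop its own CUDA context (or share one primary context) consistently, so every async memcpy sees the same context that allocated the device memory.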