Run multiple models (engines) with TensorRT without DeepStream


How can I configure multiple models (a back-to-back pipeline) with TensorRT in a multi-threaded setup? In this configuration, how should I set the context and allocate memory for multiple engines?
In my configuration I get the following memory error:

boxes_2, confs_2, clss_2 = trt_yolov3.detect(img, conf_th)
File "/home/cv/Downloads/tensorrt_demos-master/tensorrt_demos-master/utils2/", line 479, in detect
File "/home/cv/Downloads/tensorrt_demos-master/tensorrt_demos-master/utils2/", line 367, in do_inference
[cuda.memcpy_htod_async(inp.device,, stream) for inp in inputs]
File "/home/cv/Downloads/tensorrt_demos-master/tensorrt_demos-master/utils2/", line 367, in
[cuda.memcpy_htod_async(inp.device,, stream) for inp in inputs]
pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument

To run multiple models with TensorRT, I recommend using either NVIDIA DeepStream or NVIDIA Triton Inference Server.
Please refer to the link below for more details:

If you want to use multi-threading with TensorRT directly, please refer to the link below for best practices:
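
For the back-to-back, multi-threaded layout described in the question, here is a minimal sketch of one common pattern: each engine gets its own worker thread, and that thread creates and owns everything the engine needs, passing results to the next stage through a queue. This is not taken from the reply above or from tensorrt_demos. `StageWorker`, `run_pipeline`, and `make_model` are hypothetical names, and the model is a plain Python callable so the sketch runs without a GPU. The comments mark where the PyCUDA and TensorRT setup would go, on the assumption that an "invalid argument" from `cuMemcpyHtoDAsync` can come from using buffers whose CUDA context is not current on the calling thread.

```python
import queue
import threading


class StageWorker(threading.Thread):
    """One pipeline stage; its model is created on, and used only by, this thread."""

    def __init__(self, name, make_model, in_q, out_q):
        super().__init__(name=name, daemon=True)
        self.make_model = make_model  # hypothetical factory returning a callable
        self.in_q = in_q
        self.out_q = out_q

    def run(self):
        # Build per-engine resources here, inside the thread that uses them.
        # With PyCUDA this is where one would typically make a context
        # current (e.g. cuda.Device(0).make_context()), then deserialize the
        # engine, create its execution context, and allocate the host/device
        # buffers and stream for this engine only. (Assumption, not shown.)
        model = self.make_model()
        while True:
            item = self.in_q.get()
            if item is None:  # sentinel: forward it and stop
                self.out_q.put(None)
                break
            self.out_q.put(model(item))
        # With PyCUDA, the context would be popped here before the thread exits.


def run_pipeline(factories, inputs):
    """Chain one worker per factory and push the inputs through all of them."""
    queues = [queue.Queue() for _ in range(len(factories) + 1)]
    workers = [
        StageWorker(f"stage-{i}", factory, queues[i], queues[i + 1])
        for i, factory in enumerate(factories)
    ]
    for worker in workers:
        worker.start()
    for item in inputs:
        queues[0].put(item)
    queues[0].put(None)  # None is reserved as the end-of-stream marker

    results = []
    while True:
        result = queues[-1].get()
        if result is None:
            break
        results.append(result)
    for worker in workers:
        worker.join()
    return results
```

As a usage example, `run_pipeline([lambda: (lambda x: x + 1), lambda: (lambda x: x * 2)], [1, 2, 3])` runs two stand-in "models" back to back. In a real setup each factory would load a different engine, and the second stage would consume the first stage's detections. Because no buffer, stream, or execution context is shared between threads, each engine's memory copies run against the context they were allocated in.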