How to run inference with TensorRT on multiple GPUs in Python

Description

Hi, I have two different TensorRT models. I want to run TRT model A on GPU 1 and TRT model B on GPU 2 with Python.

Environment

Ubuntu 18.04
TensorRT Version: 7.2.3.4
GPU Type: V100
Nvidia Driver Version: 418
CUDA Version: 10.2
CUDNN Version: 8.1


Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
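As a rough illustration of the per-thread approach from the thread-safety guide above, here is a minimal sketch that binds each model to its own GPU via a dedicated CUDA context per thread. It assumes pycuda is installed and that `model_a.trt` and `model_b.trt` are serialized engine files (placeholder names); buffer allocation and host/device copies are elided since they depend on your models' bindings.

```python
import threading

import tensorrt as trt
import pycuda.driver as cuda

cuda.init()  # initialize the CUDA driver API (no autoinit: we manage contexts ourselves)
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def run_model(engine_path, gpu_id):
    # Push a dedicated CUDA context so everything in this thread runs on gpu_id.
    ctx = cuda.Device(gpu_id).make_context()
    try:
        runtime = trt.Runtime(TRT_LOGGER)
        with open(engine_path, "rb") as f:
            engine = runtime.deserialize_cuda_engine(f.read())
        exec_ctx = engine.create_execution_context()
        stream = cuda.Stream()
        # Allocate device buffers with cuda.mem_alloc(), copy inputs with
        # cuda.memcpy_htod_async(), then enqueue inference on this GPU's stream:
        # exec_ctx.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        stream.synchronize()
    finally:
        ctx.pop()  # detach the context so the thread exits cleanly


# One thread per model, each pinned to its own GPU (device ordinals 0 and 1).
threads = [
    threading.Thread(target=run_model, args=("model_a.trt", 0)),
    threading.Thread(target=run_model, args=("model_b.trt", 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each engine, execution context, and stream is created inside the thread that owns the GPU's context, which matches the thread-safety guidance (execution contexts are not thread-safe and should not be shared across threads).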
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend posting your query on the DeepStream or Triton forum.

Thanks!

Hi @lizcomeon,

The following link may answer your query.

Thank you.