Does a TensorRT model support distributed deployment such as model parallelism?


I have a large NLP model similar to GPT. I converted it to TensorRT, but the resulting model is too large to deploy on a single V100 or A100 GPU, so I need to deploy it across 8 GPUs.
My question is: does a TensorRT model support deployment across 8 GPUs using model parallelism?



We recommend that you try the Triton Inference Server.

Thank you.

Triton doesn’t support model parallelism for TensorRT models, that is, dividing one TensorRT model into 8 parts and placing them on 8 GPUs. Can I conclude that TensorRT does not support model parallelism right now?
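
To make concrete what I mean by dividing the model into 8 parts, here is a minimal sketch in plain Python. It only illustrates the partitioning idea (contiguous groups of layers, one group per GPU) and does not use any TensorRT or Triton API. The function name `partition_layers` and the 48-layer count are hypothetical and chosen only for the example.

```python
def partition_layers(num_layers, num_gpus):
    """Split layer indices 0..num_layers-1 into contiguous stages, one per GPU.

    Hypothetical helper for illustration only; it is not part of TensorRT or Triton.
    """
    if num_layers < 0 or num_gpus <= 0:
        raise ValueError("num_layers must be >= 0 and num_gpus must be > 0")

    # Give every GPU `base` layers; the first `extra` GPUs take one more.
    base, extra = divmod(num_layers, num_gpus)

    stages = []
    start = 0
    for gpu_id in range(num_gpus):
        size = base + (1 if gpu_id < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages


if __name__ == "__main__":
    # Assumed example: a 48-layer model spread over 8 GPUs.
    for gpu_id, layers in enumerate(partition_layers(num_layers=48, num_gpus=8)):
        print(f"GPU {gpu_id}: layers {layers[0]}..{layers[-1]}")
```

In this scheme each GPU would hold only its own group of layers, and the activations would be passed from one GPU to the next during inference. That is the kind of deployment I am asking about.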