Does a TRT model support distributed deployment such as model parallelism?


I have a large NLP model similar to GPT. I converted it to TensorRT (TRT), but it is too big to deploy on a single V100 or A100 GPU, so I need to deploy it across 8 GPUs.
My question is: does a TRT model support deployment across 8 GPUs using model parallelism?


We recommend that you try the Triton Inference Server.

Thank you.

Triton doesn’t support running a TRT model with model parallelism, i.e., dividing one TRT model into 8 parts and placing each part on its own GPU. Can I conclude that TRT does not currently support model parallelism?
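To make "dividing a model into 8 parts for 8 GPUs" concrete, here is a minimal, framework-independent sketch of pipeline-style splitting: consecutive layers are grouped so that each GPU holds a contiguous slice and the largest per-GPU weight footprint is as small as possible. It does not use any TensorRT or Triton API and says nothing about whether TRT supports this; the function name `split_layers` and the per-layer sizes are hypothetical placeholders.

```python
from typing import List


def split_layers(layer_bytes: List[int], num_gpus: int) -> List[List[int]]:
    """Split layers into num_gpus contiguous groups, minimizing the largest group's size.

    Hypothetical helper: layer_bytes[i] is an assumed weight size for layer i.
    """
    if num_gpus <= 0 or len(layer_bytes) < num_gpus:
        raise ValueError("need num_gpus > 0 and at least one layer per GPU")

    def groups_needed(cap: int) -> int:
        # Greedily pack layers in order and count how many groups a given cap requires.
        count, current = 1, 0
        for size in layer_bytes:
            if current + size > cap:
                count, current = count + 1, size
            else:
                current += size
        return count

    # Binary-search the smallest per-GPU budget that fits into num_gpus groups.
    lo, hi = max(layer_bytes), sum(layer_bytes)
    while lo < hi:
        mid = (lo + hi) // 2
        if groups_needed(mid) <= num_gpus:
            hi = mid
        else:
            lo = mid + 1

    # Rebuild the groups under that budget, closing a group early when the
    # remaining layers are just enough to give every remaining GPU one layer.
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for i, size in enumerate(layer_bytes):
        remaining_layers = len(layer_bytes) - i
        gpus_after_current = num_gpus - len(groups) - 1
        if current and (current_bytes + size > lo or remaining_layers <= gpus_after_current):
            groups.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += size
    groups.append(current)
    return groups
```

For example, `split_layers([3] * 16, 8)` returns eight pairs of consecutive layer indices, one pair per GPU. In a real pipeline-parallel deployment each group would also need its own runtime on its GPU, with activations passed from one GPU to the next; that part is outside this sketch.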