I am new to Triton Inference Server and to deploying models with it.
I have a speech-to-text ML model saved as a PyTorch checkpoint. I also have it packaged as a Docker deployment that serves it through FastAPI.
The model takes audio files and returns the transcription as text or as a JSON file.
I need to scale this model on a single A100 GPU machine. The model itself is probably less than 0.5 GB.
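
For reference, the current service looks roughly like this (a minimal sketch, not my exact code; the endpoint name, checkpoint path, and `run_inference` stub are placeholders):

```python
# app.py: rough shape of the current FastAPI deployment.
# Paths and helper names are placeholders, not the real implementation.
import torch
from fastapi import FastAPI, UploadFile

app = FastAPI()

# Load the PyTorch checkpoint once at startup (model is under ~0.5 GB).
model = torch.load("model.pt", map_location="cuda")
model.eval()


def run_inference(model, audio_bytes: bytes) -> str:
    # Placeholder: the real code does feature extraction, a forward
    # pass on the GPU, and decoding of the transcript.
    return "transcript"


@app.post("/transcribe")
async def transcribe(file: UploadFile):
    audio_bytes = await file.read()
    with torch.no_grad():
        text = run_inference(model, audio_bytes)
    return {"transcript": text}
```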
- Is Triton able to distribute the deployment of this model over multiple cores of a single GPU, i.e., run several instances of the model concurrently on one A100? (A config sketch of what I mean follows this list.)
- If yes, which is the better solution:
  - the Python backend (see the `model.py` sketch below), or
  - the existing Docker container with FastAPI?
- Is there a better way to deploy and scale this model?
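
For the first question, my understanding is that Triton's documented `instance_group` setting in `config.pbtxt` can replicate a model into several concurrent instances on one GPU. Something like the following (the model name, instance count, and tensor names are made-up examples, not from my real setup):

```
name: "speech_to_text"
backend: "python"
# Batching disabled here to keep the sketch simple; dynamic batching
# is a separate knob that can also help with throughput.
max_batch_size: 0

input [
  {
    name: "AUDIO"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "TRANSCRIPT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

instance_group [
  {
    # Run 4 copies of the model concurrently on GPU 0.
    count: 4
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

And for the Python-backend option, this is the kind of `model.py` I would expect to write (tensor names match the config above; the checkpoint path and `_decode` stub are placeholders for my actual model code):

```python
# models/speech_to_text/1/model.py: minimal Python-backend sketch.
# "AUDIO" / "TRANSCRIPT" match the config.pbtxt above; the checkpoint
# path and _decode() are placeholders for the real model.
import numpy as np
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Each instance declared in instance_group loads its own copy.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = torch.jit.load("model.pt", map_location=self.device)
        self.model.eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            raw = pb_utils.get_input_tensor_by_name(request, "AUDIO").as_numpy()
            text = self._decode(raw)
            out = pb_utils.Tensor(
                "TRANSCRIPT", np.array([text.encode("utf-8")], dtype=object)
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def _decode(self, raw_audio: np.ndarray) -> str:
        # Placeholder: the real code runs feature extraction, the
        # forward pass on self.device, and transcript decoding.
        return "transcript"
```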
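
If that sketch is roughly right, I would still like to know whether running several instances this way actually beats the FastAPI container for a model of this size, or whether there is a better approach entirely.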