Documentation for multi-model serving with overcommit on Triton

I read in the Seldon Core documentation that multi-model serving with overcommit is available out of the box on NVIDIA Triton:

https://docs.seldon.io/projects/seldon-core/en/v2/contents/models/mms/mms.html?highlight=multi%20modal%20serving

Can you please share documentation on how to configure and implement multi-model serving with overcommit using NVIDIA Triton?
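For context, my understanding of "overcommit" from the Seldon docs is that more models are registered than fit in memory at once, and least-recently-used models are evicted (unloaded) to make room for incoming requests. A toy sketch of that idea, using made-up model names and sizes:

```python
from collections import OrderedDict

class OvercommitCache:
    """Toy LRU cache of loaded models under a fixed memory budget.

    This is only an illustration of the overcommit concept, not
    Seldon's or Triton's actual implementation. Sizes are
    hypothetical, in MiB.
    """

    def __init__(self, budget_mib):
        self.budget = budget_mib
        self.loaded = OrderedDict()  # model name -> size in MiB

    def ensure_loaded(self, name, size_mib):
        """Load a model, evicting LRU models if over budget.

        Returns the list of evicted model names.
        """
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return []
        evicted = []
        while self.loaded and sum(self.loaded.values()) + size_mib > self.budget:
            victim, _ = self.loaded.popitem(last=False)  # unload the LRU model
            evicted.append(victim)
        self.loaded[name] = size_mib
        return evicted

cache = OvercommitCache(budget_mib=1000)
cache.ensure_loaded("model-a", 600)
cache.ensure_loaded("model-b", 300)
evicted = cache.ensure_loaded("model-c", 500)  # exceeds budget, evicts model-a
```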

Hi,

The links below might be useful for you.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html

For multi-threading/streaming, we suggest using DeepStream or Triton.
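On the Triton side, concurrent execution of a single model is typically configured through `instance_group` in the model's `config.pbtxt`. A minimal sketch (model name, backend, and counts are illustrative, not a recommendation):

```
name: "my_model"              # illustrative model name
platform: "onnxruntime_onnx"  # backend depends on your model format
instance_group [
  {
    count: 2                  # run two execution instances in parallel
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

For serving many models with load/unload on demand (the mechanism overcommit builds on), Triton's explicit model-control mode (`--model-control-mode=explicit`) and its repository load/unload endpoints are the relevant features to look up in the Triton documentation.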

For more details, we recommend raising the query on the DeepStream forum, or in the issues section of the Triton Inference Server GitHub repository.

Thanks!