TRTIS is a container that manages all inference models; could it also manage containers?
In a containerized environment, a user can package their own inference program in a container and deploy it directly. Using TRTIS instead requires additional effort to integrate the inference program into TRTIS, which is much less convenient. I understand that the current architecture of TRTIS gives it high throughput and GPU utilization. Would it be possible to extend TRTIS to manage containers, so that users do not need to make this extra integration effort?
I have been thinking about how TRTIS's strengths (e.g. concurrent model execution) could be applied to containers. For example, a specialized Docker image could be provided for users (inference program developers); this image would contain specialized AI frameworks (e.g. a specialized TensorFlow, Caffe, ...) that map model execution instances onto CUDA streams for parallel computation, just as the TRTIS server does.
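To illustrate the idea of concurrent model execution, here is a minimal conceptual sketch, not TRTIS code and not a real framework API: each model instance gets its own execution lane so that independent requests overlap, analogous to TRTIS mapping model instances onto separate CUDA streams. The function and model names below are hypothetical placeholders.

```python
# Conceptual sketch only: simulates concurrent model execution with
# threads. In a real specialized framework, run_inference would enqueue
# work on a dedicated CUDA stream instead.
from concurrent.futures import ThreadPoolExecutor

def run_inference(model_name, request_id):
    # Hypothetical placeholder for a framework call that executes one
    # inference request on a per-instance execution lane (CUDA stream).
    return f"{model_name}:{request_id}"

# Independent requests for two models (names are illustrative).
requests = [("resnet50", 0), ("resnet50", 1), ("bert", 0)]

# One worker per model instance: independent requests can overlap,
# just as kernels launched on separate CUDA streams may run concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda r: run_inference(*r), requests))

print(results)
```

The point of the sketch is only the scheduling shape: the serving layer (whether TRTIS itself or a specialized framework inside a user's container) dispatches each request to a per-instance lane rather than serializing all models on one queue.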