Deploying a Natural Language Processing Service on a Kubernetes Cluster with Helm Charts from NVIDIA NGC

Originally published at:

Conversational AI solutions such as chatbots are now deployed in the data center, on the cloud, and at the edge to deliver lower latency and high quality of service while meeting an ever-increasing demand. The strategic decision to run AI inference on any or all these compute platforms varies not only by the use case…

Not directly relevant, but hopefully we can see an on-prem version of an image-based inference example with multi-node autoscaling. There is not much how-to material about on-prem Kubernetes + Triton Inference Server with a Horizontal Pod Autoscaler.
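For the on-prem Triton + HPA case the comment asks about, a minimal sketch of what a HorizontalPodAutoscaler manifest could look like against a Triton Deployment installed from a Helm chart. The Deployment name `triton-inference-server`, the replica bounds, and the CPU utilization target are all assumptions for illustration, not values from the article; scaling on GPU utilization or inference queue depth would additionally require a custom/external metrics adapter (e.g. one fed from Triton's Prometheus metrics endpoint).

```yaml
# Hypothetical HPA sketch -- resource names and thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-inference-server   # assumed name of the chart's Deployment
  minReplicas: 1
  maxReplicas: 4                    # bounded by available GPU nodes on-prem
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # placeholder; GPU/queue metrics need an adapter
```

Applied with `kubectl apply -f triton-hpa.yaml`, this scales the Deployment between 1 and 4 replicas on average CPU utilization; on-prem, each added replica must still be schedulable onto a node with a free GPU.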