Hardware - GPU: AWS g5 instance
Hardware - CPU: AWS g5 instance
Operating System:
Riva Version: 2.17
According to the "How to Deploy Riva at Scale on AWS with EKS" tutorial, manual deployment scaling can be done with this command:
kubectl scale deployments/riva-api --replicas=4
However, this command scales only riva-api. Since Riva Release 2.15.0, the Helm chart has been updated to run the Riva and Triton servers in separate pods, so they can be scaled and deployed across multiple GPUs independently.
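For reference, a minimal sketch of what scaling both servers might look like after the chart split (the Triton deployment name riva-triton is an assumption, not taken from the tutorial; verify the actual names first):

# Confirm the deployment names created by the Riva Helm chart
kubectl get deployments

# Scale the Riva API frontend
kubectl scale deployment/riva-api --replicas=4

# Scale the Triton inference server separately (name assumed, check your chart)
kubectl scale deployment/riva-triton --replicas=4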
Please update the "How to Deploy Riva at Scale on AWS with EKS" tutorial to use the latest version of Riva.
Please also add a tutorial covering autoscaling for the latest version of Riva, for example along the lines sketched below.
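As a rough starting point, a minimal autoscaling sketch could use the built-in CPU-based HorizontalPodAutoscaler (the deployment name and thresholds here are assumptions; GPU-aware scaling would need custom metrics, e.g. from the DCGM exporter):

# Requires the Kubernetes Metrics Server to be installed in the cluster
# Autoscale the Riva API deployment between 1 and 4 replicas at 80% average CPU
kubectl autoscale deployment riva-api --min=1 --max=4 --cpu-percent=80

# Inspect the resulting HorizontalPodAutoscaler
kubectl get hpa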