Configuration:
- Hardware - GPU A40
- Hardware - CPU Intel(R) Xeon(R) Gold 6354 @ 3.00GHz
- Operating System - Ubuntu 22.04.4
- Nvidia driver version: 550.54.15
- Cuda version: 12.4
- Riva Version - 2.15.0
- Riva Python Client Version - 2.15.0
- Docker version - 25.0.4
Steps to reproduce:
- Deploy Riva Quickstart
- Start sending synthesize_online requests to the Riva server (sample rate - 44100)
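For reference, this is roughly how we send the requests — a minimal sketch using the riva.client Python bindings. The helper name, server URI, and voice name are placeholders for our setup, not exact values:

```python
def reproduce_short_text_tts(uri="localhost:50051", sample_rate_hz=44100):
    """Minimal repro sketch (hypothetical helper; assumes the nvidia-riva-client
    package is installed and a Riva server is reachable at `uri`)."""
    # Imports are deferred so the sketch parses even without the package installed.
    import grpc
    import riva.client

    auth = riva.client.Auth(uri=uri)
    tts = riva.client.SpeechSynthesisService(auth)

    # Short texts like these are the ones that time out most often for us.
    for text in ["Hi.", "Yes.", "One moment, please."]:
        try:
            audio = b""
            # synthesize_online streams audio chunks back as they are generated
            for resp in tts.synthesize_online(
                text,
                voice_name="English-US.Female-1",  # placeholder: quickstart default voice
                language_code="en-US",
                sample_rate_hz=sample_rate_hz,
            ):
                audio += resp.audio
            print(f"OK   {text!r}: {len(audio)} bytes")
        except grpc.RpcError as err:
            # The failures surface here as StatusCode.UNKNOWN with the
            # "Streaming timed out" message shown below.
            print(f"FAIL {text!r}: {err.code()} {err.details()}")
```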
Results:
Some of the TTS requests fail with a "Streaming timed out" error. It happens mostly with short texts (just a few words). Previously, we ran Riva 2.11.0 on this VM and it worked without any issues.
Here is the error from the client side:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Error: Triton model failed during inference. Error message: Streaming timed out"
debug_error_string = "UNKNOWN:Error received from peer ipv4:*.*.*.*:50051 {grpc_message:"Error: Triton model failed during inference. Error message: Streaming timed out", grpc_status:2, created_time:"2024-03-29T08:25:10.895145573+00:00"}"
I hope these files help with the investigation:
nvidia-smi.log (1.7 KB)
riva-speech.log (4.4 MB)
config.sh.txt (18.9 KB)
Any help on this would be much appreciated.
Thanks in advance!