Riva Speech Server Fails to Start Due to Model Loading Errors

jepetolee · January 21, 2025, 6:20am

I encountered several issues while running the assessment.ipynb file in the NVIDIA DLI course “Building Conversational AI Applications”, specifically during the setup and execution of Riva Speech Skills. Below are the details of the problems:

Model Loading Failure:

The Triton Inference Server fails to load models and terminates unexpectedly during the initialization phase.
Error messages from docker logs include:

error: creating server: Internal - failed to load all models
> Triton server died before reaching ready state. Terminating Riva startup.

Additional errors indicate missing configuration files (config.pbtxt) for certain models:

Poll failed for model directory '1': failed to open text file for read /data/models/1/config.pbtxt: No such file or directory

Excessive Number of Models:

The /data/models directory contains an excessive number of models, including ASR, TTS, and NLP-related models.
This appears to prolong the model loading process, leading to timeout issues.

Timeout Issues:

Riva Speech Skills waits for the models to load but fails due to a timeout:

Timeout 29: Found 4 live models and 0 in-flight non-inference requests

Despite increasing the timeout value, the server still fails to initialize all models successfully.

Docker Environment Configuration:

Potential misconfigurations in Docker container resource allocation (e.g., memory, GPU usage) could also be contributing to the problem.

Steps Taken:

Verified the contents of the /data/models directory using docker exec and confirmed that some models are missing critical files like config.pbtxt.
Attempted to reduce the number of models by only keeping those relevant to ASR, but the server still fails to start.
Edited the riva_start.sh script to extend the timeout period but encountered the same issue.
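
The config.pbtxt check from the first step can be scripted rather than done by eye. This is a minimal sketch, assuming the usual Triton repository layout in which every immediate subdirectory of the repository root is treated as a model and holds its own config.pbtxt. The function name find_models_missing_config is hypothetical, and the script has to run somewhere the repository path is visible (for example inside the container).

```python
from pathlib import Path


def find_models_missing_config(repo_root):
    """Return names of immediate subdirectories of repo_root without a config.pbtxt."""
    root = Path(repo_root)
    missing = []
    for entry in sorted(root.iterdir()):
        # Each top-level directory is polled as a model; flag the ones
        # that have no config.pbtxt file directly inside them.
        if entry.is_dir() and not (entry / "config.pbtxt").is_file():
            missing.append(entry.name)
    return missing


# Example usage; /data/models is the path from the error log above.
repo = Path("/data/models")
if repo.is_dir():
    for name in find_models_missing_config(repo):
        print(f"missing config.pbtxt: {name}")
else:
    print(f"{repo} not found; run this where the model repository is mounted")
```

A result containing a purely numeric name such as 1 would match the "Poll failed for model directory '1'" error. It may mean a version folder is sitting directly at the repository root instead of inside a model folder, though that is only one possible explanation.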

Request for Assistance:

What is the recommended way to handle the excessive number of models? Is there a list of essential models required for basic ASR functionality?
How can I ensure all necessary model files (e.g., config.pbtxt) are present and properly configured?
Are there additional changes needed in the riva_start.sh script or Docker configuration to resolve this issue?
Could there be compatibility issues between the Triton Inference Server and Riva Speech Skills, given the current setup?

Any guidance or suggestions to resolve these issues would be greatly appreciated. Thank you!

sophwats · January 21, 2025, 4:03pm

Hi @jepetolee, thanks for sharing the detailed issues here. I’m reaching out to the course owner, who will get back to you. Thanks for your patience.

mayjain · January 24, 2025, 5:13am

Can you please run riva_clean.sh and try again?
For basic ASR functionality, the NMT and TTS models are not needed; you can disable them in config.sh.
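
Before restarting, it can help to confirm which services config.sh actually enables. The sketch below is only an illustration: it assumes the services are toggled by shell assignments of the form service_enabled_<name>=true or =false, which should be checked against the config.sh shipped with the course. The helper name read_service_flags is hypothetical.

```python
import re

# Assumption: config.sh toggles each service with a line such as
# `service_enabled_asr=true`. Commented-out lines are ignored.
FLAG_PATTERN = re.compile(
    r"^\s*(service_enabled_\w+)=[\"']?(true|false)\b", re.IGNORECASE
)


def read_service_flags(config_text):
    """Map each service_enabled_* variable in config_text to a bool."""
    flags = {}
    for line in config_text.splitlines():
        match = FLAG_PATTERN.match(line)
        if match:
            flags[match.group(1)] = match.group(2).lower() == "true"
    return flags


# Example with inline text; in practice, pass the contents of config.sh.
sample = """
service_enabled_asr=true
service_enabled_tts=false
# service_enabled_nmt=true
"""
for name, enabled in read_service_flags(sample).items():
    print(f"{name}: {'enabled' if enabled else 'disabled'}")
```

To use it on the real file, read config.sh with pathlib.Path("config.sh").read_text() and pass the result to read_service_flags.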

danaNVIDIA · January 27, 2025, 5:27pm

@jepetolee I’m sorry you are having difficulties running the assessment. I suspect your issue is caused by model-loading mismatches that happen in the background and are unique to this course. Here are some tips to hopefully get you across the finish line!

If you are spinning up the course and jumping to the assessment on a cold start, in addition to setting up your NGC key again in notebook 3, it is safest to wait for all the background data loads and Docker image loads to finish. Getting to this state takes 18-20 minutes, but jumping in early may have unpredictable results. You can check the status by looking in dli_workspace. When everything is loaded, there should be no .tar or .tgz files remaining. During the original in-person delivery of the course, this background data load was complete by the time it was needed due to lectures and so on, so you may not have been aware it was occurring.
Correctly setting up config.sh in step 1 is critical. Pay attention to the hint: “# Check your work - are all three services enabled? Is the model location repo correct?”
Proceed through the assessment steps without skipping anything. Pay attention to the instructions, FIXME sections, and “Check your work” hints.
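
The readiness check from the first tip (no .tar or .tgz files left in dli_workspace) can be sketched as a small script. It assumes the archives sit at the top level of the workspace directory and that the relative path dli_workspace is correct from wherever the script runs; both are assumptions, and pending_archives is a hypothetical helper name.

```python
from pathlib import Path

# Suffixes named in the tip above; other archive types are not checked.
ARCHIVE_SUFFIXES = (".tar", ".tgz")


def pending_archives(workspace):
    """Return sorted names of .tar/.tgz files directly inside workspace."""
    root = Path(workspace)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.name.endswith(ARCHIVE_SUFFIXES)
    )


# Example usage; adjust the path to where dli_workspace lives.
workspace = Path("dli_workspace")
if workspace.is_dir():
    remaining = pending_archives(workspace)
    if remaining:
        print("Background load may still be running:", ", ".join(remaining))
    else:
        print("No .tar or .tgz files remain in the workspace.")
else:
    print(f"{workspace} not found; adjust the path")
```

An empty result only says the archives are gone. It does not confirm that the Docker images have finished loading, so waiting the full 18-20 minutes is still the safer route.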

If you follow the tips above, you should not get the errors you reported. I just went through it myself and had no errors.
