Riva server exits immediately

I am trying to run Riva on the Jetson Orin AGX platform, following the steps in this link: Quick Start Guide — NVIDIA Riva.
I successfully completed all the steps up to running riva_start.sh.

When I run the riva_start.sh script, the Docker container starts, runs for about 3 seconds, then exits with the errors below.
Below is the log from bash riva_start.sh:

Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
timeout: failed to connect service ":50051" within 1s
running command: docker exec riva-speech /bin/grpc_health_probe -addr=:50051 2> /dev/null
Riva server is ready...
Use this container terminal to run applications:
root@1ac3f1aba9f3:/opt/riva# 

Below is the log from: docker logs riva-speech

/opt/riva/bin/start-riva: line 10: curl: command not found
/opt/riva/bin/start-riva: line 11: [: -ne: unary operator expected
  > Triton server is ready...
W1113 14:15:19.424793 20 metrics.cc:354] No polling metrics (CPU, GPU, Cache) are enabled. Will not poll for them.
I1113 14:15:19.425102 20 tritonserver.cc:2264] 
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                               |
| server_version                   | 2.27.0                                                                                                                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace logging |
| model_control_mode               | MODE_NONE                                                                                                                                                                                            |
| strict_model_config              | 1                                                                                                                                                                                                    |
| rate_limit                       | OFF                                                                                                                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                            |
| cuda_memory_pool_byte_size{0}    | 1000000000                                                                                                                                                                                           |
| response_cache_byte_size         | 0                                                                                                                                                                                                    |
| min_supported_compute_capability | 5.3                                                                                                                                                                                                  |
| strict_readiness                 | 1                                                                                                                                                                                                    |
| exit_timeout                     | 30                                                                                                                                                                                                   |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1113 14:15:19.425118 20 server.cc:261] No server context available. Exiting immediately.
error: creating server: Invalid argument - --model-repository must be specified
I1113 14:15:19.433980    22 riva_server.cc:126] Using Insecure Server Credentials
E1113 14:15:19.436285    22 model_registry.cc:288] error: unable to get server status: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8001: Failed to connect to remote host: Connection refused
One of the processes has exited unexpectedly. Stopping container.
W1113 14:15:24.437695    22 riva_server.cc:196] Signal: 15

Specs:
Hardware: Jetson Orin AGX
Operating System: Ubuntu
Riva Version: arm64_v2.13.1
Package: nvidia-jetpack
Version: 5.1-b147
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 5.1-b147), nvidia-jetpack-dev (= 5.1-b147)
GCC version: gcc version 9.3.0 (Buildroot 2020.08)
nvidia driver: version: 35.2.1

How to reproduce the issue?
By following all the steps in this link: Quick Start Guide — NVIDIA Riva


It turns out I had set the flag use_existing_rmirs to true in config.sh, which in turn skipped downloading the needed models into the model_repository directory.
After setting it to false and re-running the deployment, things work well.
The error message was giving a hint but was a little misleading.
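For anyone hitting the same error: the flag lives in the quickstart's config.sh. A minimal sketch of the fix, assuming the default arm64 quickstart layout (directory name may differ for your Riva version):

```shell
# In riva_quickstart_arm64_v2.13.1/config.sh:
# When set to "true", riva_init.sh assumes prebuilt RMIR files already
# exist and skips downloading/deploying models, which leaves the Triton
# model repository empty (hence "--model-repository must be specified").
use_existing_rmirs=false

# Then regenerate the model repository and restart the server:
#   bash riva_init.sh
#   bash riva_start.sh
```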


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.