Something wrong with riva quickstart

ryein · December 27, 2022, 12:30pm

Tried on two different systems and riva quickstart keeps failing to launch.

rvinobha · December 29, 2022, 1:16pm

Hi @ryein

Thanks for your interest in Riva.

Can you please share the following with us:

whether it is riva or riva-embedded
the config.sh used
the complete log output of bash riva_init.sh
the complete log output of bash riva_start.sh

Thanks

jason.grey · January 2, 2023, 9:38pm

Not sure if it’s the same issue, but I also had trouble with:

riva_quickstart_v2.8.1
riva - not embedded
config.sh - unmodified/default

riva_init.sh was failing with:

....
To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.
ERROR: No supported GPU(s) detected to run this container

Failed to detect NVIDIA driver version.

/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
TensorRT is not available! Will use ONNX backend instead.
2023-01-02 21:17:04,748 [INFO] Writing Riva model repository to '/data/models'...
2023-01-02 21:17:04,748 [INFO] The riva model repo target directory is /data/models
2023-01-02 21:17:06,252 [INFO] Using onnx runtime
2023-01-02 21:17:06,253 [INFO] Extract_binaries for language_model -> /data/models/riva-onnx-riva_text_classification_domain-nn-bert-base-uncased/1
2023-01-02 21:17:06,253 [INFO] extracting {'ckpt': ('nemo.collections.nlp.models.text_classification.text_classification_model.TextClassificationModel', 'model_weights.ckpt'), 'bert_config_file': ('nemo.collections.nlp.models.text_classification.text_classification_model.TextClassificationModel', 'bert-base-uncased_encoder_config.json')} -> /data/models/riva-onnx-riva_text_classification_domain-nn-bert-base-uncased/1
2023-01-02 21:17:07,806 [INFO] Printing copied artifacts:
2023-01-02 21:17:07,806 [INFO] {'ckpt': '/data/models/riva-onnx-riva_text_classification_domain-nn-bert-base-uncased/1/model_weights.ckpt', 'bert_config_file': '/data/models/riva-onnx-riva_text_classification_domain-nn-bert-base-uncased/1/bert-base-uncased_encoder_config.json'}
2023-01-02 21:17:07,806 [ERROR] Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/cli/deploy.py", line 100, in deploy_from_rmir
    generator.serialize_to_disk(
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 445, in serialize_to_disk
    module.serialize_to_disk(repo_dir, rmir, config_only, verbose, overwrite)
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 311, in serialize_to_disk
    self.update_binary(version_dir, rmir, verbose)
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 757, in update_binary
    self.update_binary_from_copied(version_dir, rmir, copied, verbose)
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 734, in update_binary_from_copied
    raise Exception("Need TRT and bert_config_file for ckpt model")
Exception: Need TRT and bert_config_file for ckpt model

+ '[' 1 -ne 0 ']'
+ echo 'Error in deploying RMIR models.'
Error in deploying RMIR models.
+ exit 1

What I did to get it to proceed was to modify the docker run calls in riva_init.sh to use “--privileged” wherever the call also used --gpus (there were two such calls).
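The edit described above could be scripted roughly as follows. This is a minimal sketch, not the quickstart's own tooling: the function name `add_privileged_flag` is hypothetical, and it assumes each docker run call keeps `--gpus` on a single line that does not already mention `--privileged`. Commands split across lines with backslashes would need checking by hand.

```shell
#!/usr/bin/env bash
# Sketch only: insert --privileged in front of the first --gpus on every
# line that does not already contain --privileged.
# Reads the script text on stdin and writes the patched text to stdout.
add_privileged_flag() {
    sed -e '/--privileged/!s/--gpus/--privileged --gpus/'
}

# Hypothetical usage: preview the change before touching the real script.
#   add_privileged_flag < riva_init.sh | diff riva_init.sh -
```

Previewing with diff first keeps the original script intact until the two changed lines have been reviewed.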

Context:

I’m running a fresh install of PopOS 22.04/Ubuntu 22.04
Docker version 20.10.22, build 3a2c30b
nvidia docker v2.11.0
I’m using a non-root user to access docker, but that user is in the docker group

It now seems to be doing a whole lot of something… and I can see it’s using my GPU resources, so, that’s good. (not sure if I will run into issues with riva_start.sh yet, I have not gotten that far)

Hopefully that helps someone…

jason.grey · January 2, 2023, 10:12pm

Follow-up on my previous post: I had to do the opposite in riva_start.sh, adding “--gpus all” to the docker command that only had “--privileged” in it…

but then it did all seem to work, and I was able to run examples.

Note: I’m on a 4090, which doesn’t have enough memory to run all models at the same time either, so I also disabled the nlp and tts services at the top of config.sh, ran riva_clean.sh, and then re-initialized.
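For reference, the service toggles at the top of config.sh might look like the fragment below after that change. This is a sketch under assumptions: the variable names are given as I believe they appear in the quickstart's config.sh and should be checked against your own copy, and leaving asr enabled is only an example.

```shell
# Fragment of config.sh (sketch; confirm the variable names in your own file).
# Keep speech recognition, turn off the services that did not fit in memory.
service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=false
```

After editing, running riva_clean.sh and then riva_init.sh again, as described in the post, rebuilds the model repository with only the enabled services.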

ryein · January 20, 2023, 8:25pm

Thanks for the info. I for sure need to play with it more. I’m sure it was user error.

rbgreenway · May 24, 2023, 10:40pm

From what I can tell, Riva does not yet run on GPUs newer than the Ampere architecture (30 series). If true, Riva does not support the 40 series. Note this line in your log above:

ERROR: No supported GPU(s) detected to run this container
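To check whether a saved log contains this line before digging into the later traceback, a small helper along these lines could be used. It is only a sketch: the function name `has_gpu_detection_error` and the log file name in the usage comment are hypothetical, and it matches the message text exactly as it appears in the log quoted in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: succeed (exit status 0) if the given log file contains the
# container's GPU detection error, fail otherwise.
has_gpu_detection_error() {
    # -F treats the pattern as a fixed string, so "(s)" is matched literally.
    grep -qF 'No supported GPU(s) detected' "$1"
}

# Hypothetical usage:
#   bash riva_init.sh 2>&1 | tee riva_init.log
#   has_gpu_detection_error riva_init.log && echo "container could not see a GPU"
```

If the helper succeeds, the later "Need TRT and bert_config_file for ckpt model" exception is probably a consequence of the container not seeing a GPU rather than a separate problem.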

rvinobha · May 28, 2023, 8:28am

Thanks so much @rbgreenway and @jason.grey

Thanks for your kind inputs, really appreciated.

Sincere apologies for the long delay.

I will confirm:

whether 40 series cards are supported by Riva
the riva_start.sh change: “adding ‘--gpus all’ to the docker command which only had ‘--privileged’ in it”

I will share this feedback with the internal team and get their input.

Thanks
