Installation issues re: Riva 2.5 (Ubuntu 22.04 & CentOS 8)

NSDB · September 2, 2022, 10:58pm

Please provide the following information when requesting support.

Hardware - GPU (A100)
Hardware - CPU
Operating System - Ubuntu 22.04 and CentOS 8 (same results from each installation attempt)
Riva Version 2.5
TLT Version (if relevant)
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here.)

My installation process:

sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo

sudo dnf repolist -v

sudo dnf install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.3-3.1.el7.x86_64.rpm

sudo dnf install docker-ce -y

sudo systemctl --now enable docker

]# sudo docker run --rm hello-world


Unable to find image 'hello-world:latest' locally

latest: Pulling from library/hello-world

2db29710123e: Pull complete 

Digest: sha256:7d246653d0511db2a6b2e0436cfd0e52ac8c066000264b3ce63331ac66dca625

Status: Downloaded newer image for hello-world:latest



Hello from Docker!

This message shows that your installation appears to be working correctly.



To generate this message, Docker took the following steps:

 1. The Docker client contacted the Docker daemon.

 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.

    (amd64)

 3. The Docker daemon created a new container from that image which runs the

    executable that produces the output you are currently reading.

 4. The Docker daemon streamed that output to the Docker client, which sent it

    to your terminal.



To try something more ambitious, you can run an Ubuntu container with:

 $ docker run -it ubuntu bash



Share images, automate workflows, and more with a free Docker ID:

 https://hub.docker.com/



For more examples and ideas, visit:

 https://docs.docker.com/get-started/

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
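
For anyone checking which repository path the command above resolves to on a given host, here is a minimal sketch of the same os-release lookup wrapped in a function. The function name distribution_id and its optional path argument are illustrative additions, not part of the NVIDIA instructions; the original command always reads /etc/os-release.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): print ID + VERSION_ID from an
# os-release file, the same string used to build the repo URL above.
distribution_id() {
    # Optional path argument is an assumption added for testing;
    # it defaults to the file the original command reads.
    local os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so ID and VERSION_ID do not leak
    # into the calling shell.
    ( . "$os_release" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Usage: show the value that would be substituted for $distribution.
# distribution_id
```

On a stock CentOS 8 host, where os-release sets ID to centos and VERSION_ID to 8, this prints centos8; on Ubuntu 22.04 it prints ubuntu22.04.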

yum-config-manager --enable libnvidia-container-experimental

sudo dnf clean expire-cache --refresh

sudo dnf install -y nvidia-docker2

sudo systemctl restart docker

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

]# sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi


Unable to find image 'nvidia/cuda:11.0.3-base-ubuntu20.04' locally

11.0.3-base-ubuntu20.04: Pulling from nvidia/cuda

d7bfe07ed847: Pull complete 

75eccf561042: Pull complete 

191419884744: Pull complete 

a17a942db7e1: Pull complete 

16156c70987f: Pull complete 

Digest: sha256:57455121f3393b7ed9e5a0bc2b046f57ee7187ea9ec562a7d17bf8c97174040d

Status: Downloaded newer image for nvidia/cuda:11.0.3-base-ubuntu20.04

Fri Sep  2 21:23:56 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-2-20C    On   | 00000000:04:00.0 Off |                   On |
| N/A   N/A    P0    N/A /  N/A |                  N/A |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    0   0   0  |      0MiB / 18411MiB | 28      0 |  2   0    1    0    0 |
|                  |      0MiB /  4096MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_linux.zip && unzip ngccli_linux.zip && chmod u+x ngc-cli/ngc

echo "export PATH=\"\$PATH:$(pwd)/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile

ngc config set

Successfully saved NGC configuration to /root/.ngc/config

ngc registry resource download-version nvidia/riva/riva_quickstart:2.5.0

cd riva_quickstart_v2.5.0

bash riva_init.sh

2022-09-02 21:34:00,957 [INFO] Extract_binaries for conformer-en-US-asr-offline -> /data/models/conformer-en-US-asr-offline/1

2022-09-02 21:34:00,957 [INFO] extracting {'wfst_tokenizer': '/mnt/nvdl/datasets/jarvis_speech_ci/model_files/sp-itn/22.05/en/tokenize_and_classify.far', 'wfst_verbalizer': '/mnt/nvdl/datasets/jarvis_speech_ci/model_files/sp-itn/22.05/en/verbalize.far'} -> /data/models/conformer-en-US-asr-offline/1

2022-09-02 21:34:04,468 [INFO] Using onnx runtime

2022-09-02 21:34:04,468 [INFO] Extract_binaries for language_model -> /data/models/riva-onnx-riva-punctuation-en-US-nn-bert-base-uncased/1

2022-09-02 21:34:04,468 [INFO] extracting {'ckpt': ('nemo.collections.nlp.models.token_classification.punctuation_capitalization_model.PunctuationCapitalizationModel', 'model_weights.ckpt'), 'bert_config_file': ('nemo.collections.nlp.models.token_classification.punctuation_capitalization_model.PunctuationCapitalizationModel', 'bert-base-uncased_encoder_config.json')} -> /data/models/riva-onnx-riva-punctuation-en-US-nn-bert-base-uncased/1

2022-09-02 21:34:09,076 [INFO] Printing copied artifacts:

2022-09-02 21:34:09,077 [INFO] {'ckpt': '/data/models/riva-onnx-riva-punctuation-en-US-nn-bert-base-uncased/1/model_weights.ckpt', 'bert_config_file': '/data/models/riva-onnx-riva-punctuation-en-US-nn-bert-base-uncased/1/bert-base-uncased_encoder_config.json'}

2022-09-02 21:34:09,077 [ERROR] Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/cli/deploy.py", line 100, in deploy_from_rmir
    generator.serialize_to_disk(
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 434, in serialize_to_disk
    module.serialize_to_disk(repo_dir, rmir, config_only, verbose, overwrite)
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 313, in serialize_to_disk
    self.generate_config(version_dir, rmir)
  File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 352, in generate_config
    input=self._inputs,
AttributeError: 'RivaBertEncoder' object has no attribute '_inputs'

+ '[' 1 -ne 0 ']'
+ echo 'Error in deploying RMIR models.'
Error in deploying RMIR models.
+ exit 1
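
When sharing a log like the one above, it can help to pull out just the failure lines. The following is a rough sketch, assuming the init output was saved to a file first (for example with `bash riva_init.sh 2>&1 | tee riva_init.log`). The function name summarize_init_errors and the log file name are hypothetical, and the pattern is based only on the log format shown in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of a saved riva_init.sh log that
# mark a failure. Assumes the log format shown in this thread.
summarize_init_errors() {
    local log_file="$1"
    # Match "[ERROR]" log records and the closing "SomethingError: message"
    # (or "SomethingException: message") line of a Python traceback.
    # grep exits non-zero when nothing matches, so the function does too.
    grep -E '\[ERROR\]|^[A-Za-z_.]+(Error|Exception): ' "$log_file"
}

# Usage (file name is an assumption):
# summarize_init_errors riva_init.log
```

Run against the output above, this would keep the `[ERROR] Traceback` record and the `AttributeError` line and drop the `[INFO]` records and stack frames.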

I am seeing the same result when installing Riva 2.5 on fresh Ubuntu 22.04 and CentOS 8 servers.

What am I missing?

Please advise. Thank you kindly.

rvinobha · September 6, 2022, 12:34pm

Hi @NSDB

Thanks for your interest in Riva.

I will check on this error with my team.

A quick suggestion: this kind of error happens if old versions of Riva are present. Can you try running
bash riva_clean.sh
and then start the setup again?
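
The clean-then-retry sequence suggested here could be sketched roughly as follows. The function name riva_clean_and_init is made up for illustration; the only things taken from this thread are the two script names and the riva_quickstart_v2.5.0 directory. It is also an assumption that riva_clean.sh may ask for confirmation before removing anything, so it is best run interactively.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run riva_clean.sh and then riva_init.sh from a
# quickstart directory, stopping if the clean step fails.
riva_clean_and_init() {
    local quickstart_dir="${1:-.}"
    (
        # Work inside a subshell so the caller's working directory is unchanged.
        cd "$quickstart_dir" || exit 1
        # Assumption: the clean script may prompt before deleting anything.
        bash riva_clean.sh || exit 1
        # Only re-run initialization once the clean step has succeeded.
        bash riva_init.sh
    )
}

# Usage (directory name taken from the original post):
# riva_clean_and_init riva_quickstart_v2.5.0
```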

Thanks

NSDB · September 11, 2022, 10:23pm

Thank you for the reply.

"A quick suggestion: this kind of error happens if old versions of Riva are present."

This was a clean, fresh install on both platforms.

"Can you try running bash riva_clean.sh and then start the setup again?"

No change.