Problems running TTS Es Multispeaker FastPitch HiFiGAN in RIVA

Please provide the following information when requesting support.

Riva Version: riva_quickstart:2.8.1

Hi! I have downloaded the TTS Es Multispeaker FastPitch HiFiGAN | NVIDIA NGC model, but I am not able to make Riva and tritonserver work.

This is what I have done so far:

  1. Create the .riva files with the NeMo Docker container (using the latest NeMo image, 22.09):
docker run --gpus all -it --rm \
    -v $(pwd):/NeMo \
    --shm-size=8g \
    -p 8888:8888 \
    -p 6006:6006 \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    --device=/dev/snd \
    nvcr.io/nvidia/nemo:22.09

Then installed the nemo2riva Python package from riva_quickstart_v2.8.1:

pip3 install "riva_quickstart_v2.8.1/nemo2riva-2.8.1-py3-none-any.whl"

And generated the .riva files:

nemo2riva --key tlt_encode --out tts_es_hifigan_ft_fastpitch_multispeaker.riva tts_es_hifigan_ft_fastpitch_multispeaker.nemo

which worked well, and

nemo2riva --key tlt_encode --out tts_es_fastpitch_multispeaker.riva tts_es_fastpitch_multispeaker.nemo

which is where I started to have problems, so I updated the following packages:

pip install hydra-core==1.2.0
pip install omegaconf==2.2.3
pip install pytorch-lightning==1.8.6
pip install nemo-toolkit==1.14.0

Updating these packages solved all the problems I had with nemo2riva.

  2. I used the servicemaker Docker container for riva-build and riva-deploy (riva-build is run inside the container):
docker run --init -it --rm --gpus '"device=0"' \
  -v $(pwd)/riva_model_loc:/data \
  -e "MODEL_DEPLOY_KEY=tlt_encode" \
  --name riva-service-maker \
  nvcr.io/nvidia/riva/riva-speech:2.8.1-servicemaker 
riva-build speech_synthesis \
    tts_es_hifigan_ft_fastpitch_multispeaker.rmir:tlt_encode \
    tts_es_fastpitch_multispeaker.riva:tlt_encode \
    tts_es_hifigan_ft_fastpitch_multispeaker.riva:tlt_encode \
    --language_code es_US \
    --num_speakers=174 \
    --phone_set=ipa \
    --sample_rate 44100 \
    --voice_name Latin-American-Spanish \
    --subvoices es-AR-Female:0,es-AR-Male:32,es-CL-Female:44,es-CL-Male:57,es-CO-Female:75,es-CO-Male:91,es-PE-Female:108,es-PE-Male:126,es-VE-Female:151,es-VE-Male:162 \
    --wfst_tokenizer_model=tokenize_and_classify.far \
    --wfst_verbalizer_model=verbalize.far

I have downloaded the tokenizer and verbalizer from Riva ASR Spanish Inverse Normalization Grammar | NVIDIA NGC.

This works; I have created the RMIR file, and now I deploy:

riva-deploy -f tts_es_hifigan_ft_fastpitch_multispeaker.rmir:tlt_encode /data/models

The models have been created, with tts_es_fastpitch_multispeaker.riva deployed in ONNX format because it could not be converted to TensorRT.

  3. I have launched the Riva server with these models:
docker run  --init --rm -it \
--gpus '"device=0"' \
-p 50051:50051 \
-v $(pwd)/riva_model_loc:/data \
--name riva-speech \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
nvcr.io/nvidia/riva/riva-speech:2.8.1
start-riva --riva-uri=0.0.0.0:50052 --nlp_service=false --asr_service=true --tts_service=true

But the server stops and doesn’t work. This is the error that I am seeing:

I0112 11:02:09.698625 12536 server.cc:633] 
+---------------------------------------------+---------+---------------------------------------------+
| Model                                       | Version | Status                                      |
+---------------------------------------------+---------+---------------------------------------------+
| riva-onnx-fastpitch_encoder-Latin-American- | 1       | UNAVAILABLE: Internal: onnx runtime error 1 |
| Spanish                                     |         | 0: Load model from /data/models/riva-onnx-f |
|                                             |         | astpitch_encoder-Latin-American-Spanish/1/m |
|                                             |         | odel.onnx failed:This is an invalid model.  |
|                                             |         | In Node, ("Identity_0", Identity, "Identity |
|                                             |         | _0", -1) : ("onnx::MatMul_3779": tensor(flo |
|                                             |         | at16),) -> ("onnx::MatMul_3978",) , Error N |
|                                             |         | o opset import for domain 'Identity_0'      |
|                                             |         |                                             |
| riva-trt-hifigan-Latin-American-Spanish     | 1       | READY                                       |
| spectrogram_chunker-Latin-American-Spanish  | 1       | READY                                       |
| tts_postprocessor-Latin-American-Spanish    | 1       | READY                                       |
| tts_preprocessor-Latin-American-Spanish     | 1       | READY                                       |
+---------------------------------------------+---------+---------------------------------------------+


Once the NeMo model is converted to ONNX, the resulting graph contains an Identity_0 node that tritonserver's ONNX runtime rejects (the node name is being parsed as an unknown opset domain). How can I remove this operator when converting the model?
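
One way I imagine this could be done (a minimal sketch with onnx-graphsurgeon, untested against this exact model; the file names are placeholders) is to bypass each Identity node, rewiring its consumers to its input before saving:

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("fastpitch_encoder.onnx"))

# Bypass every Identity node: point its consumers at the node's input tensor
for node in [n for n in graph.nodes if n.op == "Identity"]:
    inp, out = node.inputs[0], node.outputs[0]
    for consumer in list(out.outputs):
        consumer.inputs[consumer.inputs.index(out)] = inp
    node.outputs.clear()

# Drop the now-dangling Identity nodes and re-sort the graph
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "fastpitch_encoder_clean.onnx")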

Can you give me a hand to see how to approach this?

Thank you very much!

Some updates,

I have built the newest NeMo Dockerfile from the repository, and now I have nemo-toolkit version 1.15.0rc0.

Here I no longer have the Identity_0 problem: I am able to export to ONNX and validate the model with this code, which was failing under version 1.14.

import onnx
from nemo.collections.tts.models import FastPitchModel

# Restore the multispeaker FastPitch checkpoint and export it to ONNX
spec_generator = FastPitchModel.restore_from("tts_es_fastpitch_multispeaker.nemo")
spec_generator.export("model.onnx")

# Validate the exported graph; this raised the Identity_0 error on 1.14
model = onnx.load("model.onnx")
onnx.checker.check_model(model)

But now when I run the nemo2riva command, the ONNX export fails:

[C] Graph input and output tensors must include dtype information. Please set the dtype attribute for: Variable (536): (shape=None, dtype=None)
[NeMo E 2023-01-12 21:17:56 cookbook:124] ERROR: Export failed. Please make sure your NeMo model class (<class 'nemo.collections.tts.models.fastpitch.FastPitchModel'>) has working export() and that you have the latest NeMo package installed with [all] dependencies.
Traceback (most recent call last):
  File "/usr/local/bin/nemo2riva", line 8, in <module>
    sys.exit(nemo2riva())
  File "/usr/local/lib/python3.8/dist-packages/nemo2riva/cli/nemo2riva.py", line 49, in nemo2riva
    Nemo2Riva(args)
  File "/usr/local/lib/python3.8/dist-packages/nemo2riva/convert.py", line 83, in Nemo2Riva
    export_model(
  File "/usr/local/lib/python3.8/dist-packages/nemo2riva/cookbook.py", line 125, in export_model
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nemo2riva/cookbook.py", line 95, in export_model
    model_onnx = gs.export_onnx(graph)
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 139, in export_onnx
    onnx_graph = OnnxExporter.export_graph(graph, do_type_check=do_type_check)
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 92, in export_graph
    nodes = [OnnxExporter.export_node(node, do_type_check) for node in graph.nodes]
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 92, in <listcomp>
    nodes = [OnnxExporter.export_node(node, do_type_check) for node in graph.nodes]
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 72, in export_node
    val = OnnxExporter.export_graph(val, do_type_check)
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 92, in export_graph
    nodes = [OnnxExporter.export_node(node, do_type_check) for node in graph.nodes]
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 92, in <listcomp>
    nodes = [OnnxExporter.export_node(node, do_type_check) for node in graph.nodes]
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 72, in export_node
    val = OnnxExporter.export_graph(val, do_type_check)
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 94, in export_graph
    outputs = [OnnxExporter.export_value_info_proto(out, do_type_check) for out in graph.outputs]
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 94, in <listcomp>
    outputs = [OnnxExporter.export_value_info_proto(out, do_type_check) for out in graph.outputs]
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/exporters/onnx_exporter.py", line 49, in export_value_info_proto
    G_LOGGER.critical(
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/logger/logger.py", line 233, in critical
    raise OnnxGraphSurgeonException(message) from None  # Erase exception chain
onnx_graphsurgeon.util.exception.OnnxGraphSurgeonException: Graph input and output tensors must include dtype information. Please set the dtype attribute for: Variable (536): (shape=None, dtype=None)
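
For reference, the offending tensors can be listed with the same onnx-graphsurgeon API that raises this error — a minimal sketch ("model.onnx" is a placeholder for the intermediate export; note that the failing Variable (536) may live in a nested subgraph, in which case the subgraphs would need to be walked as well):

import onnx
import onnx_graphsurgeon as gs

# Report every top-level graph input/output missing dtype information
graph = gs.import_onnx(onnx.load("model.onnx"))
for tensor in graph.inputs + graph.outputs:
    if tensor.dtype is None:
        print(f"missing dtype: {tensor.name} (shape={tensor.shape})")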

Hi @jlamperez10

Apologies for the delay,

There are indeed some problems with the nemo2riva conversion.

Can you try the nemo2riva conversion using the steps below, then use the generated .riva model and let us know if you face issues with riva-build and riva-deploy?

Build the Docker image:

git clone https://github.com/NVIDIA/NeMo.git -b r1.15.0
cd NeMo
DOCKER_BUILDKIT=1 nvidia-docker build . -t nemo15

Start the container (before starting, make sure to put the .nemo model and the nemo2riva-2.8.1-py3-none-any.whl wheel file from the quickstart into the mounted volume, i.e. the <nemo_github_folder> in the command below):

docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
-p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
stack=67108864 --device=/dev/snd nemo15

Then do the conversion:

pip install nemo2riva-2.8.1-py3-none-any.whl

nemo2riva --key tlt_encode --out /NeMo/tts_es_fastpitch_multispeaker.riva /NeMo/tts_es_fastpitch_multispeaker.nemo

And let us know whether this .riva model causes any trouble with the riva-build and riva-deploy commands.

Thanks

Hi @rvinobha

Thank you for your help. I have been able to build the .riva and RMIR models with this new NeMo release and launch the Riva server, but I am having the following problem when synthesizing text to speech.

To create the RMIR, I used:

riva-build speech_synthesis \
tts_es_hifigan_ft_fastpitch_multispeaker.rmir:tlt_encode \
     tts_es_fastpitch_multispeaker.riva:tlt_encode \
     tts_es_hifigan_ft_fastpitch_multispeaker.riva:tlt_encode \
     --language_code es_US \
     --num_speakers=174 \
     --phone_set=ipa \
     --sample_rate 44100 \
     --voice_name Latin-American-Spanish \
     --subvoices esARFemale-1:0,esARMale-1:32,esCLFemale-1:44,esCLMale-1:57,esCOFemale-1:75,esCOMale-1:91,esPEFemale-1:108,esPEMale-1:126,esVEFemale-1:151,esVEMale-1:162 \
     --wfst_tokenizer_model=tokenize_and_classify.far \
     --wfst_verbalizer_model=verbalize.far

And now, when I make requests to the server with this Python code:

import riva.client

uri = "localhost:50052"
auth = riva.client.Auth(uri=uri)
tts_service = riva.client.SpeechSynthesisService(auth)

language_code = 'es_US'
sample_rate_hz = 44100
text = "Hola como estas?"
resp = tts_service.synthesize_online(text, language_code=language_code, sample_rate_hz=sample_rate_hz)
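
Note that synthesize_online returns a stream of responses; iterating it is what actually performs the RPC. A minimal way to collect the audio, continuing the snippet above (field name per the riva.client response proto):

# Each streamed chunk carries raw PCM bytes in its audio field
audio_bytes = b''.join(chunk.audio for chunk in resp)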

I get an "invalid server subvoice configuration" error:

<_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "Model is not available on server: invalid server subvoice configuration"
	debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:50052 {created_time:"2023-01-20T16:59:52.220356857+01:00", grpc_status:3, grpc_message:"Model is not available on server: invalid server subvoice configuration"}"

Am I missing something in riva-build? How should I define the different voices and speakers?
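
For reference, here is how I would expect a specific subvoice to be requested, continuing the snippet above (the <voice_name>.<subvoice> naming is my assumption from the Riva TTS docs):

# Hypothetical: select one subvoice as "<voice_name>.<subvoice>"
resp = tts_service.synthesize_online(
    text,
    voice_name="Latin-American-Spanish.esARFemale-1",
    language_code=language_code,
    sample_rate_hz=sample_rate_hz,
)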

I have removed the --subvoices parameter, so servicemaker now maps the subvoices to integers, and I am able to generate and listen to the audio.

From the command prompt I am seeing this:

Speech Class far file missing:/data/models/tts_preprocessor-Spanish-LA/1/speech_class.far

Where can I find this .far file? Or how can I generate it?

Hi @jlamperez10

Apologies for the delay,

I will check regarding this error with the internal team and provide updates

Thanks

Hi @jlamperez10

Apologies for the delay,

I have answers from the team.

We don’t supply pre-built text normalization for non-en-US languages. You are welcome to either not use text normalization or build it yourself from GitHub - NVIDIA/NeMo-text-processing: NeMo text processing for ASR and TTS.
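
For reference, exporting the tokenize/verbalize grammars to .far files roughly follows the pattern of the pynini_export.py deployment script in the NeMo repository — a minimal sketch (module paths and attribute names may differ between releases, so treat this as an assumption to verify):

from pynini.export import export
from nemo_text_processing.text_normalization.normalize import Normalizer

# Build the Spanish text-normalization grammars (compiles the FSTs)
normalizer = Normalizer(input_case='cased', lang='es')

# Write the tagger and verbalizer FSTs into .far archives for riva-build
exporter = export.Exporter('tokenize_and_classify.far')
exporter['tokenize_and_classify'] = normalizer.tagger.fst
exporter.close()

exporter = export.Exporter('verbalize.far')
exporter['verbalize'] = normalizer.verbalizer.fst
exporter.close()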

Thanks