I am trying to use custom pronunciations with a Spanish multispeaker TTS model through an IPA phoneme dictionary that I pass as an argument (--phone_dictionary_file) when creating the .rmir file. When I try to synthesize audio, nothing is generated, and the Docker logs report that the phoneme symbols could not be found in the dictionary:
E1011 00:11:01.802628 207 character_mapping.cc:264] Could not find '@t' in the dictionary
E1011 00:11:01.802636 207 character_mapping.cc:264] Could not find '@i' in the dictionary
E1011 00:11:01.802641 207 character_mapping.cc:264] Could not find '@θ' in the dictionary
E1011 00:11:01.802647 207 character_mapping.cc:264] Could not find '@ˈ' in the dictionary
E1011 00:11:01.802656 207 character_mapping.cc:264] Could not find '@i' in the dictionary
E1011 00:11:01.802662 207 character_mapping.cc:264] Could not find '@n' in the dictionary
E1011 00:11:01.802672 207 character_mapping.cc:264] Could not find '@k' in the dictionary
E1011 00:11:01.802678 207 character_mapping.cc:264] Could not find '@o' in the dictionary
E1011 00:11:01.802688 207 character_mapping.cc:264] Could not find '@m' in the dictionary
E1011 00:11:01.802695 207 character_mapping.cc:264] Could not find '@ˈ' in the dictionary
E1011 00:11:01.802703 207 character_mapping.cc:264] Could not find '@e' in the dictionary
E1011 00:11:01.802709 207 character_mapping.cc:264] Could not find '@ɣ' in the dictionary
E1011 00:11:01.802714 207 character_mapping.cc:264] Could not find '@a' in the dictionary
E1011 00:11:01.802724 207 character_mapping.cc:264] Could not find '@s' in the dictionary
E1011 00:11:01.802731 207 character_mapping.cc:264] Could not find '@ˈ' in the dictionary
E1011 00:11:01.802736 207 character_mapping.cc:264] Could not find '@a' in the dictionary
E1011 00:11:01.802742 207 character_mapping.cc:264] Could not find '@ɪ' in the dictionary
E1011 00:11:01.802747 207 character_mapping.cc:264] Could not find '@f' in the dictionary
E1011 00:11:01.802757 207 character_mapping.cc:264] Could not find '@o' in the dictionary
E1011 00:11:01.802762 207 character_mapping.cc:264] Could not find '@n' in the dictionary
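To make errors like the ones above easier to compare against a dictionary, here is a minimal sketch that condenses them into a count of unique missing symbols. The function name `missing_symbols` and the sample string are illustrative and not part of Riva; the regular expression assumes the exact message format shown in these log lines.

```python
import re
from collections import Counter

# Captures the quoted symbol in messages of the form:
#   ... character_mapping.cc:264] Could not find '@t' in the dictionary
# (assumption: the message format is the one shown in the logs above)
MISSING_RE = re.compile(r"Could not find '(.+?)' in the dictionary")


def missing_symbols(log_text: str) -> Counter:
    """Count how often each symbol is reported as missing in the log text."""
    return Counter(MISSING_RE.findall(log_text))


# Three lines copied from the log excerpt above, used as sample input.
sample = (
    "E1011 00:11:01.802628 207 character_mapping.cc:264] Could not find '@t' in the dictionary\n"
    "E1011 00:11:01.802636 207 character_mapping.cc:264] Could not find '@i' in the dictionary\n"
    "E1011 00:11:01.802656 207 character_mapping.cc:264] Could not find '@i' in the dictionary\n"
)

# Print each missing symbol with its count, most frequent first.
for symbol, count in missing_symbols(sample).most_common():
    print(f"{symbol}\t{count}")
```

In the excerpt, every rejected symbol is a single IPA character prefixed with '@', so the resulting list shows at a glance which symbols the server looks up and fails to find.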
This is the command I am trying to use:
riva-build speech_synthesis -f \
/servicemaker-dev/tts_rmir:tlt_encode \
/servicemaker-dev/tts_es_fastpitch.riva:None \
/servicemaker-dev/tts_es_hifigan_fastpitch.riva:None \
--sample_rate=44100 \
--phone_set="ipa" \
--phone_dictionary_file=/servicemaker-dev/es_LA_nv230301.dict \
--upper_case_chars=True \
--voice_name=tts_spanish \
--wfst_tokenizer_model=/servicemaker-dev/new/tokenize_and_classify.far \
--wfst_verbalizer_model=/servicemaker-dev/new/verbalize.far
I tried three different dictionaries, two from NeMo and one from Riva, and none of them works.
These are the logs when creating the .rmir file and deploying the models.
2023-10-11 00:01:59,599 [INFO] Packing binaries for preprocessor/ONNX : {'phone_dictionary_path': ('nemo.collections.tts.models.fastpitch.FastPitchModel', '/servicemaker-dev/es_ES_nv230301.dict'), 'mapping_file': ('nemo.collections.tts.models.fastpitch.FastPitchModel', 'mapping.txt'), 'wfst_tokenizer': '/servicemaker-dev/new/tokenize_and_classify.far', 'wfst_verbalizer': '/servicemaker-dev/new/verbalize.far'}
2023-10-11 00:01:59,599 [INFO] Copying phone_dictionary_path:/servicemaker-dev/es_ES_nv230301.dict -> preprocessor:preprocessor-es_ES_nv230301.dict
2023-10-11 00:01:59,602 [INFO] Copying mapping_file:mapping.txt -> preprocessor:preprocessor-mapping.txt
2023-10-11 00:01:59,602 [INFO] Copying wfst_tokenizer:/servicemaker-dev/new/tokenize_and_classify.far -> preprocessor:preprocessor-tokenize_and_classify.far
2023-10-11 00:01:59,642 [INFO] Copying wfst_verbalizer:/servicemaker-dev/new/verbalize.far -> preprocessor:preprocessor-verbalize.far
2023-10-11 00:01:59,673 [INFO] Packing binaries for encoderFastPitch/ONNX : {'onnx': ('nemo.collections.tts.models.fastpitch.FastPitchModel', 'model_graph.onnx')}
2023-10-11 00:01:59,673 [INFO] Copying onnx:model_graph.onnx -> encoderFastPitch:encoderFastPitch-model_graph.onnx
2023-10-11 00:02:00,543 [INFO] Packing binaries for hifigan/ONNX : {'onnx': ('nemo.collections.tts.models.hifigan.HifiGanModel', 'model_graph.onnx')}
2023-10-11 00:02:00,543 [INFO] Copying onnx:model_graph.onnx -> hifigan:hifigan-model_graph.onnx
2023-10-11 00:02:01,036 [INFO] Saving to /servicemaker-dev/tts_rmir
root@00dc5507884e:/opt/riva# riva-deploy /servicemaker-dev/tts_rmir:tlt_encode /data/models
2023-10-11 00:02:29,843 [INFO] Writing Riva model repository to '/data/models'...
2023-10-11 00:02:29,843 [INFO] The riva model repo target directory is /data/models
2023-10-11 00:02:31,613 [INFO] Using onnx runtime
2023-10-11 00:02:31,613 [INFO] Using tensorrt with fp16
2023-10-11 00:02:31,614 [INFO] Extract_binaries for preprocessor -> /data/models/tts_preprocessor-tts_spanish/1
2023-10-11 00:02:31,614 [INFO] extracting {'phone_dictionary_path': ('nemo.collections.tts.models.fastpitch.FastPitchModel', '/servicemaker-dev/es_ES_nv230301.dict'), 'mapping_file': ('nemo.collections.tts.models.fastpitch.FastPitchModel', 'mapping.txt'), 'wfst_tokenizer': '/servicemaker-dev/new/tokenize_and_classify.far', 'wfst_verbalizer': '/servicemaker-dev/new/verbalize.far'} -> /data/models/tts_preprocessor-tts_spanish/1
2023-10-11 00:02:31,664 [INFO] Extract_binaries for encoderFastPitch -> /data/models/riva-onnx-fastpitch_encoder-tts_spanish/1
2023-10-11 00:02:31,664 [INFO] extracting {'onnx': ('nemo.collections.tts.models.fastpitch.FastPitchModel', 'model_graph.onnx')} -> /data/models/riva-onnx-fastpitch_encoder-tts_spanish/1
2023-10-11 00:02:32,406 [INFO] Printing copied artifacts:
2023-10-11 00:02:32,406 [INFO] {'onnx': '/data/models/riva-onnx-fastpitch_encoder-tts_spanish/1/model_graph.onnx'}
2023-10-11 00:02:32,446 [INFO] Extract_binaries for chunkerFastPitch -> /data/models/spectrogram_chunker-tts_spanish/1
2023-10-11 00:02:32,447 [INFO] Extract_binaries for hifigan -> /data/models/riva-trt-hifigan-tts_spanish/1
2023-10-11 00:02:32,447 [INFO] extracting {'onnx': ('nemo.collections.tts.models.hifigan.HifiGanModel', 'model_graph.onnx')} -> /data/models/riva-trt-hifigan-tts_spanish/1
2023-10-11 00:02:32,922 [INFO] Printing copied artifacts:
2023-10-11 00:02:32,922 [INFO] {'onnx': '/data/models/riva-trt-hifigan-tts_spanish/1/model_graph.onnx'}
2023-10-11 00:02:32,922 [INFO] Building TRT engine from ONNX file {model_weights}
[10/11/2023-00:02:35] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[10/11/2023-00:02:35] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/11/2023-00:02:37] [TRT] [E] parsers/onnx/ModelImporter.cpp:520: Parse was called with a non-empty network definition
[10/11/2023-00:07:00] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[10/11/2023-00:07:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[10/11/2023-00:07:00] [TRT] [W] Check verbose logs for the list of affected weights.
[10/11/2023-00:07:00] [TRT] [W] - 92 weights are affected by this issue: Detected subnormal FP16 values.
[10/11/2023-00:07:00] [TRT] [W] - 27 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
2023-10-11 00:07:00,536 [INFO] Writing engine to model repository: /data/models/riva-trt-hifigan-tts_spanish/1/model.plan
2023-10-11 00:07:00,562 [INFO] Extract_binaries for denoiser -> /data/models/tts_postprocessor-tts_spanish/1
2023-10-11 00:07:00,563 [INFO] Extract_binaries for self -> /data/models/fastpitch_hifigan_ensemble-tts_spanish/1
If I don’t pass a phoneme dictionary, synthesis works correctly.
Does anyone know how to use custom pronunciations?