Dictionary of subvoices for the TTS multi-speaker Spanish model?

I deployed this multi-speaker Spanish model in Riva.

The model documentation states that there are 174 different subvoices. However, I cannot find any information or a dictionary that details the accent of each subvoice. Is there any documentation for the subvoices?

Hi @nharo

Thanks for your interest in Riva.

I will check with the internal team and provide the details.



Hi @nharo, I am also trying to use this model. How did you compile and deploy tts_es_fastpitch_multispeaker.nemo?

I am able to obtain a Riva model with nemo2riva --key tlt_encode --out tts_es_fastpitch_multispeaker.riva tts_es_fastpitch_multispeaker.nemo, but this model then fails to compile to TensorRT because the ONNX-to-TensorRT conversion does not succeed. I see that the ONNX model contains INT64 layers that are not supported.

I have also tried to isolate the problem by exporting the ONNX model and then building the engine outside of the Riva environment with:

>>> from nemo.collections.tts.models import FastPitchModel
>>> spec_generator = FastPitchModel.restore_from("tts_es_fastpitch_multispeaker.nemo")
>>> spec_generator.export("model.onnx")
trtexec --onnx=model.onnx --saveEngine=model.plan

This gives me the following error:

trtexec --onnx=model.onnx --saveEngine=model.plan
&&&& RUNNING TensorRT.trtexec [TensorRT v8500] # trtexec --onnx=model.onnx --saveEngine=model.plan
[01/09/2023-11:10:43] [I] === Model Options ===
[01/09/2023-11:10:43] [I] Format: ONNX
[01/09/2023-11:10:43] [I] Model: model.onnx
[01/09/2023-11:10:43] [I] Output:
[01/09/2023-11:10:43] [I] === Build Options ===
[01/09/2023-11:10:43] [I] Max batch: explicit batch
[01/09/2023-11:10:43] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[01/09/2023-11:10:43] [I] minTiming: 1
[01/09/2023-11:10:43] [I] avgTiming: 8
[01/09/2023-11:10:43] [I] Precision: FP32
[01/09/2023-11:10:43] [I] LayerPrecisions: 
[01/09/2023-11:10:43] [I] Calibration: 
[01/09/2023-11:10:43] [I] Refit: Disabled
[01/09/2023-11:10:43] [I] Sparsity: Disabled
[01/09/2023-11:10:43] [I] Safe mode: Disabled
[01/09/2023-11:10:43] [I] DirectIO mode: Disabled
[01/09/2023-11:10:43] [I] Restricted mode: Disabled
[01/09/2023-11:10:43] [I] Build only: Disabled
[01/09/2023-11:10:43] [I] Save engine: model.plan
[01/09/2023-11:10:43] [I] Load engine: 
[01/09/2023-11:10:43] [I] Profiling verbosity: 0
[01/09/2023-11:10:43] [I] Tactic sources: Using default tactic sources
[01/09/2023-11:10:43] [I] timingCacheMode: local
[01/09/2023-11:10:43] [I] timingCacheFile: 
[01/09/2023-11:10:43] [I] Heuristic: Disabled
[01/09/2023-11:10:43] [I] Preview Features: Use default preview flags.
[01/09/2023-11:10:43] [I] Input(s)s format: fp32:CHW
[01/09/2023-11:10:43] [I] Output(s)s format: fp32:CHW
[01/09/2023-11:10:43] [I] Input build shapes: model
[01/09/2023-11:10:43] [I] Input calibration shapes: model
[01/09/2023-11:10:43] [I] === System Options ===
[01/09/2023-11:10:43] [I] Device: 0
[01/09/2023-11:10:43] [I] DLACore: 
[01/09/2023-11:10:43] [I] Plugins:
[01/09/2023-11:10:43] [I] === Inference Options ===
[01/09/2023-11:10:43] [I] Batch: Explicit
[01/09/2023-11:10:43] [I] Input inference shapes: model
[01/09/2023-11:10:43] [I] Iterations: 10
[01/09/2023-11:10:43] [I] Duration: 3s (+ 200ms warm up)
[01/09/2023-11:10:43] [I] Sleep time: 0ms
[01/09/2023-11:10:43] [I] Idle time: 0ms
[01/09/2023-11:10:43] [I] Streams: 1
[01/09/2023-11:10:43] [I] ExposeDMA: Disabled
[01/09/2023-11:10:43] [I] Data transfers: Enabled
[01/09/2023-11:10:43] [I] Spin-wait: Disabled
[01/09/2023-11:10:43] [I] Multithreading: Disabled
[01/09/2023-11:10:43] [I] CUDA Graph: Disabled
[01/09/2023-11:10:43] [I] Separate profiling: Disabled
[01/09/2023-11:10:43] [I] Time Deserialize: Disabled
[01/09/2023-11:10:43] [I] Time Refit: Disabled
[01/09/2023-11:10:43] [I] NVTX verbosity: 0
[01/09/2023-11:10:43] [I] Persistent Cache Ratio: 0
[01/09/2023-11:10:43] [I] Inputs:
[01/09/2023-11:10:43] [I] === Reporting Options ===
[01/09/2023-11:10:43] [I] Verbose: Disabled
[01/09/2023-11:10:43] [I] Averages: 10 inferences
[01/09/2023-11:10:43] [I] Percentiles: 90,95,99
[01/09/2023-11:10:43] [I] Dump refittable layers:Disabled
[01/09/2023-11:10:43] [I] Dump output: Disabled
[01/09/2023-11:10:43] [I] Profile: Disabled
[01/09/2023-11:10:43] [I] Export timing to JSON file: 
[01/09/2023-11:10:43] [I] Export output to JSON file: 
[01/09/2023-11:10:43] [I] Export profile to JSON file: 
[01/09/2023-11:10:43] [I] 
[01/09/2023-11:10:43] [I] === Device Information ===
[01/09/2023-11:10:43] [I] Selected Device: NVIDIA GeForce RTX 2080 Ti
[01/09/2023-11:10:43] [I] Compute Capability: 7.5
[01/09/2023-11:10:43] [I] SMs: 68
[01/09/2023-11:10:43] [I] Compute Clock Rate: 1.545 GHz
[01/09/2023-11:10:43] [I] Device Global Memory: 11016 MiB
[01/09/2023-11:10:43] [I] Shared Memory per SM: 64 KiB
[01/09/2023-11:10:43] [I] Memory Bus Width: 352 bits (ECC disabled)
[01/09/2023-11:10:43] [I] Memory Clock Rate: 7 GHz
[01/09/2023-11:10:43] [I] 
[01/09/2023-11:10:43] [I] TensorRT version: 8.5.0
[01/09/2023-11:10:43] [I] [TRT] [MemUsageChange] Init CUDA: CPU +304, GPU +0, now: CPU 317, GPU 3209 (MiB)
[01/09/2023-11:10:44] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +260, GPU +74, now: CPU 629, GPU 3279 (MiB)
[01/09/2023-11:10:44] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[01/09/2023-11:10:44] [I] Start parsing network model
[01/09/2023-11:10:44] [I] [TRT] ----------------------------------------------------------------
[01/09/2023-11:10:44] [I] [TRT] Input filename:   model.onnx
[01/09/2023-11:10:44] [I] [TRT] ONNX IR version:  0.0.7
[01/09/2023-11:10:44] [I] [TRT] Opset version:    13
[01/09/2023-11:10:44] [I] [TRT] Producer name:    pytorch
[01/09/2023-11:10:44] [I] [TRT] Producer version: 1.13.0
[01/09/2023-11:10:44] [I] [TRT] Domain:           
[01/09/2023-11:10:44] [I] [TRT] Model version:    0
[01/09/2023-11:10:44] [I] [TRT] Doc string:       
[01/09/2023-11:10:44] [I] [TRT] ----------------------------------------------------------------
[01/09/2023-11:10:45] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/09/2023-11:10:47] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[01/09/2023-11:10:47] [E] Error[2]: [shapeContext.cpp::setShapeInterval::427] Error Code 2: Internal Error (Assertion success failed. intervals already set for the shape)
[01/09/2023-11:10:47] [E] [TRT] parsers/onnx/ModelImporter.cpp:740: While parsing node number 1307 [Range -> "onnx::Cast_1725"]:
[01/09/2023-11:10:47] [E] [TRT] parsers/onnx/ModelImporter.cpp:741: --- Begin node ---
[01/09/2023-11:10:47] [E] [TRT] parsers/onnx/ModelImporter.cpp:742: input: "onnx::Range_1723"
input: "onnx::Range_1722"
input: "onnx::Range_1724"
output: "onnx::Cast_1725"
name: "Range_1307"
op_type: "Range"

[01/09/2023-11:10:47] [E] [TRT] parsers/onnx/ModelImporter.cpp:743: --- End node ---
[01/09/2023-11:10:47] [E] [TRT] parsers/onnx/ModelImporter.cpp:745: ERROR: parsers/onnx/ModelImporter.cpp:199 In function parseGraph:
[6] Invalid Node - Range_1307
[shapeContext.cpp::setShapeInterval::427] Error Code 2: Internal Error (Assertion success failed. intervals already set for the shape)
[01/09/2023-11:10:47] [E] Failed to parse onnx file
[01/09/2023-11:10:47] [I] Finish parsing network model
[01/09/2023-11:10:47] [E] Parsing model failed
[01/09/2023-11:10:47] [E] Failed to create engine from model or file.
[01/09/2023-11:10:47] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8500] # trtexec --onnx=model.onnx --saveEngine=model.plan
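
Since the log above is long, here is a minimal Python sketch for pulling out only the warnings, the errors, and the node that failed to parse. It is not part of trtexec or Riva; the function name summarize_trtexec_log and the regular expressions are my own assumptions, based only on the line format visible in this log ("[timestamp] [level] message"), so other TensorRT versions may need adjustments.

```python
import re

# Assumed line layout, taken from the log in this post:
#   [01/09/2023-11:10:47] [E] [TRT] parsers/onnx/ModelImporter.cpp:740: While parsing ...
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[IWE])\] (?P<msg>.*)$")
NODE_RE = re.compile(
    r'While parsing node number (?P<num>\d+) \[(?P<op>\w+) -> "(?P<out>[^"]+)"\]'
)


def summarize_trtexec_log(text):
    """Return (warnings, errors, failing_node) extracted from trtexec log text.

    failing_node is (node_number, op_type, output_name) or None.
    This helper is hypothetical; it is not a trtexec or Riva API.
    """
    warnings, errors, failing_node = [], [], None
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            # Continuation lines (e.g. the dumped node body) have no level tag.
            continue
        msg = match.group("msg")
        if msg.startswith("[TRT] "):
            msg = msg[len("[TRT] "):]
        if match.group("level") == "W":
            warnings.append(msg)
        elif match.group("level") == "E":
            errors.append(msg)
            node = NODE_RE.search(msg)
            if node and failing_node is None:
                failing_node = (
                    int(node.group("num")),
                    node.group("op"),
                    node.group("out"),
                )
    return warnings, errors, failing_node


# A few lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
[01/09/2023-11:10:45] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/09/2023-11:10:47] [E] Error[2]: [shapeContext.cpp::setShapeInterval::427] Error Code 2: Internal Error (Assertion success failed. intervals already set for the shape)
[01/09/2023-11:10:47] [E] [TRT] parsers/onnx/ModelImporter.cpp:740: While parsing node number 1307 [Range -> "onnx::Cast_1725"]:
input: "onnx::Range_1723"
[01/09/2023-11:10:47] [E] Failed to parse onnx file
"""

warnings, errors, failing_node = summarize_trtexec_log(SAMPLE_LOG)
print("warnings:", len(warnings))
print("errors:", len(errors))
print("failing node:", failing_node)
```

On the full log, the same function could be called with the contents of a saved log file instead of SAMPLE_LOG. It only filters the text; it does not diagnose why the Range node fails.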

Is there an *.rmir file available for the TTS ES multispeaker model?