Silero VAD Integration with custom ASR model using riva-build in Riva 2.17.0

I’m trying to deploy a custom Arabic ASR model (converted from NeMo .nemo to .riva) with Silero VAD v5 integration using riva-build speech_recognition in Riva 2.17.0, but I’m getting a “Model class silerovad not recognized” error.

What I’m trying to achieve:
Deploy my custom ASR model with neural VAD-based endpointing instead of decoder-based (greedy_ctc) endpointing.

What I’ve tried:

Questions:

  1. Does riva-build speech_recognition support integrating an external VAD model file directly, or must VAD be deployed separately using Riva Quick Start’s asr_accessory_models approach?

  2. If riva-build supports it, what’s the correct syntax to specify the VAD model file?

  3. If not, can I deploy my custom .rmir alongside Quick Start’s Silero VAD accessory model?

  4. How can I deploy the model with VAD endpointing?

Any guidance would be greatly appreciated!

Hi @Vaneeza_Ahmad

Can you share the full riva-build command that produces the “Model class silerovad not recognized” error?

Also, can you try with Riva 2.19.0?

@mehadi.hasan
I’ve tried Riva versions 2.14.0 and 2.17.0 and get the same error with both. I ran out of disk space, so I haven’t tried version 2.19.0 yet.
Let me know if there’s any way to handle this within 2.17.0; otherwise I can give version 2.19.0 a try.

Here’s the full riva-build command:

docker run --rm --gpus all \
  -v "$(pwd)/models:/models" \
  nvcr.io/nvidia/riva/riva-speech:2.17.0-servicemaker \
  riva-build speech_recognition \
  /models/output/quran_asr_v3_2_vad.rmir:tlt_encode \
  /models/V3_2_ASR_ReTainQuran_vad.riva:tlt_encode \
  /models/silero_vad.riva:tlt_encode \
  --force \
  --name=$MODEL_NAME \
  --streaming=true \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --ms_per_timestep=80 \
  --decoder_type=greedy \
  --vad_type=silero \
  --neural_vad_nn.optimization_graph_level=-1 \
  --neural_vad.filter_speech_first 0 \
  --neural_vad.onset=0.85 \
  --neural_vad.offset=0.3 \
  --neural_vad.min_duration_on=0.2 \
  --neural_vad.min_duration_off=0.5 \
  --neural_vad.pad_offset=0.08 \
  --neural_vad.pad_onset=0.3 \
  --language_code=ar-AE

@Vaneeza_Ahmad note that support for Silero VAD was added in Riva 2.18.0; see the Riva Quickstart (Riva SDK) documentation and Release Notes — NVIDIA Riva for details.
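Since the error comes down to the Riva version, a build script could check the image tag before calling riva-build. The snippet below is only a rough sketch of that idea: the function names and the MIN_VAD_VERSION variable are made up for illustration, it assumes tags shaped like "2.17.0-servicemaker" as in the command above, and it relies on GNU `sort -V` being available.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of Riva): refuse to attempt a Silero VAD build
# on an image tag older than the release that introduced Silero VAD support.
MIN_VAD_VERSION="2.18.0"

# Succeeds when version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Drop everything from the first "-" onward, e.g. "2.17.0-servicemaker" -> "2.17.0".
riva_version_from_tag() {
  printf '%s\n' "${1%%-*}"
}

# Succeeds when the given image tag is new enough for --vad_type=silero.
supports_silero_vad() {
  version_ge "$(riva_version_from_tag "$1")" "$MIN_VAD_VERSION"
}
```

For example, `supports_silero_vad "2.17.0-servicemaker" || echo "Riva too old for Silero VAD"` would print the warning, while a 2.18.0 or 2.19.0 tag would pass silently.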

Note also that the Riva Quickstart (Riva SDK) is already at version 2.24.0 and, in this form, is supported only on NVIDIA Jetson Thor (Support Matrix — NVIDIA Riva).

In parallel, a lot of work has been done on Riva ASR NIM, which is the current way to go for datacenter deployments. There’s an awesome pack of documentation available for it, including instructions on deploying custom models as NIM (Deploying Custom Models as NIM — NVIDIA NIM Riva ASR). Release Notes — NVIDIA NIM Riva ASR and Support Matrix — NVIDIA NIM Riva ASR are also available. FYI, internally NIM is still based on Riva, so the build/deploy steps are similar or identical to those of the Riva SDK, but Riva versions may differ among NIM models, with the most recent ones using Riva 2.24.0.

Riva 2.19.0 should solve your issue

Yes, switching to either Riva 2.18.0 or Riva 2.19.0 resolved the VAD issue. Thank you, @mehadi.hasan and @ilb!

After that, I got a tensor shape mismatch error when building the TensorRT engine from ONNX. As a workaround, I’m currently using ONNX Runtime instead of TensorRT, but I would prefer TensorRT to achieve lower latency for real-time ASR. I’ve described it here: TensorRT Build Failure with Silero VAD ONNX Export from NeMo for Riva 2.18.0 (Int32/Int64 Mismatch) - Deep Learning (Training & Inference) / Riva - NVIDIA Developer Forums

Also, I think the NVIDIA team should update the docs for VAD pipeline configuration to state which Riva versions support it, since it’s only available in some versions and not all.

Anyways, thanks again for the help. I really appreciate it.