Riva: non-reproducible ASR outputs compared to NeMo model

Please provide the following information when requesting support.

Hardware - GPU RTX 3090
Hardware - CPU AMD EPYC 7502
Operating System Ubuntu 22.04
Riva Version v2.14.0

How to reproduce the issue? (This is for errors. Please share the command and the detailed log here.)

Hello, my goal is to deploy a custom ASR model trained on domain-specific data with the help of Riva.

I can run inference with the built model, but the ASR results are much worse than those I obtain from plain NeMo inference scripts.
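
A minimal sketch of how the gap can be quantified as word error rate, using the jiwer package (the transcript strings below are placeholders):

  # Compare word error rate (WER) of NeMo vs. Riva transcripts against a reference.
  import jiwer

  reference = "expected transcript from the test set"      # placeholder
  nemo_hyp = "transcript produced by the NeMo script"      # placeholder
  riva_hyp = "transcript produced by the Riva deployment"  # placeholder

  print("NeMo WER:", jiwer.wer(reference, nemo_hyp))
  print("Riva WER:", jiwer.wer(reference, riva_hyp))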

I noticed that quality is significantly affected by the following arguments (see the sketch after the list):

  --chunk_size=4.8 \
  --left_padding_size=0.0 \
  --right_padding_size=0.0 \
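
For what it's worth, a back-of-the-envelope sketch of why these settings might matter (assuming --ms_per_timestep=40 below matches the model's encoder stride):

  # Rough arithmetic for the chunked-pipeline settings above.
  chunk_size_s = 4.8      # --chunk_size
  ms_per_timestep = 40    # --ms_per_timestep (assumed to equal the encoder stride)

  # Encoder timesteps per chunk: 4.8 s / 0.040 s = 120 frames.
  timesteps_per_chunk = chunk_size_s * 1000 / ms_per_timestep
  print(timesteps_per_chunk)  # 120.0

  # With left_padding_size=0.0 and right_padding_size=0.0, each 120-frame chunk
  # is encoded without acoustic context from its neighbors, unlike the full
  # utterances the model sees in offline NeMo inference.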

Do you have any suggestions on how the riva-build argument list should look to get ASR outputs closest to those I get in NeMo?

As a starting point, I'd be happy just to obtain greedy outputs equivalent to NeMo's.
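
For context, the NeMo-side greedy baseline comes from a plain inference script along these lines (a minimal sketch; the model class and file names are placeholder assumptions for a fine-tuned Conformer-CTC checkpoint):

  # Minimal NeMo greedy (CTC) inference baseline, assuming a .nemo checkpoint.
  import nemo.collections.asr as nemo_asr

  model = nemo_asr.models.EncDecCTCModelBPE.restore_from(
      "conformer-finetune-inhouse-more-data-golos-best.nemo"  # placeholder path
  )
  model.eval()

  # transcribe() uses greedy CTC decoding by default for CTC models.
  transcripts = model.transcribe(["sample.wav"])  # placeholder audio file
  print(transcripts[0])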

Here is the full list of my riva-build arguments:

riva-build speech_recognition \
  /data/rmir/asr2.rmir \
  /data/conformer-finetune-inhouse-more-data-golos-best.riva \
  --offline \
  --name=conformer-ru-RU-asr-offline \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=0.0 \
  --right_padding_size=0.0 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ru-RU
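
The Riva-side transcripts for the comparison can then be obtained with the nvidia-riva-client Python package, roughly like this (a sketch; the server address and audio path are placeholder assumptions):

  # Offline recognition against the deployed Riva server, for side-by-side
  # comparison with the NeMo transcripts. Assumes 16 kHz mono PCM WAV input.
  import riva.client

  auth = riva.client.Auth(uri="localhost:50051")  # placeholder server address
  asr_service = riva.client.ASRService(auth)

  config = riva.client.RecognitionConfig(
      encoding=riva.client.AudioEncoding.LINEAR_PCM,
      sample_rate_hertz=16000,
      language_code="ru-RU",
      max_alternatives=1,
  )

  with open("sample.wav", "rb") as f:  # placeholder audio file
      data = f.read()

  response = asr_service.offline_recognize(data, config)
  print(response.results[0].alternatives[0].transcript)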

Thank you. Vic

I’m having the same problem as you. I fine-tuned the pt-BR ASR model (Conformer-CTC) and its performance was very good as a .nemo checkpoint, but after deploying it, performance dropped a lot (it was very bad).

Any news about your problem?

Same problem here, with a pt-BR model too!