Please provide the following information when requesting support.
Hardware - GPU - RTX 3060 12GB
Hardware - CPU - AMD Ryzen 3600
Operating System - Ubuntu 22.04
Riva Version - 2.11
TLT Version (if relevant)
When I build a model with riva-build for offline ASR and specify --streaming=false, no responses are returned for a RecognizeRequest. With --streaming=true, the same request works fine. I need to disable streaming to reduce the GPU memory footprint. I’ve tried both the Conformer-CTC Large and XL models, with the same result.
Config (for Large model):
riva-build speech_recognition \
conformer_offline.rmir:tlt_encode Conformer-CTC-L-en-US-ASR-set-4p0.riva:tlt_encode \
--offline \
--streaming=False \
--name=conformer-en-US-asr-offline \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=lm.binary \
--decoding_vocab=vocab.txt \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--language_code=en-US \
--wfst_tokenizer_model=tokenize_and_classify.far \
--wfst_verbalizer_model=verbalize.far \
--force
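
For reference, the offline request that returns nothing can be reproduced with something like the sketch below. It is only an illustration under assumptions that are not stated above: the nvidia-riva-client Python package is installed, the Riva server listens on localhost:50051, and the input is a PCM WAV file. The helper names read_wav and recognize_offline are hypothetical.

```python
import wave


def read_wav(path):
    """Return (file_bytes, sample_rate_hz, channel_count) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    with open(path, "rb") as fh:
        data = fh.read()  # whole file, WAV header included
    return data, rate, channels


def recognize_offline(path, uri="localhost:50051", language_code="en-US"):
    """Send one offline (non-streaming) recognition request and return transcripts.

    Assumes a reachable Riva server at `uri`; the default address is a guess.
    """
    # Imported lazily so read_wav stays usable without the client package.
    import riva.client

    data, rate, channels = read_wav(path)
    auth = riva.client.Auth(uri=uri)
    asr = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        sample_rate_hertz=rate,
        audio_channel_count=channels,
        language_code=language_code,
        max_alternatives=1,
    )
    # offline_recognize wraps the audio and config in a single RecognizeRequest.
    response = asr.offline_recognize(data, config)
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]
```

Calling recognize_offline("sample.wav") (sample.wav being a placeholder file name) against each build should show whether the result list comes back empty only for the --streaming=False model.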