I want to test the FastConformer en-US model in Riva, but I'm having problems deploying it.
Steps to reproduce:
1. Install NVIDIA Riva 2.19.
2. Download the stt_en_fastconformer_hybrid_large_streaming_80ms model from NVIDIA NGC.
3. Extract the RNNT head with the convert_nemo_asr_hybrid_to_ctc.py script from the nemo2riva examples:
python convert_nemo_asr_hybrid_to_ctc.py --input stt_en_fastconformer_hybrid_large_streaming_80ms.nemo --output conv_stt_en_fastconformer_hybrid_large_streaming_80ms.nemo --model_type=rnnt
4. Convert the model to the Riva format with nemo2riva:
nemo2riva --key tlt_encode --out conv_stt_en_fastconformer_hybrid_large_streaming_80ms.riva --format nemo conv_stt_en_fastconformer_hybrid_large_streaming_80ms.nemo
5. Try to build the deployment with riva-build inside the Riva Docker container, which fails:
riva-build speech_recognition conv_stt_en_fastconformer_hybrid_large_streaming_80ms.rmir:tlt_encode converted_stt_en_fastconformer_hybrid_large_streaming_80ms.riva \
    --name=usasr --decoder_type=nemo --ms_per_timestep=80 --chunk_size=0.16 --padding_size=0.08 --language_code=en-US
[TensorRT-LLM] TensorRT-LLM version: 0.17.0
Traceback (most recent call last):
  File "/usr/local/bin/riva-build", line 8, in <module>
    sys.exit(build())
             ^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/servicemaker/cli/build.py", line 97, in build
    pipeline_config.init_from(nm.get_config())
  File "/usr/local/lib/python3.12/dist-packages/servicemaker/pipelines/asr.py", line 684, in init_from
    self.last_channel_cache_size = eparams["att_context_size"][0][0]
                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
TypeError: 'int' object is not subscriptable
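The traceback suggests why this happens: asr.py reads eparams["att_context_size"][0][0], which assumes a list of [left, right] pairs, while the model config apparently stores a single flat pair (presumably [70, 1]), so the first index already returns an int. A minimal plain-Python sketch of the failure and of wrapping a flat pair into the nested form (normalize_att_context_size is a hypothetical helper, not part of NeMo or Riva):

```python
def normalize_att_context_size(att_context_size):
    """Return att_context_size as a list of [left, right] pairs.

    A flat pair such as [70, 1] is wrapped as [[70, 1]]; a value that is
    already a list of pairs is returned as a list of lists.
    """
    if len(att_context_size) > 0 and isinstance(att_context_size[0], int):
        return [list(att_context_size)]
    return [list(pair) for pair in att_context_size]


flat = [70, 1]
try:
    flat[0][0]  # same double indexing as riva-build's asr.py
except TypeError as err:
    print("TypeError:", err)  # 'int' object is not subscriptable

nested = normalize_att_context_size(flat)
print(nested, nested[0][0])  # [[70, 1]] 70
```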
I then wrote a script that rewrites att_context_size in the model config as a two-dimensional list, since riva-build indexes it as att_context_size[0][0].
Patched script:
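import nemo.collections.asr as nemo_asr

SRC_PATH = "input.nemo"    # placeholder: model to patch
DST_PATH = "patched.nemo"  # placeholder: output path

# Load the model and show the original attention context size
asr_model1 = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.restore_from(SRC_PATH)
print("SRC", asr_model1.cfg.encoder.att_context_size)
# Wrap [left, right] in an outer list; riva-build reads att_context_size[0][0]
asr_model1.cfg.encoder.att_context_size = [[70, 1]]
asr_model1.save_to(DST_PATH)
# Reload to confirm that the patched value was saved
loaded_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.restore_from(DST_PATH)
print("DST", loaded_model.cfg.encoder.att_context_size)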
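To double-check the patch without reloading the full model, the config can also be read straight out of the .nemo file. A minimal stdlib-only sketch, assuming the .nemo file is a tar archive (plain or gzip-compressed) containing model_config.yaml, as NeMo typically packages models; read_model_config_text and extract_key_block are hypothetical helper names:

```python
import tarfile


def read_model_config_text(nemo_path):
    """Return model_config.yaml from a .nemo archive as text.

    Assumes the .nemo file is a tar archive (plain or gzip-compressed)
    with a member whose name ends in "model_config.yaml".
    """
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith("model_config.yaml"):
                return archive.extractfile(member).read().decode("utf-8")
    raise FileNotFoundError(f"model_config.yaml not found in {nemo_path}")


def extract_key_block(config_text, key="att_context_size"):
    """Return the YAML line for `key` plus any block-style list lines under it."""
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if not stripped.startswith(key + ":"):
            continue
        indent = len(line) - len(stripped)
        block = [line]
        for nxt in lines[i + 1:]:
            nxt_stripped = nxt.lstrip()
            nxt_indent = len(nxt) - len(nxt_stripped)
            # List items may sit at the key's indent ("- - 70") or deeper
            if nxt_stripped and (
                nxt_indent > indent
                or (nxt_indent == indent and nxt_stripped.startswith("-"))
            ):
                block.append(nxt)
            else:
                break
        return "\n".join(block)
    return None
```

For example, print(extract_key_block(read_model_config_text(DST_PATH))) should show the nested [[70, 1]] value, either in flow style or as a YAML block list.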
The model was successfully deployed and appears as cache_aware with a Python backend.
However, I don't get any real-time recognition results when streaming from a microphone.
Yet when I send an audio file in chunks over gRPC, the recognition results are correct.
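For reference, the chunk parameters passed to riva-build can be related to the patched config. Assuming the encoder uses NeMo's chunked_limited attention style (an assumption; check att_context_style in the model config), each streaming step covers att_context_size[1] + 1 encoder frames, so [[70, 1]] at 80 ms per timestep gives (1 + 1) * 80 ms = 160 ms, matching --chunk_size=0.16. The sketch below computes that and splits raw audio into chunks of that length, as when sending a file in chunks; the 16 kHz, 16-bit mono PCM format and the helper names are assumptions:

```python
def chunk_duration_s(att_context_size, ms_per_timestep=80):
    """Chunk length in seconds, assuming chunked_limited attention.

    Each streaming step covers right_context + 1 encoder frames.
    """
    right_context = att_context_size[0][1]
    return (right_context + 1) * ms_per_timestep / 1000.0


def pcm_chunks(pcm_bytes, chunk_s, sample_rate=16000, sample_width=2):
    """Yield successive chunk_s-second slices of raw mono PCM audio.

    Defaults assume 16 kHz, 16-bit (2-byte) mono samples; the last
    chunk may be shorter.
    """
    chunk_bytes = int(round(chunk_s * sample_rate)) * sample_width
    for start in range(0, len(pcm_bytes), chunk_bytes):
        yield pcm_bytes[start:start + chunk_bytes]


chunk_s = chunk_duration_s([[70, 1]])
print(chunk_s)  # 0.16
one_second = bytes(16000 * 2)  # one second of silence
chunks = list(pcm_chunks(one_second, chunk_s))
print(len(chunks[0]), len(chunks))  # 5120 7
```

With these defaults, one second of audio (32,000 bytes) splits into six 5,120-byte chunks plus a shorter 1,280-byte remainder.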
Could you please tell me where my mistake might be?
Environment:
Hardware: NVIDIA A100 GPU
Operating system: Ubuntu
Riva version: 2.19