Problem deploying a FastConformer RNNT ASR model to NVIDIA Riva for streaming

I am trying to test the FastConformer en-US model in Riva and am running into problems with deployment.
Steps to reproduce:

1. Install NVIDIA Riva 2.19.

2. Download the model from NVIDIA NGC.

3. Extract the RNNT head with the convert_nemo_asr_hybrid_to_ctc.py script from the nemo2riva examples:

python convert_nemo_asr_hybrid_to_ctc.py --input stt_en_fastconformer_hybrid_large_streaming_80ms.nemo --output conv_stt_en_fastconformer_hybrid_large_streaming_80ms.nemo --model_type=rnnt

4. Convert the model to Riva format:

nemo2riva --key tlt_encode --out conv_stt_en_fastconformer_hybrid_large_streaming_80ms.riva --format nemo conv_stt_en_fastconformer_hybrid_large_streaming_80ms.nemo

5. Deploy with riva-build inside the Riva Docker container:

riva-build speech_recognition \
    conv_stt_en_fastconformer_hybrid_large_streaming_80ms.rmir:tlt_encode \
    converted_stt_en_fastconformer_hybrid_large_streaming_80ms.riva \
    --name=usasr \
    --decoder_type=nemo \
    --ms_per_timestep=80 \
    --chunk_size=0.16 \
    --padding_size=0.08 \
    --language_code=en-US

[TensorRT-LLM] TensorRT-LLM version: 0.17.0

Traceback (most recent call last):
  File "/usr/local/bin/riva-build", line 8, in <module>
    sys.exit(build())
             ^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/servicemaker/cli/build.py", line 97, in build
    pipeline_config.init_from(nm.get_config())
  File "/usr/local/lib/python3.12/dist-packages/servicemaker/pipelines/asr.py", line 684, in init_from
    self.last_channel_cache_size = eparams["att_context_size"][0][0]
                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
TypeError: 'int' object is not subscriptable

Since riva-build indexes the value as att_context_size[0][0], I then wrote a script that stores att_context_size in the model as a two-dimensional array.

Patch script:

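import nemo.collections.asr as nemo_asr

# Placeholder paths; point these at the source and patched .nemo files.
SRC_PATH = "source_model.nemo"
DST_PATH = "patched_model.nemo"

asr_model1 = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.restore_from(SRC_PATH)
print("SRC", asr_model1.cfg.encoder.att_context_size)
# riva-build reads att_context_size[0][0], so store a list of [left, right] pairs.
asr_model1.cfg.encoder.att_context_size = [[70, 1]]
asr_model1.save_to(DST_PATH)
# Reload the saved model to confirm the new value was persisted.
loaded_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.restore_from(DST_PATH)
print("DST", loaded_model.cfg.encoder.att_context_size)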
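The traceback shows riva-build reading att_context_size[0][0], so it appears to expect a list of [left, right] pairs rather than a single flat pair. Below is a minimal plain-Python sketch of that shape difference. It assumes the original value was a flat pair such as [70, 1] (inferred from the patch, not confirmed), and the helper name normalize_att_context_size is hypothetical, not part of NeMo or Riva.

```python
def normalize_att_context_size(value):
    """Return att_context_size as a list of [left, right] pairs.

    Hypothetical helper for illustration; not part of NeMo or Riva.
    """
    value = list(value)
    # A flat pair such as [70, 1] has integer elements; wrap it once.
    if value and all(isinstance(v, int) for v in value):
        return [value]
    # Already nested, e.g. [[70, 1]]; copy the inner pairs.
    return [list(pair) for pair in value]


flat = [70, 1]  # assumed original value, inferred from the patch
try:
    flat[0][0]  # same double indexing as in the traceback
except TypeError as err:
    print("flat pair fails:", err)

nested = normalize_att_context_size(flat)
print("nested pair works:", nested[0][0])
```

With the nested form, the double index reaches the left context value instead of trying to subscript an integer.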

The model was successfully deployed and appears as cache_aware with a Python backend.

However, I don’t see any real-time recognition results from the microphone.

At the same time, when I send an audio file in chunks (via the gRPC protocol), the recognition results are correct.
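For reference, sending a file "in chunks" can be sketched as follows. This is only a plain-Python illustration: it assumes 16 kHz, 16-bit mono PCM (not stated in the post) and reuses the 0.16 s chunk_size from the riva-build command. The function name iter_pcm_chunks is hypothetical, and no Riva client call is shown.

```python
def iter_pcm_chunks(pcm, sample_rate_hz=16000, bytes_per_sample=2, chunk_seconds=0.16):
    """Yield consecutive fixed-duration chunks of raw mono PCM bytes.

    Hypothetical helper; the audio format defaults are assumptions.
    """
    # Samples per chunk, rounded to avoid floating-point drift, then bytes.
    chunk_bytes = int(round(sample_rate_hz * chunk_seconds)) * bytes_per_sample
    for start in range(0, len(pcm), chunk_bytes):
        # The final chunk may be shorter than chunk_bytes.
        yield pcm[start:start + chunk_bytes]


# One second of silence at the assumed format, split into 0.16 s chunks.
one_second = bytes(16000 * 2)
chunks = list(iter_pcm_chunks(one_second))
print(len(chunks), len(chunks[0]), len(chunks[-1]))
```

A microphone client would presumably produce chunks of the same duration from the capture device in real time rather than slicing a file, which is one place where the two paths can differ.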

Could you please tell me where my mistake might be?

Please provide the following information when requesting support.

Hardware: GPU (A100)
Operating System: Ubuntu
Riva Version: 2.19

Hi Leshka,

The FastConformer RNNT model is not officially tested or supported with the Riva SDK.

Please use our latest ASR NIM instead; see the "Riva ASR NIM Overview" page in the NVIDIA NIM Riva ASR documentation.

Thanks,

AHarpster

What’s the difference between the Parakeet and Fast Conformer models? Are they the same or is there a difference?