Help Needed: Riva ASR Model Not Detecting Audio

Hi everyone,

I’m deploying an ASR model with NVIDIA Riva and running into an issue: the model does not detect any audio. I built and deployed the model with the steps below, but when I send an audio file for recognition, nothing is recognized. Here are the steps and commands I used:
1. Building the Riva ASR model:

```bash
riva-build speech_recognition \
  /data/rmir/Parakeet_ctc_xxl.rmir \
  /data/Parakeet-CTC-XXL-1.1b_spe13k_em-ea_1.0.riva \
  --offline \
  --name=parakeet-1.1b-unified-ml-cs-em-ea-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=3.2 \
  --right_padding_size=3.2 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=em-ea
```

2. Deploying the Riva model:

```bash
riva-deploy -f \
  /data/rmir/Parakeet_ctc_xxl.rmir \
  /data/models/
```

Issue:

When calling the ASR service to process an audio file, I get the following log output, which indicates that no audio was detected:

```
I0725 09:14:10.128805 243 grpc_riva_asr.cc:678] ASRService.Recognize called.
I0725 09:14:10.132581 243 grpc_riva_asr.cc:863] Using model parakeet-1.1b-unified-ml-cs-em-ea-asr-offline-asr-bls-ensemble from Triton localhost:8001 for inference
I0725 09:14:10.233958 243 stats_builder.h:100] {"specversion":"1.0","type":"riva.asr.recognize.v1","source":"","subject":"","id":"04974b4b-76b2-4a7c-a941-ecd0498ce5d3","datacontenttype":"application/json","time":"2024-07-25T09:14:10.12875964+00:00","data":{"release_version":"2.16.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"em-ea","request_count":1,"audio_duration":0.0,"speech_duration":0.0,"status":0,"err_msg":""}}
```
The ASR service is clearly being called, but the stats entry reports an `audio_duration` and a `speech_duration` of 0.0 seconds. I’ve verified that the audio file is in the correct format and is not empty.

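Since the server logs `audio_duration` as 0.0, one way to double-check the file is to compute its duration locally from the WAV header and compare. The sketch below is a hypothetical helper (`wav_info` is not part of Riva); it assumes an uncompressed RIFF/WAVE file and GNU coreutils (`od --endian`). If it reports a non-zero `duration_ms` while Riva still logs 0.0, that would point toward how the client packages the request rather than the file itself.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of Riva): print the WAV fields an
# ASR server typically cares about, plus the duration implied by the "data" chunk.
# Assumes an uncompressed RIFF/WAVE file and GNU coreutils (od --endian).

# Little-endian integer readers: $1 = file, $2 = byte offset.
read_u16() { od -An -t u2 --endian=little -j "$2" -N 2 "$1" | tr -d ' \n'; }
read_u32() { od -An -t u4 --endian=little -j "$2" -N 4 "$1" | tr -d ' \n'; }
# Four-character chunk ID at a byte offset.
read_id()  { dd if="$1" bs=1 skip="$2" count=4 2>/dev/null; }

wav_info() {
  local f="$1" total off id len
  local fmt="" channels="" rate="" bits="" data_bytes=0
  total=$(wc -c < "$f")
  if [[ $(read_id "$f" 0) != "RIFF" || $(read_id "$f" 8) != "WAVE" ]]; then
    echo "$f: not a RIFF/WAVE file" >&2
    return 1
  fi
  # Walk the chunk list instead of assuming a fixed 44-byte header,
  # because many tools insert LIST/INFO chunks before "data".
  off=12
  while (( off + 8 <= total )); do
    id=$(read_id "$f" "$off")
    len=$(read_u32 "$f" $(( off + 4 )))
    case "$id" in
      "fmt ")
        fmt=$(read_u16 "$f" $(( off + 8 )))        # 1 = integer PCM
        channels=$(read_u16 "$f" $(( off + 10 )))
        rate=$(read_u32 "$f" $(( off + 12 )))
        bits=$(read_u16 "$f" $(( off + 22 )))
        ;;
      data)
        data_bytes=$len
        break
        ;;
    esac
    off=$(( off + 8 + len + len % 2 ))   # chunks are padded to an even length
  done
  if [[ -z $rate || -z $channels || -z $bits ]]; then
    echo "$f: no fmt chunk found" >&2
    return 1
  fi
  local bytes_per_sec=$(( rate * channels * bits / 8 ))
  local duration_ms=0
  (( bytes_per_sec > 0 )) && duration_ms=$(( data_bytes * 1000 / bytes_per_sec ))
  echo "$f: format=$fmt channels=$channels rate=$rate bits=$bits data_bytes=$data_bytes duration_ms=$duration_ms"
  if (( data_bytes == 0 )); then
    echo "$f: WARNING: data chunk is empty or missing" >&2
  fi
}

# Usage: bash wav_info.sh file1.wav [file2.wav ...]
if [[ "${BASH_SOURCE[0]}" == "$0" ]] && (( $# > 0 )); then
  for f in "$@"; do wav_info "$f"; done
fi
```

Walking the chunks (rather than reading fixed offsets) matters because files exported by editors often carry metadata chunks ahead of the audio data.
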
Questions:

  1. Has anyone encountered a similar issue with Riva ASR models not detecting audio?
  2. Are there any common misconfigurations or steps I might have missed in the model building or deployment process?
  3. Could there be an issue with the audio file or the way it’s being processed?
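On question 3, a cheap experiment is to re-encode the file into plain 16-bit PCM, mono, 16 kHz and retry the request. This assumes that format suits this Parakeet model (16 kHz mono is a common choice for Riva ASR, but the model card is the authority); `to_riva_wav` is a hypothetical helper wrapping standard ffmpeg options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-encode any input ffmpeg can read into
# 16-bit little-endian PCM, mono, 16 kHz WAV.
# Assumption: this is an acceptable input format for the deployed model.
to_riva_wav() {
  local in="$1" out="$2"
  ffmpeg -nostdin -hide_banner -loglevel error -y -i "$in" \
    -ac 1 \
    -ar 16000 \
    -c:a pcm_s16le \
    "$out"
}

# Usage: bash to_riva_wav.sh input.wav output_16k_mono.wav
if [[ "${BASH_SOURCE[0]}" == "$0" ]] && (( $# == 2 )); then
  to_riva_wav "$1" "$2"
fi
```

If the re-encoded file is recognized and the original is not, the problem lies in the source encoding rather than in the build or deploy steps.
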

Any guidance or suggestions would be greatly appreciated!

Thanks in advance for your help.

Try adding the `--nn.use_trt_fp32` option to the riva-build command.