Riva ASR not returning the final transcript accurately

I am using the NVIDIA Jetson Orin Developer Kit and I want to use Riva ASR for English (en-US) speech-to-text. I referred to this document, Speech Recognition — NVIDIA Riva, to download riva_quickstart_arm64:2.19.0. Below is the default model configuration from config.sh, with which I ran riva_init.sh followed by riva_start.sh; both executed successfully.

```
riva_target_gpu_family="tegra"
riva_tegra_platform="orin"

service_enabled_asr=true
service_enabled_nlp=true

asr_acoustic_model=("conformer")
asr_language_code=("en-US")
asr_accessory_model=("")
```

After that, I downloaded and installed the Python client SDK from here: GitHub - nvidia-riva/python-clients: Riva Python client API and CLI utils. I am then running transcribe_mic.py from here: python-clients/scripts/asr at main · nvidia-riva/python-clients · GitHub. I do get an English transcript, but it is not accurate.
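For reference, this is roughly how I invoke the client. A minimal sketch: the flag names below (`--server`, `--language-code`, `--sample-rate-hz`) follow the python-clients repository at the time of writing, but check `python transcribe_mic.py --help` for the exact options in your version.

```shell
# Stream microphone audio to the local Riva server for en-US transcription.
# Assumes the Riva server started by riva_start.sh is listening on the
# default gRPC port 50051.
python scripts/asr/transcribe_mic.py \
    --server localhost:50051 \
    --language-code en-US \
    --sample-rate-hz 16000
```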

Below are my observations:

  1. While I speak, the intermediate (interim) transcript in the terminal shows the correct text, but the final transcript misses a few words in between.
  2. There is a flag called verbatim_transcripts in the config which, when set to true, is supposed to return the transcript as spoken, without inverse text normalization. It does not seem to work as expected: normalization is still applied, and words are still missing from the final output.
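To make observation 1 concrete, here is a small pure-Python sketch I use to compare the last interim transcript against the final transcript and list the dropped words. The example strings are illustrative, not real Riva output; in practice you would feed in the interim/final transcripts captured from the streaming responses.

```python
def missing_words(interim: str, final: str) -> list[str]:
    """Return words (case-insensitive) present in the interim transcript
    but absent from the final one, preserving interim order."""
    final_words = {w.lower() for w in final.split()}
    return [w for w in interim.split() if w.lower() not in final_words]

# Hypothetical transcripts illustrating the symptom: the final result
# drops words that the interim result had already recognized.
interim = "please schedule the meeting for tomorrow at three pm"
final = "please schedule the meeting tomorrow at pm"
print(missing_words(interim, final))  # ['for', 'three']
```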

Which settings do I need to configure or fine-tune so that the real-time transcript exactly matches what is spoken in English, without missing any words, as the conversation happens?

Hi @bikramjeet.nath ,
This is likely a known issue and is already on the roadmap to be fixed.
The fix will be available in an upcoming release. Please stay tuned to the release notes.
In the meantime, it would help us to understand how critical this request is for you.

Thanks