[Bug] v1.10 Offline Transcripts - punctuation model breaks pipeline output

ShantanuNair · March 21, 2022, 12:01am

Unless this is a known limitation of the punctuation model (I went through the 1.10 release notes and found nothing), I believe this is unexpected behavior.

Hardware - GPU (T4 AWS EC2)
Riva Version v1.10b
How to reproduce the issue?

In my case, I start by building a Riva Citrinet offline pipeline like so:

riva-build speech_recognition \
   "citrinet-1024-true-offline.rmir:tlt_encode" "citrinet-1024-Jarvis-asrset-3_0-encrypted.riva:tlt_encode" \
   --offline --nn.trt_max_workspace_size=14000000000 \
   --name=citrinet-1024-english-asr-offline \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --chunk_size=2700 \
   --left_padding_size=0. \
   --right_padding_size=0. \
   --decoder_type=flashlight \
   --flashlight_decoder.asr_model_delay=-1 \
   --decoding_language_model_binary=riva_asr_train_datasets_3gram.binary \
   --decoding_vocab=flashlight_decoder_vocab.txt \
   --flashlight_decoder.lm_weight=0.2 \
   --flashlight_decoder.word_insertion_score=0.2 \
   --flashlight_decoder.beam_threshold=20. \
   --language_code=en-US

Run riva_asr_client --audio_file=wav/10minutes.wav -output_filename=out.txt in riva-client image
See out.txt

Expected output: Complete transcript
Observed output:

Run time: 5.4236 sec.
Total audio processed: 2358.6 sec.
Throughput: 434.88 RTFX
Final transcripts written to out.txt
root@ip-172-31-7-237:/work/examples# cat out.txt
{"audio_filepath": "/work/examples/wav/craig-full-16k.wav","text": "Believe it's recording now. Okay? sorry, back to sharing the desktop. Okay, so I'd like to just get to know a little bit about yourself. Like what is it that you do or what do you focus on? What are you passionate about? It doesn't have to be a long answer. Anything that you're comfortable sharing? Yeah, sure thing, and I appreciate you asking. So Mylo? we're about a year old. I was working with one co founder who had the idea What Mylo does is we allow you to create and share our processes seamlessly across the Internet, right, So what we're seeking to do is replace cases and "}

Clearly all the audio is processed: Total audio processed: 2358.6 sec.
but after punctuation, the pipeline outputs transcripts cut very short.
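A quick way to flag this kind of truncation automatically is to compare the word count in out.txt against the audio duration reported by the client. The following is a minimal sketch, not part of Riva: the function names and the 0.5 words-per-second threshold are assumptions chosen for illustration (conversational speech usually runs well above that rate), and the only thing taken from the output above is the JSON line format with a "text" field.

```python
import json


def words_per_second(manifest_line: str, audio_seconds: float) -> float:
    """Return transcript words per second of audio for one out.txt line."""
    record = json.loads(manifest_line)
    return len(record["text"].split()) / audio_seconds


def looks_truncated(manifest_line: str, audio_seconds: float,
                    min_wps: float = 0.5) -> bool:
    """Flag a transcript whose word rate is implausibly low.

    min_wps is an assumed threshold, not a value from Riva.
    """
    return words_per_second(manifest_line, audio_seconds) < min_wps


if __name__ == "__main__":
    # Shortened stand-in for the line shown above; 2358.6 s is the
    # "Total audio processed" value printed by the client.
    line = json.dumps({
        "audio_filepath": "/work/examples/wav/craig-full-16k.wav",
        "text": "Believe it's recording now. Okay? sorry, back to sharing the desktop.",
    })
    print(looks_truncated(line, 2358.6))
```

Applied to the full line in out.txt above, the same check would report a rate far below the threshold, since only around a hundred words come back for roughly 39 minutes of audio.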

This practically breaks our use case for Riva/offline ASR. Any workarounds to get a usable offline recognition pipeline for longer audio would be appreciated!

Do let me know if any more details would be appreciated.
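One possible stopgap, sketched below, is to split the long recording into shorter WAV segments and transcribe each one separately, so that each transcript passed to the punctuation model is much shorter. Whether this actually avoids the truncation is not confirmed anywhere in this thread; it is an assumption. The helper name split_wav and the 300-second default are hypothetical, and the code uses only Python's standard wave module.

```python
import wave


def split_wav(src_path: str, out_prefix: str,
              segment_seconds: float = 300.0) -> list:
    """Split a PCM WAV file into consecutive segments of fixed duration.

    Returns the list of segment paths, named <out_prefix>_000.wav, etc.
    The last segment may be shorter than segment_seconds.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(segment_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

Each resulting file could then be passed to riva_asr_client in turn and the transcripts concatenated. Note that fixed-length cuts can land in the middle of a word, so accuracy near segment boundaries may suffer.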

ShantanuNair · March 31, 2022, 8:00am

Would appreciate any updates. Still facing this issue.

rvinobha · April 24, 2022, 2:34pm

Hi @ShantanuNair ,

Thanks for your interest in Riva,

I will look into this issue further with the team and provide an update soon.

If possible, could you share the audio file you used so that we can reproduce the issue on our end?

rvinobha · May 5, 2022, 9:50am

Hi @ShantanuNair

Thanks for your interest in Riva,

I have an update regarding your issue.

The team is working on it (after punctuation, the pipeline outputs transcripts cut very short). It will be fixed in an upcoming release. We will keep you updated.

sofo-benn · May 19, 2022, 9:57pm

Hello…
we are very curious if this has been addressed in the 2.10 release?
The release notes from 1.9 to 2.10 do not make any mention of this being addressed.
thanks so much for any insights…
benn

ShantanuNair · May 25, 2022, 9:01am

@sofo-benn As far as I know, this issue has not been resolved.

rvinobha · May 25, 2022, 9:33am

Hi @sofo-benn and @ShantanuNair

Thanks for your interest in Riva,

This issue will be fixed in the next release, which is scheduled for the end of this month.

ShantanuNair · May 26, 2022, 10:43am

Thanks @rvinobha. Do keep us updated on the status of this, looking forward to it :)

ShantanuNair · June 10, 2022, 6:50am

@rvinobha Any updates on this?

rvinobha · June 13, 2022, 5:20pm

Hi @ShantanuNair

Thanks for your interest in Riva

The issue is fixed, and the fix is available in version 2.2.1.

system · September 21, 2022, 10:04am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
