[Bug] v1.10 Offline Transcripts - punctuation model breaks pipeline output

Unless this is a known limitation of the punctuation model (I went through the 1.10 release notes and found nothing), I believe this is unexpected behavior.

Hardware - GPU (T4 AWS EC2)
Riva Version v1.10b
How to reproduce the issue?

  1. In my case, I start by building a Riva Citrinet offline pipeline like so:
riva-build speech_recognition \
   "citrinet-1024-true-offline.rmir:tlt_encode" "citrinet-1024-Jarvis-asrset-3_0-encrypted.riva:tlt_encode" \
   --offline --nn.trt_max_workspace_size=14000000000 \
   --name=citrinet-1024-english-asr-offline \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --chunk_size=2700 \
   --left_padding_size=0. \
   --right_padding_size=0. \
   --decoder_type=flashlight \
   --flashlight_decoder.asr_model_delay=-1 \
   --decoding_language_model_binary=riva_asr_train_datasets_3gram.binary \
   --decoding_vocab=flashlight_decoder_vocab.txt \
   --flashlight_decoder.lm_weight=0.2 \
   --flashlight_decoder.word_insertion_score=0.2 \
   --flashlight_decoder.beam_threshold=20. \
   --language_code=en-US
  2. Run riva_asr_client --audio_file=wav/10minutes.wav -output_filename=out.txt in riva-client image

  3. See out.txt

Expected output: Complete transcript
Observed output:

Run time: 5.4236 sec.
Total audio processed: 2358.6 sec.
Throughput: 434.88 RTFX
Final transcripts written to out.txt
root@ip-172-31-7-237:/work/examples# cat out.txt
{"audio_filepath": "/work/examples/wav/craig-full-16k.wav","text": "Believe it's recording now. Okay? sorry, back to sharing the desktop. Okay, so I'd like to just get to know a little bit about yourself. Like what is it that you do or what do you focus on? What are you passionate about? It doesn't have to be a long answer. Anything that you're comfortable sharing? Yeah, sure thing, and I appreciate you asking. So Mylo? we're about a year old. I was working with one co founder who had the idea What Mylo does is we allow you to create and share our processes seamlessly across the Internet, right, So what we're seeking to do is replace cases and "}

Clearly all the audio is processed (Total audio processed: 2358.6 sec.), but after punctuation the pipeline outputs a transcript that is cut very short.
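
To put a rough number on the truncation, a quick check along these lines could compare the word count in out.txt with what the audio length would suggest. This is only a sketch: the JSON layout is taken from the out.txt shown above, while the helper name transcript_stats and the speech rate of 2 words per second are my own assumptions for illustration, not anything Riva provides.

```python
import json


def transcript_stats(jsonl_text, audio_seconds, words_per_second=2.0):
    """Compare each transcript's word count with a rough expectation.

    jsonl_text: contents of out.txt (one JSON object per line).
    audio_seconds: audio duration, e.g. the "Total audio processed" value.
    words_per_second: assumed average speech rate (a guess, tune as needed).
    """
    stats = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        words = len(record["text"].split())
        expected = audio_seconds * words_per_second
        stats.append({
            "audio_filepath": record["audio_filepath"],
            "words": words,
            "expected_words": expected,
            # Fraction of the expected transcript that is actually present.
            "coverage": words / expected if expected else 0.0,
        })
    return stats


# Tiny inline sample in the same shape as out.txt (hypothetical content).
sample = '{"audio_filepath": "wav/sample.wav","text": "hello there how are you"}\n'
for entry in transcript_stats(sample, audio_seconds=60.0):
    print(entry)
```

For the real file, reading out.txt and passing audio_seconds=2358.6 should give a coverage value far below 1 if the transcript is truncated as shown above.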

This practically breaks our use case for Riva/offline ASR. Any workarounds to get a usable offline recognition pipeline for longer audio would be appreciated!

Do let me know if any more details would be helpful.

Would appreciate any updates. Still facing this issue.

Hi @ShantanuNair ,

Thanks for your interest in Riva,

I will check on your issue with the team and provide an update soon.

If possible, could you share the audio file you used so that we can check on our end?

Hi @ShantanuNair

Thanks for your interest in Riva,

I have an update regarding your issue:

The team is working on the request (after punctuation, the pipeline outputs transcripts cut very short). It will be fixed in one of the upcoming releases. We will keep you updated.

Hello…
we are very curious if this has been addressed in the 2.10 release?
The release notes from 1.9 to 2.10 do not make any mention of this being addressed.
thanks so much for any insights…
benn

@sofo-benn As far as I know, this issue has not been resolved.

Hi @sofo-benn and @ShantanuNair

Thanks for your interest in Riva,

This issue will definitely be fixed in the next release, which is due out at the end of this month.

Thanks @rvinobha. Do keep us updated on the status of this, looking forward to it :)