Rebuilding the asrset3 citrinet offline pipeline but with larger chunk size

ShantanuNair · December 28, 2021, 10:36am

g4dn.xlarge T4 16GB
Riva v1.8b

Follow up to: Offline/Batch broken on 1.8b due to 900s limit - #3 by rleary

Hi @rleary, thank you so much, and I really appreciate the discussion.

Regarding reproducing the model - I initially tried to reproduce the rmir_asr_citrinet_1024_asrset3p0_offline model included with riva 1.7b to be able to use the previous streaming-offline mode. However as I understand it, the offline recognition batching process of Riva has been updated, and simply using the older streaming-offline ensembles won’t give me the previous offline functionality. When trying to use the previous pipeline it just gives me an error stating that the maximum input audio can only be 15 seconds, and does not act as the previous offline pipeline with which I can transcribe large files with a Recognize call.

Now my aim is to rebuild the v1.8 citrinet-offline pipeline but with a chunk size of 7200.

I followed the steps from the docs to reproduce, but have a few queries regarding some artifacts.

The build command I used:

riva-build speech_recognition \
   Citrinet-1024-true-offline.rmir:tlt_encode Citrinet-1024-Jarvis-ASRSet-3_0-encrypted.riva:tlt_encode \
   --offline \
   --name=citrinet-1024-english-asr-true-offline \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --chunk_size=7200 \
   --left_padding_size=0. \
   --right_padding_size=0. \
   --decoder_type=flashlight \
   --flashlight_decoder.asr_model_delay=-1 \
   --decoding_language_model_binary=jarvis_asr_train_datasets_noSpgi_noLS_gt_3gram.binary \
   --decoding_vocab=lexicon.txt \
   --flashlight_decoder.lm_weight=0.2 \
   --flashlight_decoder.word_insertion_score=0.2 \
   --flashlight_decoder.beam_threshold=20. \
   --language_code=en-US

My queries are regarding the decoding_vocab and decoding_language_model_binary params. What should they be set to to recreate the prebuilt rmirs?
Here, I ended up pulling them from/data/models/citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/ lexicon.txt and jarvis_asr_train_datasets_noSpgi_noLS_gt_3gram.binary. Are these available for easy pulling from NGC to recreate the prebuilt rmirs?

For the .riva file, I pulled it from the tao model card Speech to Text English Citrinet | NVIDIA NGC

I chose the deployable_v3.0 version .riva file: Citrinet-1024-Jarvis-ASRSet-3_0-encrypted.riva Is there a Nemo pretrained model I can export to get this .riva?

Thanks for your help with this build, I simply want to be able to reproduce the given rmirs and be able to build off of the pretrained models/ensembles.

NVES_R · December 31, 2021, 9:02am

Hi @ShantanuNair ,

Thanks for reaching out. There may be a delay in response time with many folks on holiday this week.

@rleary to help follow up

joshuar7 · January 17, 2022, 7:20pm

I’d also like to know the answer to this question. I wrote my own question here Recreate QuickStart Stock Citrinet Model with Modified Parameters , and it’s very similar to this ticket.

ShantanuNair · January 20, 2022, 5:01pm

@NVES_R @rleary I’m eagerly waiting for an update on this, so I can recreate the performance of the offline mode, especially after the changes to the offline API introduced in v1.8.

Apologies if my pinging seems rather insistent :) I’m following development of the TAO/Riva/Nemo suite very closely and am trying to take full advantage of what it has to offer in production.

pcastonguay · January 20, 2022, 6:12pm

Hi @ShantanuNair, thanks for you interest in Riva. The procedure that you described to reproduce the pre-built offline Citrinet .rmir is correct. We don’t have the KenLM binary and lexicon files easily available on NGC. We will try to address this in our next release of Riva and publish them to NGC. But you can grab them from the Triton model repository, like you did.

We don’t publish a .nemo for the pre-trained Citrinet model used in Riva currently but you can use the trainable_v3.0 tlt model and use TAO to fine-tune if needed.

ShantanuNair · February 9, 2022, 9:32am

@pcastonguay Hey, any update on ngc and rebuilding the pipelines for the 1.9b release?

pcastonguay · February 10, 2022, 6:58pm

The KenLM binary and lexicon files are ready to be published to NGC however we are facing some technical difficulties with NGC which prevents me from uploading them right now. Hopefully that should get resolved soon. I will keep you posted.

pcastonguay · February 11, 2022, 7:42pm

@ShantanuNair The KenLM binary and lexicon files used to generate the ASR pipelines in the quickstart scripts have been uploaded to NGC at: Riva ASR English(en-US) LM | NVIDIA NGC

Lexicon file is: flashlight_decoder_vocab.txt
Language model binary: riva_asr_train_datasets_3gram.binary

ShantanuNair · February 15, 2022, 8:52am

Hi @pcastonguay Really appreciate the uploads to NGC, however I’m still not clear how to reproduce the pretrained models. When trying to reproduce using steps from the docs, it only resulted in blank transcripts.

Which .riva file do I use?

pcastonguay · February 16, 2022, 5:55pm

You should use the .riva from: RIVA Citrinet ASR English | NVIDIA NGC

I just tried the riva-build command you shared, with the .riva above, and the lexicon and language model binaries shared earlier and was able to transcribe a 1h40 minutes audio file. This was with Riva 1.8.0-beta.

system · March 31, 2022, 8:02am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.