Speech_to_text_citrinet infer yields random transcription results

Subject1 · April 12, 2022, 2:10pm

The same problem is described in the topic "Speech_to_text_citrinet infer yields random transcription results", but it is not solved there.

Any file I run through recognition produces text like “individual case return sc case transform them transform return sc return sc return case return case sc case sc does sc individual return case return still scie still transformie transform return case w”. I also tried recognizing files from the AN4 dataset, but that didn’t help either; the recognized text was about the same. The only difference is that I downloaded the dataset from another source and converted it from SPH format to 16 kHz WAV using Audacity.

I also tried the Russian model; the recognized text is always different from what is spoken in the audio file.
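
Since the files were converted by hand, it may be worth ruling out the audio format before suspecting the model. The sketch below uses only Python's standard `wave` module to read a WAV header and compare it against 16 kHz, mono, 16-bit PCM. The 16 kHz rate comes from the post above; mono and 16-bit are assumptions about what the checkpoint expects, and the function names are hypothetical helpers, not part of TAO.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def is_16k_mono_pcm16(path):
    """True if the file is 16 kHz, mono, 16-bit PCM (assumed target format)."""
    return describe_wav(path) == (16000, 1, 16)


def write_silence(path, sample_rate, channels=1, seconds=0.1):
    """Demo helper only: write a short silent 16-bit PCM WAV file."""
    n_frames = int(sample_rate * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames * channels)


if __name__ == "__main__":
    # Self-contained demo on generated files; replace with real paths as needed.
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.wav")
        bad = os.path.join(tmp, "bad.wav")
        write_silence(good, 16000)
        write_silence(bad, 44100, channels=2)
        print(good, describe_wav(good), is_16k_mono_pcm16(good))
        print(bad, describe_wav(bad), is_16k_mono_pcm16(bad))
```

Note that `wave` only reads uncompressed PCM; a file exported in another encoding raises `wave.Error`, which is itself a useful signal that the export settings need checking.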

Morganh · April 12, 2022, 3:16pm

Could you please refer to the topic "Tao speech_to_text evaluate+infer show very weak results" (#26 by Morganh) and run some experiments?
In that topic I was running speech_to_text, and the result was fine.

For your case, when running speech_to_text_citrinet, you can use the model from Speech to Text English Citrinet | NVIDIA NGC.

Subject1 · April 13, 2022, 2:35pm

But I don’t need to run evaluate. I just want to check recognition quality using infer. I ran the command

tao speech_to_text_citrinet infer -e /specs/speech_to_text_citrinet/infer.yaml -g 1 -k tlt_encode -m /results/citrinet/speechtotext_english_citrinet.tlt -r /results/citrinet/infer file_paths=[/data/an268-mbmg-b.wav ]

using the checkpoint you sent me, the result was “university was university one university one” which is completely different from what is pronounced in the file.

I ran all the commands from the notebook via the console; all folders are mounted for the TAO Docker container.

Subject1 · April 13, 2022, 4:47pm

I just completed all the steps to prepare AN4, this time using the official NVIDIA notebook (Speech to Text Citrinet Notebook | NVIDIA NGC). The result of recognizing files in the notebook using the “ASR Inference” cell is identical to the result of file recognition in the console: a set of random words. All I did was download the model and follow the instructions on the site. I think there may be a mistake on your side; perhaps you updated something recently.

Morganh · April 14, 2022, 3:48am

Hi,
There might be something wrong with that version of the NGC pretrained model.
Please use the one below instead.

wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/speechtotext_english_citrinet/versions/trainable_v1.7/files/speechtotext_english_citrinet_1024.tlt

I ran inference against it, and the previous issue is gone.
I also ran evaluation; the WER is only about 2.46%.

speech_to_text_citrinet evaluate -e specs/speech_to_text_citrinet/evaluate.yaml -k tlt_encode -m speechtotext_english_citrinet_1024.tlt -r evalution_speech_to_text_citrinet_ngc_tlt test_ds.manifest_filepath=data/an4_converted/test_manifest.json

DATALOADER:0 TEST RESULTS
{'test_loss': 0.520318329334259, 'test_wer': 0.02457956038415432}

speech_to_text_citrinet infer -e specs/speech_to_text/infer.yaml -k tlt_encode -m speechtotext_english_citrinet_1024.tlt -r infer_result file_paths=[data/an4_converted/wavs/an406-fcaw-b.wav]

[NeMo I 2022-04-14 03:41:32 infer:72] Predicted transcript: rabout g m e f three nine
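
For readers who want to sanity-check a single transcript by hand, word error rate can be computed as the word-level edit distance divided by the number of reference words. The following is a minimal sketch of that standard definition, not TAO's or NeMo's actual implementation. The reference string in the example is hypothetical, since the thread does not give the ground-truth text for this file.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference; the hypothesis is the transcript logged above.
    reference = "rubout g m e f three nine"
    hypothesis = "rabout g m e f three nine"
    print(f"utterance WER: {word_error_rate(reference, hypothesis):.2%}")

    # The test_wer value reported by the evaluate run, shown as a percentage.
    print(f"reported test_wer: {0.02457956038415432:.2%}")
```

A single wrong word in a seven-word utterance already gives a per-utterance WER of about 14%, so a dataset-level figure near 2.5% means most test utterances were transcribed without any errors.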

Subject1 · April 14, 2022, 10:00am

Indeed, using this checkpoint I was able to get great results on different audio files, even ones outside the AN4 dataset. But why do the checkpoints for the Russian and English versions, available at RIVA Citrinet ASR Russian | NVIDIA NGC and RIVA Citrinet ASR English | NVIDIA NGC, not recognize speech correctly? I also want to check the quality of models for other languages.

Morganh · April 14, 2022, 10:10am

Still checking. I'm not sure whether there is a mismatch somewhere.
Could you try running speech_to_text instead of speech_to_text_citrinet for the two models you mentioned?

Subject1 · April 14, 2022, 10:34am

For both models, when running speech_to_text instead of speech_to_text_citrinet I get the error:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpjatcuk3h/model_weights.ckpt'

Morganh · April 14, 2022, 10:37am

Oh, OK, please ignore my request. These models should only run with speech_to_text_citrinet.

Subject1 · April 14, 2022, 1:53pm

Should I do something else? Or should I wait for your answer?

Morganh · April 14, 2022, 4:05pm

The internal team has been asked to check this. There is no result yet for those two models. As mentioned above, for the English version, please use
$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/speechtotext_english_citrinet/versions/trainable_v1.7/files/speechtotext_english_citrinet_1024.tlt

Subject1 · April 21, 2022, 3:14pm

Hi, Morganh. Is there any news for these two models?

Morganh · April 22, 2022, 12:12am

Yes, the issue has been identified. The internal team is working on new models.

Morganh · May 2, 2022, 4:46pm

The new models are available at RIVA Citrinet ASR English | NVIDIA NGC and
RIVA Citrinet ASR Russian | NVIDIA NGC.

yingliu · July 6, 2022, 6:19am

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

system · July 20, 2022, 6:20am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
