Deployments of custom-trained ASR models return empty transcripts

Hello,

I am trying to train and deploy a custom ASR model with Riva. I have been able to train and evaluate Citrinet models with NeMo, but I had trouble deploying them, so I decided to see whether I would have better results by closely following the steps in the linked tutorial notebooks.

I can get through the steps in the training notebook reasonably well, but once I try to actually deploy a QuartzNet 15x5 model that I custom-trained, I find that I get empty results for transcription requests sent to the server. Output from an offline request:

results {
  channel_tag: 1
}

Sometimes, I get an “audio_processed” field included in the results, but the duration shown is crazy small (this is for a ~45 second file, note the e-41 at the end):

results {
  channel_tag: 1
  audio_processed: 4.5830867574207467e-41
}

For streaming queries, I just get nothing back.
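
To make the request concrete, an offline query equivalent to what I’m sending looks roughly like this (a sketch using the riva.client Python bindings; the file name and sample rate are placeholders for my actual ~45 second test file):

import riva.client

# Connect to the locally running Riva server (default gRPC port).
auth = riva.client.Auth(uri="localhost:50051")
asr = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=16000,   # placeholder; matches the WAV file being sent
    language_code="en-US",
    max_alternatives=1,
    audio_channel_count=1,
)

# The ~45 second mono WAV file mentioned above (placeholder name).
with open("test_45s.wav", "rb") as fh:
    audio_bytes = fh.read()

response = asr.offline_recognize(audio_bytes, config)
print(response)  # with my custom models this prints only the empty "results { channel_tag: 1 }" object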

The NVIDIA-provided models work fine when I launch them by setting the appropriate parts of config.sh and running the quick start scripts. Likewise, I was able to successfully download a .tlt for a QuartzNet model from the NGC catalog, export it to a .riva, build that into a .rmir, and deploy it with riva-deploy.

Given that the export, build, and deploy steps I followed succeeded for the pretrained QuartzNet model, I imagine the problem is in my training process. To train, I have pretty much just been following the steps and commands outlined in the linked Catalog notebook.

I also tried following the fine-tuning step to tune an NVIDIA-provided .tlt with custom data, and the resulting model exhibited the same issue as the models I trained from scratch.

For context, the custom models I’ve trained in that notebook with the sample data all produce nothing (or just a single character) in the inference step and have very high loss and WER. I imagine that the tutorial parameters and sample data would have been selected so as to produce a model that at least pulls out a word or two, so something seems to be going wrong.

Does anyone have any advice? I can provide whatever additional information is needed.

Hardware: AWS g4dn.xlarge instance, with a T4 GPU
Operating System: Ubuntu 20.04 LTS via NVIDIA GPU Cloud image
Riva Version: 1.7
TLT Version (if relevant): 3.21.08


I’ve run through training with the example scripts provided in the NeMo package and tried launching with the build commands given in the documentation (rather than the notebooks), and I’ve hit the same issue: the output is a similar object containing only “channel_tag”, with “audio_processed” sometimes showing up as well, but only with a tiny value like the one in the example above.

I’ve tried several different versions, different model training processes, different environments, and different command-line arguments (e.g., with and without --offline), but I still haven’t been able to get results from the deploy process unless the model was one I downloaded directly from NVIDIA. Would really appreciate figuring this one out so we can use the models we’ve trained!
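
For concreteness, the NeMo side of what I’m exercising boils down to training with the example script and then loading the resulting checkpoint to transcribe a test file before export, roughly like this (a minimal sketch; the CTC model class and file names are placeholders for my actual runs):

import nemo.collections.asr as nemo_asr

# Load the checkpoint produced by the NeMo example training script
# ("asr_model.nemo" is a placeholder for the actual output path).
model = nemo_asr.models.EncDecCTCModel.restore_from("asr_model.nemo")

# Transcribe the same test file that later gets sent to the Riva server.
print(model.transcribe(["test_45s.wav"]))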