Hello,
I am trying to train and deploy a custom ASR model with Riva. I have been able to train and evaluate Citrinet models with NeMo, but I had trouble deploying them, so I decided to see whether I would get better results by closely following the steps in the linked tutorial notebooks:
I can get through the steps in the training notebook reasonably well, but once I try to actually deploy a Quartznet 15x5 model that I custom-trained, I get empty results for transcription requests sent to the server. Output from an offline request:
results {
channel_tag: 1
}
Sometimes, I get an “audio_processed” field included in the results, but the duration shown is crazy small (this is for a ~45 second file, note the e-41 at the end):
results {
channel_tag: 1
audio_processed: 4.5830867574207467e-41
}
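In case it helps with diagnosis: that audio_processed value is a subnormal (denormal) float32, so it looks more like a small integer's bits, or uninitialized memory, being read as a float than like a real duration. That interpretation is my own guess and not something I have confirmed in Riva. A quick Python sketch that inspects the raw bit pattern of the value:

```python
import struct

# Value copied verbatim from the response above.
audio_processed = 4.5830867574207467e-41

# Pack as a little-endian float32, then reread the same 4 bytes as an unsigned int.
bits = struct.unpack("<I", struct.pack("<f", audio_processed))[0]

# In IEEE 754 single precision, an exponent field (bits 23-30) of zero
# with a nonzero mantissa means the number is subnormal.
exponent = (bits >> 23) & 0xFF
is_subnormal = exponent == 0 and bits != 0

print(hex(bits), is_subnormal)
```

If the field really were a duration in seconds for a ~45 second file, I would expect a normal float with a nonzero exponent, which is why this pattern seems suspicious to me.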
For streaming queries, I just get nothing back.
The Nvidia-provided models work fine when I launch them by setting the appropriate parts of config.sh and running the quickstart scripts. Likewise, I was able to successfully download a .tlt for a Quartznet model from the Nvidia Catalog, export that to a .riva, build that to a .rmir, and deploy it with riva-deploy.
Given that the export, build, and deploy steps I followed were successful in deploying the pretrained Quartznet model, I imagine the problem is in my training process. To train, I have been pretty much just following the steps and commands outlined in the linked Catalog notebook.
I also tried following the fine-tuning step to tune an Nvidia-provided .tlt with custom data, and the resulting model gave the same issue as the models I trained from scratch.
For context, the custom models I’ve trained in that notebook with the sample data all produce nothing (or just a single character) in the inference step and have very high loss and WER. I imagine that the tutorial parameters and sample data would have been selected so as to produce a model that at least pulls out a word or two, so something seems to be going wrong.
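To be clear about what I mean by WER: the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. Below is a minimal sketch of that definition. The `wer` helper is my own illustration, not the NeMo or TLT implementation, and it assumes simple whitespace tokenization with no text normalization.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# An empty transcript counts every reference word as a deletion.
print(wer("the cat sat", ""))
```

Under this definition, a model that outputs nothing scores exactly 1.0 on every utterance, which is consistent with the empty inference results I am seeing.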
Does anyone have any advice? I can provide whatever additional information is needed.
Hardware: AWS g4dn.xlarge instance, with a T4 GPU
Operating System: Ubuntu 20.04 LTS via NVIDIA GPU Cloud image
Riva Version: 1.7
TLT Version (if relevant): 3.21.08