Hardware - GPU T4
Operating System - Ubuntu 20.04
Riva Version - 2.7.0
Hi, I ran inference with the pretrained Riva ASR model and the output is as follows:
$ riva_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav --word_time_offsets=True
timestamps:
Word Start (ms) End (ms) Confidence
What 840 880 -1.5961e+00
is 1160 1200 -6.1294e-01
Natural 1800 2080 -2.5625e+00
Language 2200 2520 -5.9124e-01
Processing? 2720 3200 3.6569e-01
My question is: what does the confidence value represent, and what is its range? I found that the score is not in the 0 to 1 range.
Thanks
Hi @mehadi.hasan
Thanks for your interest in Riva,
I have checked with the Riva team about your ASR confidence score questions and will get back to you once I have the answers.
Thanks
Hi @mehadi.hasan
I have an update from the team.
Please see the link below:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-advanced-details.html?highlight=confidence
The meaning of the confidence score depends on the decoder used:

| Decoder | Word Confidence | Utterance Confidence |
|---|---|---|
| Greedy | Minimum log probability across the span of acoustic frames that represent the word, excluding blank tokens. | Mean word confidence. |
| OpenSeq2Seq (os2s) | Scores are accumulated via a prefix beam search for CTC with an LM. Word scores are simply the accumulation from the frames associated with that word. | Scores are accumulated as above for the entire utterance. |
| Flashlight | Roughly a simple sum of log AM probabilities plus LM scores for the frames of the word. | Roughly a simple sum of log AM probabilities plus LM scores for the whole utterance. |
| Kaldi | Log probability of the word given by the associated arc in the lattice. | Log probability of the utterance given by the associated path through the lattice. |
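
To make the Greedy row concrete, here is a minimal sketch of how such a score could be computed from per-frame output. This is not Riva's actual code: the `BLANK` symbol, the function names, and the frame values are all made up for illustration. It assumes each frame carries the emitted token and its log probability.

```python
import math

BLANK = "_"  # hypothetical blank symbol; the real token depends on the model


def greedy_word_confidence(frames):
    """Minimum log probability over a word's frames, skipping blank frames.

    frames: list of (token, log_prob) pairs covering one word's span.
    """
    scores = [log_prob for token, log_prob in frames if token != BLANK]
    if not scores:
        raise ValueError("word span contains only blank frames")
    return min(scores)


def utterance_confidence(word_confidences):
    """Mean of the per-word confidences."""
    return sum(word_confidences) / len(word_confidences)


# Made-up frame data, for illustration only.
words = {
    "what": [("w", -0.10), (BLANK, -0.01), ("hat", -1.60)],
    "is": [("is", -0.61), (BLANK, -0.02)],
}

word_conf = {w: greedy_word_confidence(f) for w, f in words.items()}
utt_conf = utterance_confidence(list(word_conf.values()))

for word, conf in word_conf.items():
    # exp() maps a log probability back to a probability in (0, 1].
    print(f"{word}: log prob {conf:.2f}, probability {math.exp(conf):.3f}")
print(f"utterance: {utt_conf:.3f}")
```

Under this reading, a greedy score is a log probability, so it is at most 0 and `math.exp()` maps it back into the 0 to 1 range. For the decoders that add LM scores (os2s, Flashlight), the value is an accumulated score rather than a pure log probability, so that conversion would not apply in the same way.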
Hi @rvinobha, thanks for the answer. It was really helpful.