Hardware - GPU T4
Operating System - Ubuntu 20.04
Riva Version - 2.7.0
Nvidia Riva Python Client Version - 0.0.5
When I use the Riva command-line tool:
$ riva_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav --word_time_offsets=True
timestamps:
Word         Start (ms)  End (ms)  Confidence
What         840         880       -1.5961e+00
is           1160        1200      -6.1294e-01
Natural      1800        2080      -2.5625e+00
Language     2200        2520      -5.9124e-01
Processing?  2720        3200      3.6569e-01
the output includes a word-level confidence score.
But when I use the Python API, the response does not include a confidence score for each word, only one for the whole transcript:
results {
  alternatives {
    transcript: "what is natural language Processing "
    confidence: -0.999424934387207
    words {
      start_time: 840
      end_time: 880
      word: "what"
    }
    words {
      start_time: 1160
      end_time: 1200
      word: "is"
    }
    words {
      start_time: 1800
      end_time: 2080
      word: "natural"
    }
    words {
      start_time: 2200
      end_time: 2520
      word: "language"
    }
    words {
      start_time: 2720
      end_time: 3200
      word: "Processing"
    }
  }
  channel_tag: 1
  audio_processed: 4.800000190734863
}
Final transcript: what is natural language Processing
Am I missing something, or is the word-level confidence score not available in the Python API?
Riva speech Docker image: nvcr.io/nvidia/riva/riva-speech 2.7.0
Here is the code that I use:
import wave

import riva.client
import riva.client.proto.riva_asr_pb2 as rasr
import riva.client.proto.riva_asr_pb2_grpc as rasr_srv
import riva.client.proto.riva_audio_pb2 as ra

# path to the input WAV file (assumed here to be the sample used in the CLI call)
audio_file = "/opt/riva/wav/en-US_sample.wav"

# init riva recognition stub
riva_server = "0.0.0.0:50051"
auth = riva.client.Auth(uri=riva_server)
stub = rasr_srv.RivaSpeechRecognitionStub(auth.channel)

# init riva recognition configuration
config = rasr.RecognitionConfig(
    encoding=ra.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=16000,
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=False,
    audio_channel_count=1,
    enable_word_time_offsets=True,
)

# read all audio frames from the WAV file as raw PCM bytes
with wave.open(audio_file, "rb") as fp:
    wav_data = fp.readframes(fp.getnframes())

# send the whole file in a single offline recognition request
request = rasr.RecognizeRequest(config=config, audio=wav_data)
response = stub.Recognize(request)

# print the top alternative of the first result, if there is one
if len(response.results) > 0 and len(response.results[0].alternatives) > 0:
    outputs = response.results[0].alternatives[0]
    print(outputs)
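
To compare the Python response with the CLI table word by word, the entries in `words` can be printed one per line. The following is only a minimal sketch: the helper name `format_words` is hypothetical, and it assumes that a per-word `confidence` attribute may or may not exist in the installed client version, so it is read with `getattr` and shown as `n/a` when the attribute is absent. With a real protobuf message, a field that exists but was not set by the server would show up as its default value (0.0) rather than `n/a`. The stand-in objects below only mimic the shape of the response so that the snippet runs without a Riva server.

```python
from types import SimpleNamespace


def format_words(alternative):
    """Return one tab-separated line per word: text, start (ms), end (ms), confidence."""
    lines = []
    for w in alternative.words:
        # Assumption: `confidence` may be missing on the word object,
        # depending on the client/proto version in use.
        conf = getattr(w, "confidence", None)
        conf_text = "n/a" if conf is None else f"{conf:.4e}"
        lines.append(f"{w.word}\t{w.start_time}\t{w.end_time}\t{conf_text}")
    return lines


# Stand-in data shaped like the response above (not real server output).
demo = SimpleNamespace(
    words=[
        SimpleNamespace(word="what", start_time=840, end_time=880),
        SimpleNamespace(word="is", start_time=1160, end_time=1200, confidence=-0.61294),
    ]
)

for line in format_words(demo):
    print(line)
```

With the code from the question, the call would be `format_words(response.results[0].alternatives[0])`.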