Riva Python client not showing per-word confidence score

mehadi.hasan · December 7, 2022, 3:30am

Hardware - GPU T4
Operating System - Ubuntu 20.04
Riva Version - 2.7.0
Nvidia Riva Python Client Version - 0.0.5

When I use the Riva command-line tool:

$ riva_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav  --word_time_offsets=True
timestamps: 
Word                                    Start (ms)      End (ms)        Confidence      

What                                    840             880             -1.5961e+00     
is                                      1160            1200            -6.1294e-01     
Natural                                 1800            2080            -2.5625e+00     
Language                                2200            2520            -5.9124e-01     
Processing?                             2720            3200            3.6569e-01

it gives me a word-level confidence score.

But when I use the Python API, it doesn’t give a per-word confidence score:

results {
  alternatives {
    transcript: "what is natural language Processing "
    confidence: -0.999424934387207
    words {
      start_time: 840
      end_time: 880
      word: "what"
    }
    words {
      start_time: 1160
      end_time: 1200
      word: "is"
    }
    words {
      start_time: 1800
      end_time: 2080
      word: "natural"
    }
    words {
      start_time: 2200
      end_time: 2520
      word: "language"
    }
    words {
      start_time: 2720
      end_time: 3200
      word: "Processing"
    }
  }
  channel_tag: 1
  audio_processed: 4.800000190734863
}

Final transcript: what is natural language Processing

Is there anything I’m missing? Or is the word-level confidence score feature not available in the Python API?

Here is the code I use.
The Riva speech Docker image is nvcr.io/nvidia/riva/riva-speech 2.7.0.

import wave

import riva.client
import riva.client.proto.riva_asr_pb2 as rasr
import riva.client.proto.riva_asr_pb2_grpc as rasr_srv
import riva.client.proto.riva_audio_pb2 as ra

# init riva recognition stub
riva_server = "0.0.0.0:50051"
auth = riva.client.Auth(uri=riva_server)
stub = rasr_srv.RivaSpeechRecognitionStub(auth.channel)

# init riva recognition configuration
config = rasr.RecognitionConfig(
    encoding=ra.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=16000,
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=False,
    audio_channel_count=1,
    enable_word_time_offsets=True,
)

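# path to the audio clip (assumed here to be the sample used in the riva_asr_client command above)
audio_file = "/opt/riva/wav/en-US_sample.wav"

with wave.open(audio_file, "rb") as fp:
    wav_data = fp.readframes(-1)  # read all frames as raw PCM bytes
    request = rasr.RecognizeRequest(config=config, audio=wav_data)
    response = stub.Recognize(request)
    if len(response.results) > 0 and len(response.results[0].alternatives) > 0:
        outputs = response.results[0].alternatives[0]
        print(outputs)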

rvinobha · December 13, 2022, 7:33am

Hi @mehadi.hasan,

Thanks for your interest in Riva.

Thanks for sharing the Riva client version. Version 0.0.5 does not support word-level confidence scores.

The latest riva-client supports confidence; the current latest version is 2.8.0.

Please upgrade the Python package:

pip install --upgrade nvidia-riva-client

After upgrading, you will be able to see the confidence in your Python transcript output.
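
As a rough sketch of reading the per-word scores once the upgraded client returns them: this assumes each entry in response.results[0].alternatives[0].words exposes a confidence attribute next to word, start_time, and end_time, as the riva_asr_client table suggests. format_word_table is a hypothetical helper, and the SimpleNamespace objects only stand in for a real server response so the snippet runs without a Riva server.

```python
from types import SimpleNamespace


def format_word_table(words):
    """Build a CLI-style table from objects exposing word, start_time, end_time
    and (assumed, on newer clients) confidence."""
    lines = [f"{'Word':<20}{'Start (ms)':<12}{'End (ms)':<12}{'Confidence'}"]
    for w in words:
        # getattr with a default keeps this usable when a word entry
        # carries no confidence attribute at all
        conf = getattr(w, "confidence", None)
        conf_text = "n/a" if conf is None else f"{conf:.4e}"
        lines.append(f"{w.word:<20}{w.start_time:<12}{w.end_time:<12}{conf_text}")
    return "\n".join(lines)


# Stand-in for response.results[0].alternatives[0].words (values taken from the post above)
sample_words = [
    SimpleNamespace(word="What", start_time=840, end_time=880, confidence=-1.5961),
    SimpleNamespace(word="is", start_time=1160, end_time=1200, confidence=-0.61294),
]
print(format_word_table(sample_words))
```

With a live server, the same helper could be called as format_word_table(response.results[0].alternatives[0].words).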

Let me know if you face any issues.
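
If the scores still do not appear, one thing worth checking is which client version the Python environment actually picks up. A minimal sketch using only the standard library (Python 3.8+); installed_version is a hypothetical helper name, and the package name is the one from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a pip package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if the package is not installed in this environment
print("nvidia-riva-client:", installed_version("nvidia-riva-client"))
```

Running this in the same environment as the transcription script helps rule out an old 0.0.5 install shadowing the upgraded one.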

Regards,
Roshandev

system · December 27, 2022, 7:33am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.