Words joined together in transcription

Please provide the following information when requesting support.

Hardware - GPU: NVIDIA A10G (g5.xlarge on AWS)
Hardware - CPU: 4 vCPUs
Operating System: Ubuntu 20.04
Riva Version: riva_quickstart:2.7.0

How to reproduce the issue?

Any clip of fast speech in which punctuation is pronounced aloud, such as “hello world comma, please welcome”, etc. The transcript tends to join the word “period” with an adjacent word. On the Conformer model I tried setting chunk_size to 1/4 of the default and to 2x the default, but neither made the situation better. Do I have to tweak both chunk_size and padding, or is it something else?

Hi @txia

Thanks for your interest in Riva.

I will check with the internal team and provide updates.


Hi @txia

Apologies for the delay.

For the test audio clips, we get the following transcripts:

“Hello, welcome, please welcome.”

“Hello. Well, please welcome.”

Riva doesn’t support spoken punctuation (e.g. “well comma please” => “well, please”).

Also, the acoustic model generates normalized text. Converting spoken punctuation would need to be handled in the inverse text normalization (ITN) phase, which is not currently supported.
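
Since spoken punctuation is not handled by Riva itself, one possible workaround is to post-process the returned transcript on the client side. The snippet below is only a minimal sketch of that idea in plain Python; it is not part of the Riva API. The mapping `SPOKEN_PUNCTUATION` and the function `replace_spoken_punctuation` are hypothetical names chosen for illustration, and the sketch assumes the spoken punctuation words arrive as separate words in the transcript.

```python
import re

# Hypothetical mapping from spoken punctuation words to symbols (an assumption,
# not something Riva provides); extend it for your own vocabulary.
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "full stop": ".",
    "question mark": "?",
    "exclamation mark": "!",
}

# Match optional leading whitespace, one spoken punctuation word (longest
# phrases tried first), and any punctuation symbols that directly follow it.
_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(
        re.escape(word)
        for word in sorted(SPOKEN_PUNCTUATION, key=len, reverse=True)
    )
    + r")\b[,.?!]*",
    re.IGNORECASE,
)


def replace_spoken_punctuation(transcript: str) -> str:
    """Replace spoken punctuation words with symbols attached to the previous word."""
    return _PATTERN.sub(
        lambda match: SPOKEN_PUNCTUATION[match.group(1).lower()], transcript
    )


print(replace_spoken_punctuation("well comma please"))  # -> well, please
```

This does not repair words that the recognizer has already joined together, and it will also rewrite legitimate uses of words such as “period” (as in “a period of time”), so it is best treated as a starting point for dictation-style audio only.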