Problem with the new RIVA Parakeet-CTC-XXL-1.1B ASR Multilingual

fernandovidal8878901 · December 16, 2024, 7:15pm

Hardware - GPU RTX 4090
Operating System: Linux 22.04.5
Riva Version: 2.18.0

I wanted to know if it would be possible to force the model to work exclusively in a single language. I’m currently facing an issue where, during streaming transcription for Brazilian Portuguese (pt-BR), the model mixes Russian, English, and Brazilian Portuguese. The model’s performance has been terrible.

Transcription example:
“Посторечо пицца, porfvoр бризл tá começando a falar russo, e não volta”
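
Script mixing like the example above can at least be flagged on the client side after transcription. The following is a minimal sketch, assuming that valid pt-BR output should contain only Latin-script letters. The helper names `non_latin_ratio` and `looks_mixed` and the 10% threshold are hypothetical, chosen for illustration. Nothing here is part of Riva, and it does not force the model into a single language.

```python
import unicodedata


def non_latin_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are not Latin-script.

    The script is inferred from the Unicode character name, which starts
    with "LATIN" for letters such as "a", "ç", or "ã".
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_latin = [
        ch for ch in letters
        if not unicodedata.name(ch, "").startswith("LATIN")
    ]
    return len(non_latin) / len(letters)


def looks_mixed(text: str, threshold: float = 0.1) -> bool:
    """Flag a transcript whose share of non-Latin letters exceeds the threshold.

    The threshold is an assumed value, not a tuned one.
    """
    return non_latin_ratio(text) > threshold
```

A client could call `looks_mixed(transcript)` on each final result and discard or retry the flagged segments. This only detects the symptom; it cannot recover the Portuguese text that was mistranscribed.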

mayjain · January 24, 2025, 5:18am

Unfortunately, this is not supported yet, but we are working on it. It will be available in a future release.

amargolin · January 24, 2025, 9:01pm

@fernandovidal8878901 Can you share which of the following use-cases are relevant:

batch processing / streaming
Audio is in multiple languages → output the transcription in a language different from the languages spoken in the audio.
Audio is in a single language → output the transcription in the language that was spoken. A multilingual model is used so that it is ready for different languages in different sessions.
Audio is in multiple languages → transcribe each language as spoken (no translation).
Is there a need for LID (language identification) metadata?
Languages known in advance (forced) vs auto-detect audio langs.
Other use-cases of interest…

fernandovidal8878901 · January 25, 2025, 2:22am

Thank you for the feedback! I’m excited about it.

fernandovidal8878901 · January 25, 2025, 2:26am

For me, the following use-cases are relevant:

batch processing / streaming
Audio is in a single language → output the transcription in the language that was spoken. A multilingual model is used so that it is ready for different languages in different sessions.
Languages known in advance (forced) vs auto-detect audio langs.

Furthermore, I am very interested in transcription models for Brazilian Portuguese, mainly in real time.
