Problem with the new RIVA Parakeet-CTC-XXL-1.1B ASR Multilingual

Hardware - GPU RTX 4090
Operating System: Linux 22.04.5
Riva Version: 2.18.0

Is it possible to force the model to work exclusively in a single language? I’m currently facing an issue where, during streaming transcription of Brazilian Portuguese (pt-BR) audio, the model mixes Russian, English, and Brazilian Portuguese in its output. The model’s performance has been terrible.

Transcription example:
“Посторечо пицца, porfvoр бризл tá começando a falar russo, e não volta”
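Script mixing like this can at least be detected on the client side. The following is only a minimal sketch of a post-processing check, not a Riva feature or API: the helper names `token_script` and `flag_non_latin` are hypothetical, and the check assumes the expected output (pt-BR) is written purely in Latin script. It uses only the Python standard library.

```python
import unicodedata


def token_script(token: str) -> str:
    """Classify a token as 'latin', 'cyrillic', 'other', 'mixed', or 'none'.

    'none' means the token has no alphabetic characters (e.g. punctuation).
    """
    scripts = set()
    for ch in token:
        if not ch.isalpha():
            continue  # ignore punctuation, digits, quotes
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("latin")
        elif name.startswith("CYRILLIC"):
            scripts.add("cyrillic")
        else:
            scripts.add("other")
    if not scripts:
        return "none"
    if len(scripts) > 1:
        return "mixed"
    return next(iter(scripts))


def flag_non_latin(transcript: str) -> list[str]:
    """Return the whitespace-separated tokens that are not purely Latin script."""
    return [
        tok
        for tok in transcript.split()
        if token_script(tok) not in ("latin", "none")
    ]
```

Calling `flag_non_latin` on a transcript returns the tokens that contain Cyrillic or other non-Latin letters, which could be logged or used to discard a partial result. It does not help with English words appearing in Portuguese output, since both use Latin script.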

Unfortunately, this is not supported yet, but we are working on it. It will be available in a future release.

@fernandovidal8878901 Can you share which of the following use-cases are relevant:

  1. batch processing / streaming
  2. Audio is in multiple languages → output transcription in a language that is different from the audio languages.
  3. Audio is in a single language → output transcription in the language that was spoken. Using a multilingual model to be ready for multiple languages in different sessions.
  4. Audio is in multiple languages → transcribe each language as spoken (no translation).
  5. Is there a need for LID (language identification) metadata?
  6. Languages known in advance (forced) vs. auto-detected audio languages.
  7. Other use-cases of interest…

Thank you for the feedback! I’m excited about it.

For me, the following use-cases are relevant:

  1. batch processing / streaming
  2. Audio is in a single language → output transcription in the language that was spoken. Using a multilingual model to be ready for multiple languages in different sessions.
  3. Languages known in advance (forced) vs. auto-detected audio languages.

Furthermore, I am very interested in transcription models for Brazilian Portuguese, especially for real-time use.
