Issues with Speaker Diarization in Riva ASR - All Audio Segments Tagged as Person 0

satish.rathod.ov · April 8, 2025, 2:00pm

I am encountering an issue while using speaker diarization with Riva ASR. When running the transcribe_file_offline.py script with diarization enabled, the audio is not being properly diarized. Instead of segmenting the audio by speaker, all segments are labeled as Person 0.

Steps Taken:

I followed the tutorial provided in the Riva ASR Speaker Diarization Guide.
I ensured that speaker diarization was enabled in the config.sh file by uncommenting the line for the rmir_diarizer_offline model.
The Riva Speech Skills server has been deployed and is running.
I installed the required Riva client library and have successfully connected to the server.
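
One way to double-check the config.sh step is to scan the file for the diarizer entry and confirm it is no longer commented out. The sketch below is only an illustration and is not part of Riva: the function name `find_model_lines` and the sample text are hypothetical, and it assumes nothing more than that a disabled entry starts with `#`.

```python
def find_model_lines(config_text, needle="rmir_diarizer_offline"):
    """Return (line_number, is_active, text) for each line mentioning `needle`.

    A line counts as active when its first non-blank character is not '#'.
    """
    hits = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        stripped = raw.strip()
        if needle in stripped:
            hits.append((number, not stripped.startswith("#"), stripped))
    return hits


# Hypothetical excerpt; the real config.sh entries will look different.
sample = """\
models=(
    #"example/rmir_diarizer_offline:version"
    "example/rmir_diarizer_offline:version"
)
"""

for number, active, text in find_model_lines(sample):
    print(number, "active" if active else "commented out", text)
```

To check a real file, pass the contents of config.sh (for example via `open("config.sh").read()`) instead of `sample`. If the entry was uncommented only after the models were first deployed, it may be worth re-running the quickstart initialization so that the diarizer model is actually deployed.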

Command Used:

The command I am using to run the transcription with diarization enabled is:

python3 transcribe_file_offline.py --input-file file_path --server localhost:50051 --language-code en-US --speaker-diarization --diarization-max-speakers 2

Expected Result:

The expectation is that the audio would be segmented by speaker, and each word in the transcript would be tagged with the appropriate speaker ID.

Actual Result:

Instead of properly segmenting the audio, the diarization process labels all segments as Person 0. This issue persists despite following all setup instructions and verifying the configuration.
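
To confirm the symptom independently of how the script prints its output, the speaker tags can be tallied over the word-level results. The following is a minimal sketch under stated assumptions: `Word` is a hypothetical stand-in for a word-level result, and the field names `word` and `speaker_tag` are assumed rather than taken from this thread.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Word:
    """Stand-in for a word-level result; the field names are assumptions."""
    word: str
    speaker_tag: int


def speaker_tag_counts(words):
    """Count how many words were assigned to each speaker tag."""
    return Counter(w.speaker_tag for w in words)


def looks_undiarized(words):
    """True when there are several words and all of them carry the same tag."""
    counts = speaker_tag_counts(words)
    return len(counts) == 1 and sum(counts.values()) > 1


# Made-up data mirroring the reported symptom: every word tagged 0.
words = [Word("hello", 0), Word("hi", 0), Word("there", 0)]
print(speaker_tag_counts(words))
print(looks_undiarized(words))
```

A single tag across the whole transcript is also what a genuinely single-speaker recording would produce, so this check is only meaningful on audio known to contain at least two speakers.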

What I Have Tried:

Double-checked the speaker diarization setup, including ensuring the diarization model is enabled.
Verified that the transcribe_file_offline.py script is executing correctly.
Tested with multiple audio files to confirm it isn’t specific to a particular file.

Questions:

Is there a specific issue with how the speaker diarization feature is being initialized or configured?
Could this be related to an issue with the model or an unsupported format?
Are there any additional debugging steps I should follow to resolve the issue?

amargolin · April 9, 2025, 2:01pm

Have you checked the following documentation?

The Riva documentation on speaker diarization (SD)

Tutorial Example

Which Riva version are you using and which model are you trying?

satish.rathod.ov · April 9, 2025, 2:16pm

Riva version: riva_quickstart_arm64_v2.19
Docker version: 28.0.4.5 LTS
OS: Ubuntu 22.04
Model: Jetson Orin Nano Developer Kit

sophwats · April 10, 2025, 11:47am

Hey @satish.rathod.ov, it looks like you’re using the older, offline diarization model. Is it possible for you to use the streaming diarizer from the latest release (details here: ASR Overview, NVIDIA Riva), or do you have a specific offline use case?

system · September 9, 2025, 9:34pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
