I am encountering an issue while using speaker diarization with Riva ASR. When I run the transcribe_file_offline.py script with diarization enabled, the audio is not diarized properly: instead of being segmented by speaker, every segment is labeled as Person 0.
Steps Taken:
- I followed the tutorial provided in the Riva ASR Speaker Diarization Guide.
- I ensured that speaker diarization was enabled in the config.sh file by uncommenting the line for the rmir_diarizer_offline model.
- The Riva Speech Skills server has been deployed and is running.
- I installed the required Riva client library and have successfully connected to the server.
Command Used:
The command I am using to run the transcription with diarization enabled is:
python3 transcribe_file_offline.py --input-file file_path --server localhost:50051 --language-code en-US --speaker-diarization --diarization-max-speakers 2
Expected Result:
The expectation is that the audio would be segmented by speaker, and each word in the transcript would be tagged with the appropriate speaker ID.
Actual Result:
Instead of properly segmenting the audio, the diarization process labels all segments as Person 0. This issue persists despite following all setup instructions and verifying the configuration.
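To make the symptom concrete, here is a minimal sketch of the check I have in mind. It assumes the word-level results have already been pulled out of the response into (word, speaker tag) pairs. The helper names and the sample data are hypothetical illustrations, not output from Riva or part of its client API.

```python
from collections import Counter

def speaker_tag_counts(words):
    """Count how many words were assigned to each speaker tag."""
    return Counter(tag for _, tag in words)

def is_collapsed(words):
    """Return True when every word carries the same speaker tag."""
    return len(speaker_tag_counts(words)) == 1

# Hypothetical data: what I observe (everything tagged 0) versus
# what I would expect for a two-speaker recording.
observed = [("hello", 0), ("how", 0), ("are", 0), ("you", 0)]
expected = [("hello", 0), ("how", 1), ("are", 1), ("you", 1)]

print(speaker_tag_counts(observed))
print(is_collapsed(observed), is_collapsed(expected))
```

In my case every file behaves like the `observed` list: only one distinct tag ever appears, even with two clearly different speakers.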
What I Have Tried:
- Double-checked the speaker diarization setup, including ensuring the diarization model is enabled.
- Verified that the transcribe_file_offline.py script is executing correctly.
- Tested with multiple audio files to confirm it isn’t specific to a particular file.
Questions:
- Is there a specific issue with how the speaker diarization feature is being initialized or configured?
- Could this be related to an issue with the model or an unsupported format?
- Are there any additional debugging steps I should follow to resolve the issue?