Riva v2.19 speaker diarization issue

Please provide the following information when requesting support.

Hardware - GPU (H100 HGX & L40S)
Hardware - CPU (AMD Genoa)
Operating System - Ubuntu 22.04 LTS
Riva Version - both v2.16 and v2.19
TLT Version (if relevant) N/A
How to reproduce the issue ? (This is for errors. Please share the command and the detailed log here)

In short, my code (based on the examples from the nvidia-riva/python-clients repo) works well with ASR/NMT/TTS via the Riva v2.16 container from NGC. But speaker diarization, modeled after the code in "How do I Use Speaker Diarization with Riva ASR?" in the NVIDIA Riva docs, runs into the following issues.

The Riva v2.16 container was pulled from NGC. I then launched the v2.19 container via the riva_quickstart package, which worked fine, but the v2.19 container has the identical issue.

File "/home/user/miniconda3/envs/riva/lib/python3.9/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Error: Unavailable diarizer model requested given these parameters: pipeline_type=diarizer; type=offline; "
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Error: Unavailable diarizer model requested given these parameters: pipeline_type=diarizer; type=offline; ", grpc_status:3, created_time:"2025-04-01T10:06:44.808750416-07:00"}"

On the Riva v2.19 docker container:

I0401 17:06:44.767925 518 grpc_riva_asr.cc:685] ASRService.Recognize called.
I0401 17:06:44.768723 130334 grpc_riva_asr.cc:854] Using model parakeet-1.1b-en-US-asr-offline-asr-bls-ensemble from Triton localhost:8001 for inference
I0401 17:06:44.768980 130335 grpc_riva_asr.cc:905] ASRService.Recognize diarization called.
I0401 17:06:44.769042 130335 riva_asr_stream.cc:226] Detected format: encoding = 1 RAW numchannels = 1 samplerate = 16000 bitspersample = 16
E0401 17:06:44.769136 130335 grpc_riva_asr.cc:957] Error: Unavailable diarizer model requested given these parameters: pipeline_type=diarizer; type=offline;
I0401 17:06:44.800679 518 stats_builder.h:100] {"specversion":"1.0","type":"riva.asr.recognize.v1","source":"","subject":"","id":"52b126d6-0e8f-4efc-8048-c40c01ae8d09","datacontenttype":"application/json","time":"2025-04-01T17:06:44.767895393+00:00","data":{"release_version":"2.19.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"audio_duration":0.0,"speech_duration":0.0,"status":3,"err_msg":"Error: Unavailable diarizer model requested given these parameters: pipeline_type=diarizer; type=offline; "}}

My code (which works well before adding the speaker diarization line):

In my Flask web app, I use a webRTC JS library to capture voice, convert it to WAV format, and send it to the Riva gRPC API in chunks. The chunks are valid WAV files.
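As an aside, the chunking step itself is just byte slicing; here is a minimal sketch (not Riva-specific, and the function name is my own; it assumes 16-bit mono PCM, where 1600 samples at 16 kHz is 100 ms of audio):

```python
def pcm_chunks(pcm_bytes: bytes, chunk_size_samples: int = 1600):
    """Yield fixed-size chunks of raw 16-bit mono PCM.

    Each chunk is chunk_size_samples * 2 bytes (int16 samples);
    the final chunk may be shorter.
    """
    step = chunk_size_samples * 2  # 2 bytes per 16-bit sample
    for offset in range(0, len(pcm_bytes), step):
        yield pcm_bytes[offset:offset + step]
```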

class ASRManager:
    # def __init__(self, riva_host="172.30.1.76:51051", sample_rate=16000, chunk_size=1600):
    def __init__(self, riva_host="172.30.1.79:50051", sample_rate=16000, chunk_size=1600):
        self.sample_rate = sample_rate
        self.chunk_size = chunk_size
        self.auth = Auth()
        self.auth.channel = grpc.insecure_channel(riva_host)
        self.asr_service = ASRService(self.auth)

        self.recognition_config = RecognitionConfig(
            language_code="en-US",
            max_alternatives=1,
            profanity_filter=False,
            enable_automatic_punctuation=True,
            encoding=AudioEncoding.LINEAR_PCM,
            sample_rate_hertz=sample_rate
        )
        add_speaker_diarization_to_config(self.recognition_config, diarization_enable=True, diarization_max_speakers=3)

    def offline_recognize_chunk(self, audio: AudioSegment) -> str:
        samples = audio.set_channels(1).set_frame_rate(16000).get_array_of_samples()
        byte_content = samples.tobytes()
        response = self.asr_service.offline_recognize(byte_content, self.recognition_config)
        if response.results and response.results[0].alternatives:
            return response.results[0].alternatives[0].transcript
        return ""

Update: the NVIDIA team pointed me to a link, which I have now gone through:

First, I modified config.sh to enable diarization support, then ran config.sh and started the Riva v2.19 container.

asr_acoustic_model=("parakeet_1.1b")
asr_accessory_model=("diarizer")

Then I needed to use streaming_response_generator() to get speaker diarization working.

def transcribe_stream(self, audio_generator, callback):
    config = StreamingRecognitionConfig(
        config=RecognitionConfig(
            language_code="en-US",
            encoding=AudioEncoding.LINEAR_PCM,
            sample_rate_hertz=44100,
            max_alternatives=1,
            enable_automatic_punctuation=True,
            enable_word_time_offsets=True,
        ),
        interim_results=False,
        # single_utterance=False
    )

    # ✅ Add speaker diarization support
    add_speaker_diarization_to_config(
        config.config, 
        diarization_enable=True,
        diarization_max_speakers=10)

    responses = self.asr_service.streaming_response_generator(
        audio_chunks=audio_generator,
        streaming_config=config
    )
    print("#####responses")
    print(responses)
    for response in responses:
        print("#####response")
        print(response)
        if response.results:
            words = response.results[0].alternatives[0].words
            transcript_with_speakers = " ".join(
                [f"[S{w.speaker_tag}] {w.word}" for w in words]
            )
            callback(transcript_with_speakers, is_final=True)
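For anyone wiring this up, the audio_generator argument can be built from a WAV file with the stdlib wave module. This is a sketch under my own assumptions (function name and chunk size are placeholders); the file's sample rate must match sample_rate_hertz in the StreamingRecognitionConfig:

```python
import wave

def wav_chunks(path: str, frames_per_chunk: int = 1600):
    """Yield raw PCM chunks from a WAV file for streaming recognition.

    Each chunk holds frames_per_chunk frames (100 ms at 16 kHz);
    the final chunk may be shorter.
    """
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data
```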

Now I need to figure out how to improve speaker identification. The out-of-the-box performance does not identify speakers correctly:

[S0] I [S0] could [S0] just [S0] get [S0] it.

[S0] Poor

[S0] Thank [S0] you.

[S0] Okay.

[S0] You [S0] need [S0] to [S0] go

[S0] Do [S0] it

[S0] Never

[S0] One

[S0] Okay.

[S0] You [S0] know.

[S0] You [S0] know.

[S0] Yeah.

[S0] I [S0] think [S0] the

[S0] Can [S0] you [S0] call [S0] in?
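Separately from the accuracy problem, the per-word tags above can be collapsed into per-speaker segments with a small post-processing step. A sketch (the Word namedtuple here is only a stand-in for the word objects in response.results[0].alternatives[0].words, which expose .word and .speaker_tag):

```python
from collections import namedtuple
from itertools import groupby

# Stand-in for the word objects returned by Riva responses.
Word = namedtuple("Word", ["word", "speaker_tag"])

def group_by_speaker(words):
    """Collapse consecutive same-speaker words into (tag, text) segments."""
    return [
        (tag, " ".join(w.word for w in run))
        for tag, run in groupby(words, key=lambda w: w.speaker_tag)
    ]
```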

Hi @playwithai, were you able to get improved speaker identification? If not, are you able to share the input audio so we can triage and check the results on our side? Thanks