Hardware - GPU (A2)
Riva Version: 2.19.0
I am deploying Whisper, Conformer-CTC, and Speaker Diarization models with the Riva SDK, but the diarizer returns no output. I am using riva-quickstart-2.19.0 with the following deployment configuration:
- Deploy Whisper model:
riva-build speech_recognition /data/rmir/whisper_large_v3.rmir:tlt_encode \
/data/riva_model/whisper_large_v3.riva:tlt_encode \
--offline \
--name=whisper-large-v3-turbo-multi-asr-offline \
--return_separate_utterances=True \
--unified_acoustic_model \
--chunk_size 30 \
--left_padding_size 0 \
--right_padding_size 0 \
--decoder_type trtllm \
--feature_extractor_type torch \
--torch_feature_type whisper \
--featurizer.norm_per_feature false \
--max_batch_size 8 \
--featurizer.precalc_norm_params False \
--featurizer.max_batch_size=8 \
--featurizer.max_execution_batch_size=8 \
--language_code=en,zh,de,es,ru,ko,fr,ja,pt,tr,pl,ca,nl,ar,sv,it,id,hi,fi,vi,he,uk,el,ms,cs,ro,da,hu,ta,no,th,ur,hr,bg,lt,la,mi,ml,cy,sk,te,fa,lv,bn,sr,az,sl,kn,et,mk,br,eu,is,hy,ne,mn,bs,kk,sq,sw,gl,mr,pa,si,km,sn,yo,so,af,oc,ka,be,tg,sd,gu,am,yi,lo,uz,fo,ht,ps,tk,nn,mt,sa,lb,my,bo,tl,mg,as,tt,haw,ln,ha,ba,jw,su,yue,multi
- Deploy Diarizer model:
riva-build diarizer \
/data/rmir/diarizer.rmir:tlt_encode \
/data/riva_model/vad_multilingual_marblenet_v1.10.0.riva:tlt_encode \
/data/riva_model/titanet_small_v1.0.0.riva:tlt_encode \
--vad_type=neural \
--diarizer_backend.offline \
--diarizer_backend.optimization_graph_level=-1 \
--embedding_extractor_nn.max_batch_size=32 \
--embedding_extractor_nn.use_onnx_runtime \
--embedding_extractor_nn.optimization_graph_level=-1 \
--clustering_backend.max_batch_size=0 \
--chunk_size=300 \
--audio_sec_limit=4001 \
--diarizer_backend.language_code=generic
- Inference code:
import io
import wave

import grpc
import riva.client


def load_audio(path):
    # Read the WAV header and raw PCM frames.
    with wave.open(path, 'rb') as wf:
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
        audio = wf.readframes(frames)
    return audio, sample_rate, channels


def main():
    auth = riva.client.Auth(uri='localhost:8005')
    riva_asr = riva.client.ASRService(auth)

    path = "audio.wav"
    content, sr, channels = load_audio(path)
    # The request sends the whole WAV file (header included), so the raw
    # frames read above are replaced; only sr and channels are reused.
    with open(path, 'rb') as f:
        content = f.read()

    config = riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        sample_rate_hertz=sr,
        audio_channel_count=channels,
        language_code="multi",
        max_alternatives=1,
        enable_automatic_punctuation=False,
        enable_word_time_offsets=True,
    )
    riva.client.add_custom_configuration_to_config(config, 'enable_vad_endpointing:true')
    riva.client.add_speaker_diarization_to_config(
        config,
        diarization_enable=True,
        diarization_max_speakers=2
    )

    try:
        response = riva_asr.offline_recognize(content, config)
    except grpc.RpcError as e:
        print("RPC Error:", e.details())
        return

    print("\nASR Transcript with Speaker Diarization: ", response)
    # Print each word prefixed with its speaker tag, one line per result.
    for result in response.results:
        for word in result.alternatives[0].words:
            print(f"[SPK{word.speaker_tag}] {word.word}", end=' ')
        print()


if __name__ == "__main__":
    main()
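To make it easier to see whether any word actually carries a distinct speaker tag, the response can be summarized with a small helper like the one below. This is only a sketch: `summarize_speaker_tags` is a hypothetical name of my own, and it assumes the response has the same `results[].alternatives[0].words[]` shape, with `word` and `speaker_tag` fields, that the loop above reads. The stand-in response is built from plain objects so the snippet runs without a Riva server.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_speaker_tags(response):
    """Count words per speaker_tag across all results (hypothetical helper)."""
    counts = Counter()
    for result in response.results:
        # Skip results that came back without any alternative.
        if not result.alternatives:
            continue
        for word in result.alternatives[0].words:
            counts[word.speaker_tag] += 1
    return counts


# Stand-in response (assumed shape), so the sketch runs without a server.
fake_words = [
    SimpleNamespace(word="hello", speaker_tag=0),
    SimpleNamespace(word="there", speaker_tag=0),
]
fake_response = SimpleNamespace(
    results=[SimpleNamespace(alternatives=[SimpleNamespace(words=fake_words)])]
)

counts = summarize_speaker_tags(fake_response)
print(dict(counts))
```

With the real response, passing `response` instead of `fake_response` should show how many words fall under each tag. If every word ends up under a single tag, that would suggest no per-speaker labels reached the response, though I am assuming here that an unset tag shows up as the protobuf default value.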
- Despite enabling speaker diarization, I do not receive any speaker tags in the output. Docker logs show the following:
I0704 07:15:03.842489 556 grpc_riva_asr.cc:685] ASRService.Recognize called.
I0704 07:15:03.843019 2081 grpc_riva_asr.cc:905] ASRService.Recognize diarization called.
I0704 07:15:03.843521 2081 riva_asr_stream.cc:226] Detected format: encoding = 1 numchannels = 1 samplerate = 22050 bitspersample = 16
W0704 07:15:03.843725 2081 grpc_riva_asr.cc:1055] Could not get parameter append_space_to_transcripts from model riva-diarizer. A space will be added after utterances by default.
I0704 07:15:03.843734 2081 grpc_riva_asr.cc:1060] Using model riva-diarizer from Triton localhost:8001 for diarization inference
I0704 07:15:03.843801 2080 grpc_riva_asr.cc:854] Using model whisper-large-v3-turbo-multi-asr-offline-asr-bls-ensemble from Triton localhost:8001 for inference
I0704 07:15:03.892876 2083 grpc_riva_asr.cc:1143] Creating resampler, audio file sample rate=22050 model sample_rate=16000
I0704 07:15:06.496675 2081 grpc_riva_asr.cc:1102] ASRService.Recognize diarization returning OK
I0704 07:15:06.497761 556 stats_builder.h:100] {"specversion":"1.0","type":"riva.asr.recognize.v1","source":"","subject":"","id":"58009794-6bea-480b-ae21-b21ad8213fc4","datacontenttype":"application/json","time":"2025-07-04T07:15:03.842467682+00:00","data":{"release_version":"2.19.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"multi","request_count":1,"audio_duration":30.0,"speech_duration":0.0,"status":0,"err_msg":""}}
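The last log line is a JSON stats record, and it is hard to read inline. The sketch below pulls out the fields that seem relevant; `parse_stats` is a hypothetical helper, and the log line is a shortened copy of the one above with only some fields kept. It reports `audio_duration` 30.0, `speech_duration` 0.0, and `status` 0. I am not sure whether the zero `speech_duration` is related to the missing speaker tags, but it may be worth checking.

```python
import json

# Shortened copy of the stats line above; only some fields are kept.
log_line = (
    'I0704 07:15:06.497761 556 stats_builder.h:100] '
    '{"specversion":"1.0","type":"riva.asr.recognize.v1",'
    '"data":{"release_version":"2.19.0","language_code":"multi",'
    '"request_count":1,"audio_duration":30.0,"speech_duration":0.0,'
    '"status":0,"err_msg":""}}'
)


def parse_stats(line):
    """Return the "data" dict from a stats_builder log line (hypothetical helper)."""
    # The JSON payload starts at the first "{" after the log prefix.
    payload = json.loads(line[line.index("{"):])
    return payload["data"]


stats = parse_stats(log_line)
print(stats["audio_duration"], stats["speech_duration"], stats["status"])
```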