Identify Speakers in Meetings, Calls, and Voice Apps in Real-Time with NVIDIA Streaming Sortformer

Originally published at: Identify Speakers in Meetings, Calls, and Voice Apps in Real-Time with NVIDIA Streaming Sortformer | NVIDIA Technical Blog

In every meeting, call, crowded room, or voice-enabled app, technology has a core question: who is speaking, and when? For decades, answering that question in real-time transcription was almost impossible without specialized equipment or offline batch processing.  NVIDIA Streaming Sortformer, an open, production-grade diarization model, changes what’s possible. It’s designed for low latency in realistic,…

So how can I set up Streaming Sortformer in real time? Going through the Hugging Face model card for this model, every example I see passes audio chunks. So what do you mean by real-time? Why am I not getting the results I expect?

nvidia/diar_streaming_sortformer_4spk-v2.1 · Hugging Face (I am trying to use this model in real time, capturing live audio from a microphone to diarize between speakers. How do I do it? All the documentation passes the whole audio as a .wav file. Please help me with this.)