Have there been any updates regarding the roadmap for diarization? Riva marketing material has stated for a while that it will be supported, so I have my fingers crossed that it will be out soon. Thought I’d ask here since TitaNet is out and made it into NeMo.
I’ve been testing the titanet_large model and using Riva’s ASR timestamps to perform diarization on Riva’s ASR output. But since diarization isn’t end-to-end trainable, it seems tedious to get it set up and running performantly. I considered dropping the model into the Triton model repository used by Riva, but I’m not sure that’s recommended, or even feasible.
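Until diarization lands in Riva itself, the clustering side of this can be prototyped outside it. A minimal sketch of the idea, grouping per-segment speaker embeddings (e.g. from titanet_large) by cosine similarity; the greedy single-pass clustering and the 0.7 threshold are my own assumptions here, not how NeMo or Riva actually do it:

```python
import numpy as np

def cluster_speakers(embeddings, threshold=0.7):
    """Greedy cosine-similarity clustering of speaker embeddings.

    embeddings: (N, D) array of per-segment speaker embeddings.
    Returns one integer speaker label per segment.
    """
    # L2-normalize so a dot product equals cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroids = []  # running sum of member embeddings per speaker
    labels = []
    for e in emb:
        if centroids:
            sims = [c @ e / np.linalg.norm(c) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                # Close enough to an existing speaker: assign and update.
                labels.append(best)
                centroids[best] = centroids[best] + e
                continue
        # No sufficiently similar speaker yet: start a new cluster.
        centroids.append(e.copy())
        labels.append(len(centroids) - 1)
    return labels
```

With the ASR word timestamps, each label can then be mapped back onto the corresponding stretch of transcript.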
I’d appreciate any direction or insight on how this could be architected — right now I’m looking at running a separate Triton instance with multiple model replicas sharing a GPU and having VAD + diarization run via the python backend to squeeze out some performance. Of course it would be great if it was baked into Riva itself, but until then :)
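For the separate-Triton-instance route, multiple replicas of a python-backend model sharing one GPU can be declared in the model’s config.pbtxt. A sketch under my own assumptions (the model name, batch size, and replica count are placeholders to tune, not recommendations):

```
name: "diarization"
backend: "python"
max_batch_size: 8
instance_group [
  {
    count: 2        # two replicas of the model...
    kind: KIND_GPU
    gpus: [ 0 ]     # ...sharing GPU 0
  }
]
```

VAD could be declared the same way as a second model, with the client (or a Triton ensemble) chaining the two.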
On a related note: the 128-token limit on the Punctuation Model makes it extra difficult to use the timestamps provided with the transcripts for voice embedding, since there is no unique ID tying a word hypothesis to its timestamp. Whenever the timestamp word list and the final transcript’s word list differ in length, I manually append the cut-short, unpunctuated segments of speech from the word timestamps to the end of that request’s final transcript.
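That workaround can be sketched roughly as follows; the function name and the word-info dict shape are hypothetical stand-ins, not Riva’s actual response objects:

```python
def patch_truncated_transcript(transcript, word_infos):
    """Append words the punctuated transcript is missing.

    If the final transcript was cut short (e.g. by the 128-token
    punctuation-model limit), pad it with the trailing words from the
    word-level timestamp list so both word lists line up.

    transcript: final (possibly truncated) punctuated transcript string
    word_infos: list of dicts like {"word": ..., "start": ..., "end": ...}
                taken from the ASR word timestamps
    """
    transcript_words = transcript.split()
    if len(word_infos) > len(transcript_words):
        # Everything past the transcript's last word was dropped by the
        # punctuation model; re-attach it unpunctuated.
        missing = [w["word"] for w in word_infos[len(transcript_words):]]
        transcript = transcript.rstrip() + " " + " ".join(missing)
    return transcript
```

It relies on the two word lists agreeing up to the truncation point, which has held in my testing but is obviously fragile.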
Thanks for all the work on Riva!
Hey, any updates on these fronts?
Thanks for your interest in Riva
My apologies for the delay.
Speaker Diarization remains a high priority feature and is on the Riva roadmap.
We will share more details as we finalize a release date.
For questions on using the NeMo Speaker Diarization model, please file an issue in the NeMo discussion forum here: Discussions · NVIDIA/NeMo · GitHub
@rvinobha @rleary Any updates at all on the diarization capabilities of Riva??
Looks like work has already been done to get diarization into the proto spec.
Anyone have any updates on this? This was addressed in the recent keynote, but it still isn’t available. What better channel do I have to communicate with NVIDIA teams, when I, as an active member of this forum and elsewhere, just don’t get any responses or feel heard? If there’s a better way to get updates on this, I’m all ears :) #inception
My sincere apologies.
I understand some progress has been made on offline diarization.
I have reached out to the team for details and will let you know once I have an update.
Apologies for the delay.
Apologies for the delay.
It seems this has been fixed and is available from our end.
Please check and let us know if you run into any issues.
@rvinobha Which release? I do not see it anywhere in the docs, or in the release notes: Release Notes — NVIDIA Riva
How do I access it?
My sincere apologies, and sorry for the confusion.
The issue that was fixed is the token limit on the Punctuation Model.
Diarization is still under active development and has not yet been released.
@rvinobha @rleary #inception
Has there been any update on a release for Diarization? Is a beta available for use yet?
Apologies, it has been a long time since I checked on this.
I will check with the team regarding diarization and provide an update.
I have some updates on this:
initial word is that this is almost ready for release; tentatively, we can expect it sometime this month or next.