Dynamic Scale Weighting Through Multiscale Speaker Diarization

jwitsoe September 16, 2022, 9:38pm 1

Originally published at: https://developer.nvidia.com/blog/dynamic-scale-weighting-through-multiscale-speaker-diarization/

MSDD is a neural model that can be trained on a 2-speaker dataset, and the proposed model enables overlap-aware speaker diarization for a flexible number of speakers.
