MarbleNet VAD for real-time streaming applications

arielrado · January 29, 2024, 12:41pm

Hello,

I have been trying to use NVIDIA's MarbleNet for voice activity detection (VAD) on real-time audio and have run into some trouble.

I am following the notebook from NeMo's GitHub repository, specifically the part about online microphone inference. When testing with some of my own data I get inconsistent results. The speech and non-speech probabilities are very close to each other, so the verdict is reached by a very thin margin (around 0.01). Increasing the threshold to anything above 0.5 results in constant non-speech labels.
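To make the symptom concrete, here is a minimal sketch of the per-frame decision described above. It is not the notebook's code: the function names, the `[non_speech, speech]` logit ordering, and the example logits are assumptions chosen only to reproduce a margin of about 0.01.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a small list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_frame(logits, threshold=0.5):
    """Label one frame from two logits.

    Assumption: logits are ordered [non_speech, speech].
    Returns (label, speech probability, margin between the two classes).
    """
    p_non_speech, p_speech = softmax(logits)
    margin = abs(p_speech - p_non_speech)
    label = "speech" if p_speech >= threshold else "non-speech"
    return label, p_speech, margin


# Hypothetical logits that are almost identical, as in the problem described.
frame_logits = [0.0, 0.02]

for threshold in (0.5, 0.6):
    label, p_speech, margin = label_frame(frame_logits, threshold)
    print(f"threshold={threshold}: {label} (p_speech={p_speech:.3f}, margin={margin:.3f})")
```

With probabilities this close to 0.5, any threshold even slightly above 0.5 rejects every frame, which matches the constant non-speech labels reported here.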

Any insights are welcome!
