ASR with streaming inference and diarization

Description

Hi, I’m currently working for a B-to-B software company, and my client is requesting a mobile recording app that has the following functions:

1. ASR
2. Diarization
3. Streaming Capability

I have done my research into NVIDIA NeMo and its diarization and streaming capabilities, and I have some specific questions. But first, could I ask what your recommendation is for going about this? Are there any open-source tools or NVIDIA APIs I should consider?

Hi @quanduy1109 ,
Can you please raise your concern on the NVIDIA/NeMo GitHub repository? NeMo is a scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).

Thanks