Guidance on Integrating NVIDIA Riva Models with a RAG Chatbot for Real-Time Voice Interactions

Hello,
I am building an AI voice assistant that supports live, streaming voice conversations. So far I have deployed two NVIDIA Riva models, **ASR (parakeet-ctc-1.1b-asr)** and **TTS (fastpitch-hifigan-tts)**, on an AWS EC2 instance using Docker, and I have separately developed a Retrieval-Augmented Generation (RAG) chatbot.

To complete the application, I need to integrate the deployed Riva models with the RAG chatbot: live audio from the user should be transcribed by the ASR model, the transcript passed to the chatbot, and the chatbot's reply spoken back through the TTS model, all in real time.
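For concreteness, the flow I have in mind can be sketched roughly as below. This is only an outline of the loop, not working integration code: `voice_loop`, `transcribe`, `answer`, and `synthesize` are hypothetical placeholder names I made up for illustration, not actual Riva or RAG framework APIs. What I am unsure about is how to implement the three callables against the deployed Riva services.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical placeholder signatures (assumptions, not real Riva or RAG APIs):
Transcribe = Callable[[Iterable[bytes]], Iterator[str]]  # audio chunks -> finished utterances
Answer = Callable[[str], str]                            # user text -> chatbot reply
Synthesize = Callable[[str], bytes]                      # reply text -> audio bytes


def voice_loop(
    audio_chunks: Iterable[bytes],
    transcribe: Transcribe,
    answer: Answer,
    synthesize: Synthesize,
) -> Iterator[bytes]:
    """Yield one synthesized audio reply per non-empty transcribed utterance."""
    for utterance in transcribe(audio_chunks):
        text = utterance.strip()
        if not text:
            # Skip silence or empty transcripts instead of querying the chatbot.
            continue
        reply = answer(text)
        yield synthesize(reply)
```

Here `transcribe` would wrap the streaming ASR model, `answer` the RAG chatbot, and `synthesize` the TTS model; the generator form is meant to let playback of each reply start while the microphone stream is still open.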

Both pieces work on their own, but I would appreciate guidance on connecting the Riva models to the RAG chatbot. Could you recommend a workflow or share any relevant resources that would help with this integration?

Thank you for your time and support.