Tutorial: ASR (RIVA) + TTS (RIVA) + LLM (NIMs) + Audio2Face + Unreal Engine (Quickly Build Your Avatar)

Recently, I built a virtual assistant using NVIDIA technologies. Along the way I ran into some challenges, such as difficulty locating the relevant RIVA documentation and issues with the Unreal Engine integration, so I decided to share this project to help others build their own virtual assistants quickly. Each step includes detailed instructions I wrote, along with links to the relevant official documentation.

You can find the project on GitHub: LLMAvatarTalk-An-Interactive-AI-Assistant

Demo video:
YouTube Demo

Architecture:

(Architecture diagram: see the GitHub repository.)

Features:

  • Speech Recognition: Converts user speech into text in real time using NVIDIA RIVA ASR.
  • Language Processing: Leverages an advanced LLM (such as llama3-70b-instruct) via the NVIDIA NIM APIs for deep semantic understanding and response generation.
  • Text-to-Speech: Transforms generated text responses into natural-sounding speech using NVIDIA RIVA TTS.
  • Facial Animation: Generates realistic facial expressions and animations from the synthesized audio using Audio2Face.
  • Unreal Engine Integration: Enhances virtual character expressiveness by linking Audio2Face to Unreal Engine's MetaHuman in real time.
  • LangChain Integration: Simplifies working with NVIDIA RIVA and the NVIDIA NIM APIs, providing a seamless and efficient workflow for AI development.

Minimal code sketches for the ASR, LLM, TTS, and Audio2Face stages follow this list.

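As a concrete starting point, here is a minimal sketch of the ASR stage using the nvidia-riva-client Python package (plus PyAudio for microphone capture). The server address, sample rate, and chunk size are assumptions; adjust them to match your RIVA deployment.

```python
# Minimal sketch: streaming speech recognition with nvidia-riva-client.
# Assumes a RIVA server at localhost:50051 and a working microphone.
import riva.client
import riva.client.audio_io  # requires the pyaudio package

auth = riva.client.Auth(uri="localhost:50051")
asr_service = riva.client.ASRService(auth)

streaming_config = riva.client.StreamingRecognitionConfig(
    config=riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        language_code="en-US",
        sample_rate_hertz=16000,
        max_alternatives=1,
        enable_automatic_punctuation=True,
    ),
    interim_results=True,
)

# Stream microphone chunks to RIVA and print each final transcript.
with riva.client.audio_io.MicrophoneStream(rate=16000, chunk=1600) as audio_chunks:
    responses = asr_service.streaming_response_generator(
        audio_chunks=audio_chunks, streaming_config=streaming_config
    )
    for response in responses:
        for result in response.results:
            if result.is_final:
                print(result.alternatives[0].transcript)
```
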
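For the language step, the NIM endpoints plug directly into LangChain through the langchain-nvidia-ai-endpoints package. A minimal sketch, assuming an NVIDIA_API_KEY environment variable is set; the system prompt here is illustrative:

```python
# Minimal sketch: generating a reply with llama3-70b-instruct via NVIDIA NIM.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="meta/llama3-70b-instruct", temperature=0.7)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful voice assistant. Keep answers short."),
    ("human", "{transcript}"),
])
chain = prompt | llm | StrOutputParser()

# The transcript would come from the ASR stage above.
reply = chain.invoke({"transcript": "What can you do?"})
print(reply)
```
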
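The reply can then be synthesized with RIVA TTS in a single batch call. A minimal sketch; the voice name is an assumption, so check which voices your RIVA server actually serves:

```python
# Minimal sketch: batch synthesis with RIVA TTS, saved as a WAV file.
import wave

import riva.client

auth = riva.client.Auth(uri="localhost:50051")  # assumed RIVA server address
tts_service = riva.client.SpeechSynthesisService(auth)

response = tts_service.synthesize(
    text="Hello! How can I help you today?",
    voice_name="English-US.Female-1",  # assumed voice; verify on your server
    language_code="en-US",
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    sample_rate_hz=44100,
)

# response.audio holds raw 16-bit PCM; wrap it in a WAV container.
with wave.open("reply.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(44100)
    out.writeframes(response.audio)
```
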
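Finally, Audio2Face can ingest the synthesized audio through its gRPC streaming player, which in turn drives the MetaHuman in Unreal Engine. The sketch below assumes the helper audio2face_streaming_utils.py from NVIDIA's Audio2Face sample scripts is on your path, and that the gRPC address and player prim path match your scene; all three are assumptions here.

```python
# Minimal sketch: pushing RIVA TTS audio into Audio2Face's streaming player.
import numpy as np

# Helper shipped with the Audio2Face sample scripts (assumed to be on the path).
from audio2face_streaming_utils import push_audio_track

A2F_URL = "localhost:50051"                       # assumed A2F gRPC address
A2F_PLAYER = "/World/audio2face/PlayerStreaming"  # assumed player prim path

def send_to_audio2face(pcm_bytes: bytes, sample_rate: int) -> None:
    """Convert 16-bit PCM from RIVA TTS to float32 and push it to Audio2Face."""
    samples = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32)
    samples /= 32768.0  # scale to the [-1.0, 1.0] range Audio2Face expects
    push_audio_track(A2F_URL, samples, sample_rate, A2F_PLAYER)
```

One thing to watch for: Audio2Face's streaming server and RIVA both commonly default to port 50051, so if they run on the same machine you will need to change one of them.
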
Prerequisites

At a minimum, based on the components above, you will need:

  • A running NVIDIA RIVA server (ASR and TTS)
  • An NVIDIA API key for the NIM LLM endpoints
  • NVIDIA Omniverse Audio2Face
  • Unreal Engine with a MetaHuman character
  • Python with the nvidia-riva-client and langchain-nvidia-ai-endpoints packages

I hope these resources help you get started quickly and create your own virtual assistant. If you encounter any issues along the way, feel free to submit an issue on the GitHub page or ask a question in the forum :)

Thank you! For more on this topic, see the Virtual Beings Facebook group.