Build a simple avatar with ASR, Sentence-transformer, Semantic Similarity Search, TTS and Omniverse Audio2Face

renton.hsu.vfx · May 4, 2022, 6:02am


Project Description

I used several Python packages and NVIDIA’s Omniverse Audio2Face to quickly implement an avatar that can answer questions defined in a knowledge set or FAQ.

GitHub Repo

Demo

How It Works

Automatic Speech Recognition (ASR)

When the user asks a question, the SpeechRecognition library records the user's voice from the microphone and passes the audio to a speech-recognition engine, which transcribes it into text.
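A minimal sketch of this step, assuming the third-party SpeechRecognition package (imported as `speech_recognition`) plus PyAudio for microphone access. The function name `listen_and_transcribe` and the use of the Google Web Speech engine via `recognize_google` are illustrative assumptions; the project's notebook may use a different engine or settings.

```python
def listen_and_transcribe(language: str = "en-US") -> str:
    """Record one utterance from the default microphone and return its transcript.

    Hypothetical helper. Assumes the SpeechRecognition package and PyAudio are
    installed; recognize_google sends the audio to Google's web speech service.
    """
    # Lazy import so this module can be loaded without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so the energy threshold adapts to the room.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The speech was unintelligible; return an empty transcript instead of raising.
        return ""
```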

Language Understanding

Sentence-Transformers is a Python framework for state-of-the-art sentence, text, and image embeddings. We use it to encode each input question into a feature vector. Because the vector represents the entire sentence and its semantic information, it helps the machine capture the context, intent, and other nuances of the text.
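A sketch of the encoding step, assuming the `sentence-transformers` package. The model name `all-MiniLM-L6-v2` and the helper name `embed_texts` are hypothetical choices for illustration; the project may use a different pretrained model. Passing `normalize_embeddings=True` scales every vector to unit length, so a plain dot product between two vectors equals their cosine similarity, which is convenient for the search step.

```python
from typing import List


def embed_texts(texts: List[str], model_name: str = "all-MiniLM-L6-v2"):
    """Encode sentences into unit-length embedding vectors (one row per sentence).

    Hypothetical helper; assumes the sentence-transformers package is installed.
    The model is downloaded on first use.
    """
    # Lazy import so this module can be loaded without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Returns a NumPy array of shape (len(texts), embedding_dim).
    return model.encode(texts, convert_to_numpy=True, normalize_embeddings=True)
```

In practice you would load the model once and encode the FAQ questions a single time at startup, then encode only each new user question.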

We then run a similarity search with Facebook AI Similarity Search (Faiss), comparing the embedding of the user's question with the embeddings of the FAQ questions and returning the most likely answers.
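With Faiss, this step is typically an exact inner-product index (`faiss.IndexFlatIP`) over L2-normalized vectors (`faiss.normalize_L2`), filled with `index.add` and queried with `index.search(query, k)`. To show what that search computes without extra dependencies, here is a pure-Python stand-in, not the Faiss API: it ranks FAQ entries by cosine similarity to the query and returns the top `k`. The function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return list(v)
    return [x / norm for x in v]


def top_k_faq(
    query_vec: Sequence[float], faq_vecs: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (faq_index, cosine_similarity) pairs for the k most similar FAQ entries."""
    q = normalize(query_vec)
    scores = []
    for i, v in enumerate(faq_vecs):
        u = normalize(v)
        # Dot product of unit vectors = cosine similarity.
        scores.append((i, sum(a * b for a, b in zip(q, u))))
    # Highest similarity first, like an inner-product index search.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

The returned indices select the matching answers from the FAQ list; a minimum-score threshold is a common way to fall back to a default reply when no FAQ entry is similar enough.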

Text To Speech

The avatar's voice is synthesized entirely with gTTS (Google Text-to-Speech), a Python library that turns text into natural-sounding speech. The synthesized voice is also used to drive the avatar's facial animation.
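A sketch of the synthesis step, assuming the `gTTS` package, which calls Google's online text-to-speech service and therefore needs network access. The helper name `synthesize_speech` and the output path are hypothetical. gTTS writes MP3 data, so the audio has to be decoded to raw samples before it can be streamed to Audio2Face.

```python
def synthesize_speech(text: str, out_path: str = "answer.mp3", lang: str = "en") -> str:
    """Save spoken audio for `text` to `out_path` (MP3) and return the path.

    Hypothetical helper; assumes the gTTS package is installed.
    """
    if not text.strip():
        # Fail early with a clear message instead of sending an empty request.
        raise ValueError("text must not be empty")
    from gtts import gTTS  # lazy import: only needed when synthesizing

    gTTS(text=text, lang=lang).save(out_path)
    return out_path
```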

Omniverse Audio2Face

Omniverse Audio2Face is the application that brings our avatar to life. With it, anyone can create realistic facial expressions and emotions that match any voice-over track. Audio2Face feeds the audio input into a pre-trained deep neural network developed by NVIDIA, and the network's output drives the facial animation of 3D characters in real time.

System Requirements

Element	Minimum Specifications
OS Supported	Windows 10 64-bit (Version 1909 and above)
CPU	Intel Core i7 or AMD Ryzen, 2.5 GHz or greater
CPU Cores	4 or higher
RAM	16 GB or higher
Storage	500 GB SSD or higher
GPU	Any RTX GPU
VRAM	6 GB or higher
Min. Video Driver Version	See latest drivers here

How to Install and Run the Project

Before you begin, clone the project repository. Open your terminal, navigate to the directory where you'd like to store the code, and run this command:

$ git clone https://github.com/metaiintw/build-an-avatar-with-ASR-TTS-Transformer-Omniverse-Audio2Face.git

Create an environment from the environment.yml file

Make sure Anaconda is installed on your local machine, then run the following command to create the conda environment and install the packages listed in environment.yml:

$ conda env create -f /path/to/environment.yml

Download and Install Omniverse Launcher

NVIDIA Omniverse is a development platform for 3D simulation and design collaboration. It is free for individuals, and you can download Omniverse Launcher here.

I also recommend watching this video tutorial, which walks you through the installation process.

[Screenshot: Omniverse Launcher]

Install Omniverse Audio2Face

[Screenshot: Omniverse apps]

Once Omniverse Launcher is installed, you have immediate access to all the apps, including Omniverse Audio2Face. Install Omniverse Audio2Face and you're good to go.

[Screenshot: Omniverse Audio2Face]

Omniverse Audio2Face setup

For our Python program to interact with Omniverse Audio2Face, use the streaming audio player, which lets developers stream audio data from an external source or application via the gRPC protocol.

[Screenshot: the streaming audio player lets developers stream audio data from an external source]

This tutorial shows how to create a streaming audio player and connect it to the Audio2Face instance using the OmniGraph editor.
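Once the streaming audio player is connected, audio can be pushed to it over gRPC. The sketch below is modeled on the streaming client sample (`test_client.py`) that ships with Audio2Face; the generated modules `audio2face_pb2` / `audio2face_pb2_grpc`, the `PushAudioRequest` fields, the default address `localhost:50051`, and the player path `/World/audio2face/PlayerStreaming` are assumptions to verify against your installation. The player expects mono float32 samples, so the hypothetical helper `pcm16_to_float32_bytes` converts 16-bit PCM (for example, decoded TTS output) into that format.

```python
import struct


def pcm16_to_float32_bytes(pcm16: bytes) -> bytes:
    """Convert little-endian 16-bit mono PCM into little-endian float32 samples in [-1.0, 1.0)."""
    count = len(pcm16) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return struct.pack(f"<{count}f", *(s / 32768.0 for s in samples))


def push_audio(
    float32_audio: bytes,
    samplerate: int,
    url: str = "localhost:50051",
    instance_name: str = "/World/audio2face/PlayerStreaming",
) -> None:
    """Send a whole clip to the Audio2Face streaming player and wait for playback.

    Assumes grpcio is installed and the generated audio2face_pb2 /
    audio2face_pb2_grpc modules from the Audio2Face sample are importable.
    """
    import grpc
    import audio2face_pb2
    import audio2face_pb2_grpc

    with grpc.insecure_channel(url) as channel:
        stub = audio2face_pb2_grpc.Audio2FaceStub(channel)
        request = audio2face_pb2.PushAudioRequest(
            audio_data=float32_audio,
            samplerate=samplerate,
            instance_name=instance_name,
            block_until_playback_is_finished=True,
        )
        response = stub.PushAudio(request)
        if not response.success:
            raise RuntimeError(f"Audio2Face rejected the audio: {response.message}")
```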

Bring Your Avatar to Life

Now we're ready to bring our avatar to life. Enter the following commands in your terminal:

$ cd path_to_the_project_folder
$ conda activate avatar
$ jupyter lab

Open and run the notebook 1.Creating_a_simple_avatar.ipynb to start building your first avatar!

[Screenshot: 1.Creating_a_simple_avatar.ipynb]
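The notebook combines the stages from How It Works into a question-and-answer loop. One way to structure a single turn of that loop is sketched below; it is not the notebook's actual code. Each stage is passed in as a callable so any ASR, search, or TTS implementation can be plugged in; the class name `AvatarPipeline` and the fallback reply are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AvatarPipeline:
    """One question-and-answer turn: ASR -> FAQ lookup -> TTS/Audio2Face."""

    transcribe: Callable[[], str]                # ASR: returns the user's question as text
    find_answer: Callable[[str], Optional[str]]  # embedding + similarity search; None if no good match
    speak: Callable[[str], None]                 # TTS, then stream the audio to Audio2Face
    fallback: str = "Sorry, I don't know the answer to that."

    def step(self) -> str:
        """Run one turn and return the answer that was spoken ("" if nothing was heard)."""
        question = self.transcribe().strip()
        if not question:
            return ""  # silence or unintelligible speech: say nothing
        answer = self.find_answer(question) or self.fallback
        self.speak(answer)
        return answer
```

Calling `step()` in a `while` loop gives a simple interactive avatar, and keeping the stages injectable makes the logic easy to test without a microphone or a running Audio2Face instance.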

Creators

Renton Hsu

WendyGram · May 23, 2022, 3:55pm

Wow @renton.hsu.vfx! Thank you for sharing this!

renton.hsu.vfx · May 27, 2022, 3:00am

Thanks, NVIDIA, for turning innovative technology into easy-to-use tools. My highest regards to your development team!

siyuen · May 27, 2022, 6:23am

@renton.hsu.vfx, great tutorial on this high-demand topic! Much appreciated, and keep them coming. Nicely written.

If you have other ideas, feel free to message me.
