Build an avatar with ASR, ChatGPT, TTS and Omniverse Audio2Face

Demo

Below, I present the results of my work using NVIDIA Audio2Face and ChatGPT to create a basic interactive virtual human. Users can engage with it through voice input and hold a conversation with it.

Description

This is an update to my previously published article on simple interactive, conversational virtual human technology. It has been a year since I last wrote about it, and I have finally found the time to release new content. Over the past year there have been significant developments, including improvements to Audio2Face and the launch of ChatGPT. With these convenient AI tools, creating a convincing, lifelike virtual human experience has become easier than ever.

The source code

I have published the source code for this micro-project on my GitHub repository. Feel free to download it from: https://github.com/metaiintw/build-an-avatar-with-ASR-TTS-Transformer-Omniverse-Audio2Face/tree/main/2.Avatar_With_ChatGPT

System requirements

| Element | Configuration used in the demo |
| --- | --- |
| OS supported | Ubuntu 22.04 |
| CPU | Intel Core i9-13900 |
| RAM | 96 GB |
| Storage | 2 TB SSD |
| GPU | NVIDIA GeForce RTX 4090 |

How to use the source code to create the virtual assistant experience demonstrated in the demo video

1. Build the virtual environment

Using this GitHub repo to build the avatar is straightforward: just use Anaconda to create a Python virtual environment from avatar_requirements.yml.
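With Anaconda (or Mamba) installed, `conda env create -f avatar_requirements.yml` builds the environment. After activating it, a quick way to confirm everything installed correctly is to check that the notebook’s dependencies import. The sketch below is only a hypothetical check: the package names are my assumptions about what avatar_requirements.yml installs, so adjust the list to match the actual file.

```python
# Hypothetical sanity check, run inside the activated Anaconda environment.
# The package list is an assumption about what avatar_requirements.yml installs;
# edit it to match the actual environment file in the repo.
import importlib

expected_packages = ["openai", "sentence_transformers", "sounddevice", "grpc", "numpy"]

for name in expected_packages:
    try:
        importlib.import_module(name)
        print(f"OK       {name}")
    except ImportError as exc:
        print(f"MISSING  {name}: {exc}")
```

If any package is reported missing (as in the sentence_transformers issue mentioned near the end of this thread), install it into the same environment before launching the notebook.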

2. Open the attached USD file with NVIDIA Audio2Face

Open claire_audio_streaming.usd in the USD_files folder using NVIDIA Audio2Face (Version 2023.1.0).
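For context on how audio reaches the avatar: the stage is set up for audio streaming, so the notebook can push TTS audio to Audio2Face over gRPC. The sketch below is a rough, hypothetical illustration built on the streaming sample client (test_client.py and its generated audio2face_pb2 modules) that ships with Audio2Face; the gRPC address and the player prim path are assumptions, so adjust them to match claire_audio_streaming.usd and your setup.

```python
# Hypothetical sketch: push a TTS waveform to the Audio2Face streaming player.
# Assumes the gRPC sample client (test_client.py / audio2face_pb2*) bundled with
# Audio2Face is importable, and that the stage exposes a streaming player prim.
import soundfile as sf
from test_client import push_audio_track  # helper from the A2F streaming sample

A2F_URL = "localhost:50051"                        # default port of the A2F streaming server
PLAYER_PRIM = "/World/audio2face/PlayerStreaming"  # adjust to the prim path in your stage

def speak(wav_path: str) -> None:
    """Read a mono WAV file produced by TTS and send it to Audio2Face."""
    audio, samplerate = sf.read(wav_path, dtype="float32")
    if audio.ndim > 1:          # the streaming player expects a single channel
        audio = audio.mean(axis=1)
    push_audio_track(A2F_URL, audio, samplerate, PLAYER_PRIM)

speak("tts_output.wav")
```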

3. Run the IPython notebook

Finally, activate the Python virtual environment and run build-an-avatar-with-ASR-TTS-ChatGPTOmniverse-Audio2Face.ipynb.

Please note that you need an OpenAI account and an API key (token) to use the ChatGPT API in the “build-an-avatar-with-ASR-TTS-ChatGPTOmniverse-Audio2Face.ipynb” notebook. Enter your key in the notebook to access the API; instructions on how to obtain it are included in the notebook.
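For reference, this is roughly what the ChatGPT call looks like with the pre-1.0 openai Python package. It is a minimal sketch, not the notebook’s exact code: the model choice, system prompt, and environment-variable handling here are illustrative assumptions.

```python
# Minimal ChatGPT API sketch (pre-1.0 openai package); the notebook's actual
# prompt handling and model choice may differ.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # or paste your API key directly

def chat(user_text: str) -> str:
    """Send the recognized speech to ChatGPT and return the reply text."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a friendly virtual assistant."},  # illustrative prompt
            {"role": "user", "content": user_text},
        ],
    )
    return response["choices"][0]["message"]["content"]

print(chat("Hello, who are you?"))
```

The reply text is what the TTS stage turns into audio and streams to Audio2Face.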

Once you have completed the above steps, you can start experiencing this simple virtual human application.

I will update the documentation on the GitHub repo and this article to provide more details about the development process. I hope this content is helpful to you.


Is an RTX 3060 with 16 GB of RAM suitable?

@dr.l.fadaly The general consensus is ‘the more VRAM you have, the better’, especially if you find yourself needing to work with A2F often. Here are the technical requirements for A2F for your reference:

https://docs.omniverse.nvidia.com/audio2face/latest/common/technical-requirements.html

Should importing the environment have taken 40+ minutes (it’s still going)?

@grumpy_bud What kind of “environment” are you referring to, which OV app are you using, and what are your hardware specifications?

Since you posted in this thread under the Digital Human category, it’s best to elaborate with this sort of detail so others can be better informed about your particular scenario. Better yet, I would encourage you to start a new thread in a more relevant forum if you are using a specific OV app.

Looks like using Anaconda Navigator was a bad idea, as it was using 11 GB of RAM and couldn’t even access the file. Mamba is working perfectly. (Edit: I have 16 GB.)

@Simplychenable When running the Jupyter notebook, it says sentence_transformers was not found.