We are trying to build an application similar to MISTY. Is there a way to programmatically use the Audio2Face plugin without the Omniverse Kit?
I want to use Audio2Face in our Python codebase: the text-to-speech input will be provided to the Audio2Face plugin, which in turn should output the sample 3D model with lip sync.
Also, is there any repo that contains the end-to-end setup for the MISTY application?
Yes, with TTS you can stream the output to the A2F audio player, and with the right setup you can drive A2F this way. For Misty we had a custom Jarvis client API to do this for us; that part is not hard to build. With more interest in this, we will consider including it in a future release so it is easier for anyone to drive A2F directly with any audio stream or input.
Right now there is no exact repo for Misty. We do have a standalone app for Misty that we built using these technologies, but it is not available for download at the moment.
Hey @siyuen, we’re also interested in this! We’re looking at generating lip sync for a character in Unreal Engine, with the TTS coming from an external source. We’d like to take that TTS and generate (and play) the resulting animation in realtime on the character as soon after the TTS is received as possible.
What do you think the flow for this would look like in Unreal Engine? Would it be something like this:
Unreal receives TTS from external source (e.g. Jarvis)
Unreal somehow uploads it to Omniverse?
Unreal somehow instructs Audio2Face to generate animation (USD) from the uploaded TTS
Unreal somehow downloads the generated USD from Omniverse?
Unreal plays the animation
Is this the right kind of flow? Can Audio2Face somehow run natively in Unreal or outside of the Omniverse application?
You also mentioned a custom Jarvis client to stream the output to Audio2Face. I can’t really see where the hooks would lie for this; is this something I can currently do by tinkering with Omniverse Kit, or is there an SDK somewhere I’m missing?
I think the flow would be like this: TTS > Audio2Face > UE
We currently don’t have an API to take a TTS audio stream into Audio2Face as input (which is what we did on Misty), but it is actually not hard to do with existing Python audio libraries; a rough sketch of that audio-preparation step follows this reply. The more people ask for this, the more we will consider releasing something like this so it is easier for people to connect other audio inputs to Audio2Face.
For the A2F > UE part, you can first try the Omniverse UE connector. It should be able to get the results out to UE live.
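For reference, here is a minimal sketch of that audio-preparation step, assuming the TTS service hands back a WAV file and that the player expects 16-bit PCM at a 48 kHz sample rate (both assumptions; check your own A2F setup). soundfile, numpy, and scipy are ordinary PyPI audio libraries, not an A2F API, and the file name is a placeholder.

```python
# Sketch only: convert a TTS result (WAV) into a raw int16 buffer at 48 kHz,
# the format assumed later in this thread for the AudioPlayer node.
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

TARGET_RATE = 48000  # assumed target sample rate

def load_tts_wav_as_int16(path: str) -> np.ndarray:
    data, rate = sf.read(path, dtype="float32")   # float32 samples in [-1, 1]
    if data.ndim > 1:                             # downmix to mono if needed
        data = data.mean(axis=1)
    if rate != TARGET_RATE:                       # resample to 48 kHz
        data = resample_poly(data, TARGET_RATE, rate)
    data = np.clip(data, -1.0, 1.0)
    return (data * 32767).astype(np.int16)        # int16 PCM buffer

if __name__ == "__main__":
    audio_buffer = load_tts_wav_as_int16("tts_output.wav")  # placeholder file name
    print(audio_buffer.shape, audio_buffer.dtype)
```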
We are planning to stream the TTS audio to Audio2Face using existing Python audio libraries, as mentioned in the comment above.
Can anyone please guide us through the steps or the Python libraries needed to create an application like MISTY?
Hi @BCSAudio2Face, we are looking at this internally right now, and it will most likely need some updates to Audio2Face to make it easier for users to integrate TTS / streaming audio to drive Audio2Face directly. The more inquiries we get about this, the more it helps us prioritize this feature.
We are looking into adding this support officially, like the Jarvis client library API I mentioned above, so stay tuned for updates.
Is there any way for us to build a standalone app similar to the Misty application with the existing resources?
If there is, can you suggest the steps to proceed with development?
We are trying to build an application similar to Misty; please suggest a solution we can use to start development.
Building A2F-based applications outside the A2F Omniverse Kit App is not supported at the moment.
Connecting an external audio source (say, TTS, as we did for Misty) within the existing A2F App is technically possible, but it will require intensive Python scripting on the user’s side, since this is also not supported out of the box for now. If you would like to do that, you can refer to the A2F App source code (within the installation directory) → omni.audio2face.player extension, and check how AudioPlayer works; a small sketch below shows one way to locate that extension source on disk.
We are working on a handy API for [2] to let users stream their own audio data from external sources. Stay tuned for updates.
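As a small convenience, here is one way to find that extension source on disk so you can read the AudioPlayer code; the install root below is only a guess at the default launcher layout, so point it at your own Audio2Face installation directory.

```python
# Sketch: locate the omni.audio2face.player extension sources inside an
# Audio2Face installation so you can study how AudioPlayer works.
# The install root is a placeholder; adjust it to your own setup.
from pathlib import Path

install_root = Path.home() / "AppData/Local/ov/pkg"   # assumption: default launcher path on Windows

for ext_dir in install_root.rglob("omni.audio2face.player"):
    if ext_dir.is_dir():
        print("extension found at:", ext_dir)
        for py_file in sorted(ext_dir.rglob("*.py")):
            print("  ", py_file.relative_to(ext_dir))
```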
No update yet; it is something we are working on, and we will definitely let people know when we get closer to release dates. We would like to make it easy to empower people to do exactly that with Audio2Face. Stay tuned.
create a scene that has a graph of three nodes: AudioPlayer, Audio2Face and a mesh to drive (e.g. default Mark)
make sure the connections are set: AudioPlayer time → Audio2Face, and Audio2Face points → mesh
create a custom extension with your TTS service, where, for example, a button-click callback looks up the graph nodes, sets the audio stream on both the player and A2F nodes, and starts playback.
Here’s a sample snippet, where audio_buffer is your audio buffer (here assumed to be int16 at a 48 kHz sample rate), player_instance is the AudioPlayer node instance, and a2f_instance is the Audio2Face node instance:
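(The snippet below is a reconstruction for illustration only; the setter and playback method names are placeholders, not the actual omni.audio2face API, so check the omni.audio2face.player extension source for the real node interfaces.)

```python
# Illustrative sketch only: set_audio_stream() and play() are placeholder
# method names, not the real omni.audio2face API; consult the
# omni.audio2face.player extension source for the actual calls.
import numpy as np

SAMPLE_RATE = 48000  # int16 samples at 48 kHz, as assumed above

def on_tts_ready(audio_buffer: np.ndarray, player_instance, a2f_instance) -> None:
    """Button-click / TTS-ready callback in the custom extension (step 3 above)."""
    player_instance.set_audio_stream(audio_buffer, SAMPLE_RATE)  # placeholder call
    a2f_instance.set_audio_stream(audio_buffer, SAMPLE_RATE)     # placeholder call
    player_instance.play()                                       # placeholder call
```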
We have been looking for the most realistic lip-sync solution for quite some time.
Audio2Face is perfect. However, it does not work alongside our current stack, so please let us know once you have any type of beta SDK or RESTful API for Audio2Face. Also, Unity is a huge part of our development stack; are there any plans to support Unity with Audio2Face? Currently we are using SpeechBlend, but the results are far from perfect. I can see that Audio2Face would solve this problem for us.
Once you have some sort of SDK or API we would be happy to contribute towards the development.
Please let us know once you guys have something we can test, happy to collaborate with you on this.
Thanks.
Definitely looking forward to this. Being able to connect Audio2Face with Epic Games’ Unreal Engine to generate lip sync on the fly, like it happens in the live mode, would be beyond amazing.
Hi, thanks to the new audio player streaming I was able to send audio to Audio2Face using the gRPC protocol (client.py) and generate animations. But now I’m stuck on receiving the output from Audio2Face. Can you guide me on how to receive the animation output back from Audio2Face using gRPC?
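For anyone following along, this is roughly what the sending side described above can look like; the helper module and function name, the gRPC endpoint, and the streaming-player prim path below are assumptions based on the sample client that ships with A2F, so verify them against your own client.py.

```python
# Rough sketch of the sending side only; module name, function signature,
# endpoint, and prim path are assumptions to be checked against the
# client.py sample shipped with Audio2Face's streaming audio player.
import soundfile as sf
from audio2face_streaming_utils import push_audio_track  # assumed helper from the A2F sample

A2F_URL = "localhost:50051"                          # assumed gRPC address of the A2F app
PLAYER_PRIM = "/World/audio2face/PlayerStreaming"    # assumed streaming-player prim path

audio, rate = sf.read("tts_output.wav", dtype="float32")  # placeholder input file
if audio.ndim > 1:
    audio = audio.mean(axis=1)                       # downmix to mono

push_audio_track(A2F_URL, audio, rate, PLAYER_PRIM)  # stream the clip to Audio2Face
```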
I was wondering if there’s any update on an Audio2Face API? I’m using audio recordings from voice actors and converting them to animations so that I can import them into UE and apply them to MetaHumans. Is there an API available so that I could automate the process of creating the animations with a Python script, taking in name.wav and exporting anim_name.wav files?
Is there another process that you can suggest I use to automate this?
Hey Aliannea,
We are working on implementing a REST API and a headless mode to assist with batch-processing audio. We are targeting this feature for our next release, which we hope to have out before the end of the year.
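In the meantime, purely as a guess at what such a batch workflow might look like once a REST API ships, here is a hypothetical sketch; every endpoint, port, and JSON field below is a placeholder and would need to be replaced with whatever the released API actually exposes.

```python
# Purely hypothetical sketch of batch processing against a future A2F REST API.
# None of these endpoints or JSON fields are real; they only show the shape
# of the wav-in / animation-out automation loop described above.
from pathlib import Path
import requests

A2F_BASE_URL = "http://localhost:8011"   # placeholder host/port

def process_clip(wav_path: Path, out_dir: Path) -> None:
    # Placeholder endpoint: submit one audio clip and request an exported animation.
    resp = requests.post(
        f"{A2F_BASE_URL}/process",                       # hypothetical endpoint
        json={"audio_file": str(wav_path),
              "export_path": str(out_dir / f"anim_{wav_path.stem}.usd")},
        timeout=300,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    out_dir = Path("exports")
    out_dir.mkdir(exist_ok=True)
    for wav in sorted(Path("recordings").glob("*.wav")):  # placeholder input folder
        process_clip(wav, out_dir)
```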