Audio2Face SaaS

Hello Everyone,

We are trying to build an application similar to Misty. Is there a way to use the Audio2Face plugin programmatically, without the Omniverse Kit?
We want to use Audio2Face from our Python codebase: the text-to-speech output would be fed to the Audio2Face plugin, which in return should give us the sample 3D model with lip sync.

Also, is there any repo that contains the end-to-end setup for the Misty application?

Thanks.

Hi, thank you for the interest.

Yes, with TTS you can stream the output to the A2F audio player, and with the right setup you can drive A2F this way. For Misty we had a custom Jarvis client API to do this for us; that part is not hard to build. With more interest in this, we will consider including it in a future release, so it is easier for anyone with any audio stream or input to drive A2F directly.
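
As a rough illustration (not the actual Misty client), assuming your TTS service returns float32 PCM samples in [-1, 1] at 16 kHz, you could simply write them to a mono WAV file that the A2F audio player can load from its track folder:

import wave
import numpy as np

def save_tts_as_wav(samples: np.ndarray, path: str, sample_rate: int = 16000) -> None:
    # Convert float samples in [-1, 1] to 16-bit PCM
    pcm16 = (np.clip(samples, -1.0, 1.0) * 32767.0).astype(np.int16)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm16.tobytes())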

Right now there is no exact repo for Misty. We do have a standalone app for Misty that we built using these technologies, but it is not available for download at the moment.

Thanks

Hey @siyuen, we’re also interested in this! We’re looking at generating lip sync for a character in Unreal Engine, with the TTS coming from an external source. We’d like to take that TTS and generate (and play) the resulting animation on the character in real time, as soon after the TTS is received as possible.

What do you think the flow for this would look like in Unreal Engine? Would it be something like this:

  1. Unreal receives TTS from external source (e.g. Jarvis)
  2. Unreal somehow uploads it to Omniverse?
  3. Unreal somehow instructs Audio2Face to generate animation (USD) from the uploaded TTS
  4. Unreal somehow downloads the generated USD from Omniverse?
  5. Unreal plays the animation

Is this the right kind of flow? Can Audio2Face somehow run natively in Unreal or outside of the Omniverse application?

You also mentioned a custom Jarvis client to stream the output to Audio2Face. I can’t really see where the hooks for this would lie; is this something I can currently do by tinkering with Omniverse Kit, or is there an SDK somewhere I’m missing?

Thanks! :)

Hi @charisma-ben,

I think the flow would be something like this:

TTS (e.g. Jarvis or some other audio signal) > Audio2Face > UE

We currently don’t have an API that takes a TTS audio stream as Audio2Face input (what we did on Misty), but it is actually not hard to do with existing Python audio libraries. The more people ask for this, the more we will consider releasing something like this, so it is easier for people to connect other audio inputs to Audio2Face.

For the A2F > UE part, you can first try the Omniverse UE connector. It should be able to get the results out to UE live.

Hi Team,

We are planning to stream the TTS audio to Audio2Face using existing Python audio libraries, as mentioned in the comment above.
Can anyone guide us through the steps, or point us to the Python libraries needed to create an application like Misty?

Hi @BCSAudio2Face, we are looking at this internally right now, and it will most likely need some updates to Audio2Face to make it easier for users to integrate TTS / streaming audio to drive Audio2Face directly. The more inquiries we get about this, the more it helps us prioritize this feature.

We are looking into adding this support officially, like the Jarvis client library API I mentioned above, so stay tuned for updates.

@dkorobchenko for visibility

Hi Team,

Is there any way for us to build a standalone app similar to the Misty application with the existing resources?
If there is, can you suggest the steps we should take to proceed with the development?

We are trying to build an application similar to Misty; please suggest a way for us to start the development.

Hi @BCSAudio2Face

  1. Building A2F-based applications outside the A2F Omniverse Kit app is not supported at the moment.
  2. Connecting an external audio source (say, TTS, as we did for Misty) within the existing A2F app is technically possible, but it will require intensive Python scripting on the user’s side, since this is also not supported out of the box for now. If you would like to do that, you can refer to the A2F app source code (within the installation directory) → the omni.audio2face.player extension, and check how the AudioPlayer works.
  3. We are working on creating a handy API for [2] to let users stream their own audio data from external sources. Stay tuned for updates.

Hi,
Is there any update on releasing the API that would allow connecting external audio, so that a video bot like Misty could be developed?

No update yet. It is something we are working on, and we will definitely let people know when we get closer to release dates. We want to make it easy and to empower people to do exactly that with Audio2Face. Stay tuned.

@BCSAudio2Face

Connecting TTS is not so difficult.

  • Create a scene that has a graph of three nodes: AudioPlayer, Audio2Face, and a mesh to drive (e.g. the default Mark).
  • Make sure the connections are set: AudioPlayer time → Audio2Face → points → mesh.
  • Create a custom extension with your TTS service where, for example, a button-click callback looks up the graph nodes, sets the audio track on both the player and the A2F node, and starts playback.

Here’s a sample snippet, where result.audio_data is your audio buffer (here assumed to be int16 at a 48 kHz sample rate), player_instance is the AudioPlayer node instance, and a2f_instance is the Audio2Face node instance:

# Runs inside a Kit extension where the omni.audio2face.core module is available
import numpy as np

# Convert the raw int16 TTS buffer to float32 samples in [-1, 1]
audio_data = np.frombuffer(result.audio_data, dtype=np.int16)
audio_data = (1.0 / 32768.0) * audio_data.astype(np.float32)

# Wrap the samples in a 48 kHz AudioTrack, hand it to both nodes, and start playback
track = omni.audio2face.core.a2f.audio.AudioTrack(audio_data, 48000)
player_instance.set_track2(track)
a2f_instance.set_a2f_track(player_instance.get_player().track)
player_instance.play_audio_player()

When using mesh transfer, the graph is more complicated, but the same mesh is being driven, so the core concept does not change.

Note that 48 kHz audio will be resampled; Audio2Face expects 16 kHz as far as I can tell.
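
If you’d rather hand A2F 16 kHz audio directly instead of relying on that internal resampling, a minimal sketch (assuming SciPy is available in your Python environment) is to downsample before building the track:

import numpy as np
from scipy.signal import resample_poly

# 48 kHz -> 16 kHz is a 3:1 reduction; audio_data is the float32 buffer from the snippet above
audio_16k = resample_poly(audio_data, up=1, down=3).astype(np.float32)
track = omni.audio2face.core.a2f.audio.AudioTrack(audio_16k, 16000)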

I changed the stream source from the microphone to a URL. Can you explain how to animate the face on the web, the way Misty was made?

We have been looking for the most realistic lip-sync solution for quite some time, and Audio2Face is perfect. However, it does not work alongside our current stack, so please let us know once you have any kind of beta SDK or RESTful API for Audio2Face. Unity is also a huge part of our development stack; are there any plans to support Unity with Audio2Face? Currently we are using Speech Blend, but the results are far from perfect. I can see Audio2Face solving this problem for us.

Once you have some sort of SDK or API, we would be happy to contribute towards the development.
Please let us know once you have something we can test; we’d be happy to collaborate with you on this.
Thanks.

Definitely looking forward to this. Being able to connect Audio2Face with Epic Games’ Unreal Engine to generate lip sync on the fly, as happens in live mode, would be beyond amazing.

Hi, thanks to the new streaming audio player I was able to send audio to Audio2Face using the gRPC protocol (client.py) and generate animations. But now I’m stuck on receiving the output from Audio2Face. Can you guide me on how to receive the animation output back from Audio2Face over gRPC?
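
For reference, the sending side looks roughly like the sample client shipped with the streaming audio player extension. The names below (audio2face_pb2, PushAudioRequest, the default port, and the streaming player prim path) follow that sample, so verify them against your installation:

import grpc
import numpy as np
import audio2face_pb2
import audio2face_pb2_grpc

audio_data = np.zeros(16000, dtype=np.float32)  # placeholder: one second of silence; use your real float32 PCM buffer
samplerate = 16000

with grpc.insecure_channel("localhost:50051") as channel:
    stub = audio2face_pb2_grpc.Audio2FaceStub(channel)
    request = audio2face_pb2.PushAudioRequest()
    request.audio_data = audio_data.tobytes()            # raw float32 bytes
    request.samplerate = samplerate
    request.instance_name = "/World/audio2face/PlayerStreaming"  # streaming player prim in the scene
    request.block_until_playback_is_finished = True
    response = stub.PushAudio(request)
    print(response)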

Hello,

I was wondering if there’s any update on an Audio2Face API. I’m using audio recordings from voice actors and converting them to animations so that I can import them into UE and apply them to MetaHumans. Is there an API available so that I could automate the animation-creation process with a Python script, taking in name.wav and exporting anim_name.wav files?

Is there another process that you can suggest I use to automate this?

Thank you.

Hey Aliannea,
We are working on implementing a REST API and a headless mode to assist in batch processing audio. We are targeting this feature for our next release, which we hope to have out before the end of the year.
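
To give a sense of what batch processing could look like once that lands, here is a purely hypothetical sketch; the port and endpoint names below are illustrative placeholders, not the released API:

import pathlib
import requests

A2F_BASE_URL = "http://localhost:8011"  # hypothetical port for the headless/REST service

for wav_path in sorted(pathlib.Path("recordings").glob("*.wav")):
    # Hypothetical endpoints: load a track, then export the generated animation
    requests.post(f"{A2F_BASE_URL}/A2F/Player/SetTrack", json={"file_name": str(wav_path)})
    requests.post(
        f"{A2F_BASE_URL}/A2F/Exporter/Export",
        json={"export_directory": "animations", "file_name": f"anim_{wav_path.stem}"},
    )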

Sounds great. Thank you, Will!

Does “batch processing” mean the new release won’t be able to do “live streaming”?

Our goal (and it seems like that of many on this thread) is:

  • Generate an audio stream
  • Send the audio stream to a headless Audio2Face
  • Audio2Face “live” processes audio stream
  • Screen capture rendering to make a video stream
  • Stream back the generated video

Or would this not be online processing and instead 3D data generation after the fact? (Regarding your next planned release)

Would you be willing to share your code for how to stream to Audio2Face?