Audio2Face SaaS

Hello Everyone,

We are trying to build an application similar to MISTY. Is there a way to use the Audio2Face plugin programmatically, without Omniverse Kit?
We want to use Audio2Face from our Python codebase: the text-to-speech input would be provided to the Audio2Face plugin, which in turn should output the sample 3D model with lip sync.

Also, is there any repo that contains the end-to-end setup for the MISTY application?

Thanks.

Hi, thank you for your interest.

Yes, you can stream TTS output to the A2F audio player, and with the right setup you can drive A2F this way. For Misty we had a custom Jarvis client API that did this for us; that part is not hard to build. If there is more interest in this, we will consider including it in a future release, so it is easier for anyone to drive A2F directly from any audio stream or input.
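To make "stream the output to the A2F audio player" a bit more concrete, here is a minimal sketch of the first half of that pipeline: reading a TTS-generated WAV file and yielding fixed-size PCM chunks that a client could push to an audio receiver as they arrive. It uses only the standard-library `wave` module; the idea that these chunks feed an A2F audio player is the assumption, not an official API.

```python
import wave

def stream_wav_chunks(path, chunk_frames=4096):
    """Read a TTS-generated WAV file and yield (sample_rate, pcm_bytes) chunks.

    Each chunk could be pushed to an audio receiver (e.g. an Audio2Face
    audio player endpoint -- hypothetical here) as soon as it is read,
    instead of waiting for the whole file.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        while True:
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            yield sample_rate, frames
```

The same loop works on a live TTS stream if you swap the file read for reads from the TTS engine's output buffer.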

Right now there is no repo for Misty as such. We do have a standalone Misty app that we built using these technologies, but it is not available for download at the moment.

Thanks

Hey @siyuen, we’re also interested in this! We’re looking at generating lip sync for a character in Unreal Engine, with the TTS coming from an external source. We’d like to take that TTS and generate (and play) the resulting animation on the character in real time, as soon as possible after the TTS is received.

What do you think the flow for this would look like in Unreal Engine? Would it be something like this:

  1. Unreal receives TTS from external source (e.g. Jarvis)
  2. Unreal somehow uploads it to Omniverse?
  3. Unreal somehow instructs Audio2Face to generate animation (USD) from the uploaded TTS
  4. Unreal somehow downloads the generated USD from Omniverse?
  5. Unreal plays the animation

Is this the right kind of flow? Can Audio2Face somehow run natively in Unreal or outside of the Omniverse application?

You also mentioned a custom Jarvis client that streams the output to Audio2Face. I can’t really see where the hooks for this would lie; is this something I can currently do by tinkering with Omniverse Kit, or is there an SDK somewhere I’m missing?

Thanks! :)

Hi @charisma-ben ,

I think the flow would be something like this, where TTS is Jarvis or some other audio signal:

TTS > Audio2Face > UE

We currently don’t have an API to take a TTS audio stream into Audio2Face as input (which is what we did for Misty), but it is actually not hard to do with existing Python audio libraries. The more people ask for this, the more we will consider releasing something that makes it easier to connect other audio inputs to Audio2Face.
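As a rough illustration of what a homegrown bridge could look like, here is a sketch that pushes raw PCM chunks from a TTS WAV file over a TCP socket to an audio receiver. Everything about the receiver is assumed: the host, port, and the simple length-prefixed wire format are placeholders, since a real Audio2Face integration would use whatever protocol its audio player actually exposes. Only standard-library modules are used.

```python
import socket
import struct
import wave

A2F_HOST = "localhost"  # hypothetical receiver address
A2F_PORT = 50051        # hypothetical receiver port

def push_wav(path, host=A2F_HOST, port=A2F_PORT, chunk_frames=4096):
    """Push PCM chunks from a TTS WAV file to an audio receiver over TCP.

    Wire format (illustration only): each chunk is sent as a 4-byte
    big-endian length prefix followed by the raw PCM bytes.
    """
    with wave.open(path, "rb") as wav, \
            socket.create_connection((host, port)) as sock:
        while True:
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            sock.sendall(struct.pack("!I", len(frames)) + frames)
```

The point is simply that once the TTS audio is in hand, getting it to another process in near real time is a few lines of standard Python; the hard part is matching the receiver's expected format and sample rate.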

For the A2F > UE part, you can first try the Omniverse UE connector. It should be able to get the results out to UE live.
