Hey @siyuen, we’re also interested in this! We’re looking at generating lip sync for a character in Unreal Engine, with the TTS coming from an external source. We’d like to take that TTS audio and generate (and play) the resulting animation on the character in realtime, as soon as possible after the audio is received.
What do you think the flow for this would look like in Unreal Engine? Would it be something like this:
- Unreal receives TTS from external source (e.g. Jarvis)
- Unreal somehow uploads it to Omniverse?
- Unreal somehow instructs Audio2Face to generate animation (USD) from the uploaded TTS
- Unreal somehow downloads the generated USD from Omniverse?
- Unreal plays the animation
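To make the question concrete, here's the round trip I'm imagining as a sketch. Every function here is a placeholder of my own invention, none of them are real Omniverse, Audio2Face, or Unreal APIs; I'm just trying to pin down the shape of the pipeline I'd need to fill in:

```python
# Hypothetical end-to-end flow. All functions are stand-ins for whatever
# real transport applies (Nucleus upload, a Kit command, an HTTP call, etc.).

def receive_tts_audio() -> bytes:
    # Step 1: Unreal receives TTS audio from an external source (e.g. Jarvis).
    return b"fake-pcm-audio"

def upload_to_omniverse(audio: bytes) -> str:
    # Step 2: push the audio to an Omniverse/Nucleus location (placeholder URL).
    return "omniverse://localhost/audio/clip.wav"

def request_audio2face_animation(audio_url: str) -> str:
    # Step 3: ask Audio2Face to generate facial animation (USD) from that audio.
    return "omniverse://localhost/anim/clip.usd"

def download_usd(usd_url: str) -> bytes:
    # Step 4: fetch the generated USD back into Unreal.
    return b"fake-usd-anim"

def play_animation(usd: bytes) -> bool:
    # Step 5: apply/play the animation on the character in Unreal.
    return len(usd) > 0

if __name__ == "__main__":
    audio = receive_tts_audio()
    usd_url = request_audio2face_animation(upload_to_omniverse(audio))
    print(play_animation(download_usd(usd_url)))
```

The part I'm least sure about is steps 2–4: whether those hops go through Nucleus at all, or whether there's a more direct streaming path.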
Is this the right kind of flow? Can Audio2Face somehow run natively in Unreal or outside of the Omniverse application?
You also mentioned a custom Jarvis client to stream the output to Audio2Face. I can’t really see where the hooks for this would lie. Is this something I can currently do by tinkering with Omniverse Kit, or is there an SDK somewhere that I’m missing?