How does the Audio2Face neural network work?

How exactly does the A2F neural network work? Does it use some kind of deep learning, trained to understand how the pitch or frequency of the voice behaves? Does it have a dedicated recognition system for phonemes?
On the same note, it is also able to auto-emote; although the results vary, it comes quite close some of the time.

I would be very curious. I am currently writing a thesis about Audio2Face, and I wanted to include at least a basic understanding of how the network functions. The only thing I could find is that A2F uses the TensorRT engine.
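Not an official answer, but the publicly available research suggests Audio2Face is related to NVIDIA's work on end-to-end audio-driven facial animation (Karras et al., "Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion", SIGGRAPH 2017). In that approach there is no explicit phoneme recognizer: a convolutional network regresses facial animation directly from a short window of audio features, conditioned on a learned emotion vector. A minimal PyTorch sketch of that idea follows; every layer size, name, and default below is an illustrative assumption, not the shipped model:

```python
import torch
import torch.nn as nn

class AudioToFaceNet(nn.Module):
    """Hypothetical sketch of an end-to-end audio-to-animation regressor,
    loosely following Karras et al. (SIGGRAPH 2017): convolutions over a
    short audio window stand in for phoneme recognition, and a learned
    emotion vector conditions the output."""

    def __init__(self, n_audio_feats=32, emotion_dim=16, n_blendshapes=52):
        super().__init__()
        # "Formant analysis": 1-D convolutions over time compress a ~0.5 s
        # window of per-frame audio features (e.g. autocorrelation or LPC
        # coefficients) into a compact phonetic representation.
        self.analysis = nn.Sequential(
            nn.Conv1d(n_audio_feats, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the remaining time axis
        )
        # "Articulation": fully connected layers map the audio code plus the
        # emotion vector to the controls for one frame of animation.
        self.articulation = nn.Sequential(
            nn.Linear(256 + emotion_dim, 256), nn.ReLU(),
            nn.Linear(256, n_blendshapes),
        )

    def forward(self, audio_window, emotion):
        # audio_window: (batch, n_audio_feats, n_frames)
        # emotion:      (batch, emotion_dim)
        code = self.analysis(audio_window).squeeze(-1)  # (batch, 256)
        return self.articulation(torch.cat([code, emotion], dim=1))


# One frame of facial animation from half a second of audio features.
net = AudioToFaceNet()
audio = torch.randn(1, 32, 30)   # 30 feature frames, illustrative window
emotion = torch.zeros(1, 16)     # "neutral" emotion state
weights = net(audio, emotion)    # (1, 52) blendshape-like outputs
print(weights.shape)
```

A network like this would be trained once offline and then exported for fast inference, which is consistent with your finding that the shipped app only exposes a TensorRT engine rather than the training code.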


Hello @daniel.fleger1! I will pass this along to the team in case they have links to share with you. If you are interested, you can take a look at all of our research publications here: https://research.nvidia.com/publications. Here are a few links that may interest you:

I am also interested in the broader details of how audio2face works. The lack of this knowledge was recently a cause of some confusion on the Virtual Beings Facebook group…

[image: audio2face-vb1]

A little unrelated, but I believe Audio2Face can be used in real-time mode if the user chooses to.


(What you mean is probably the live mode for voice recording. The streaming audio player just connects to a TTS tool, which synthesizes text to speech.)
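For context, the pattern in live mode is to push audio into the streaming audio player as it is generated, instead of loading a finished file. A rough sketch of that pattern is below; the `synthesize_speech` TTS call and `push_chunk` player callback are placeholders for whatever tools are actually in use, not the real Audio2Face interface (which in current releases is a gRPC service):

```python
import wave

CHUNK_FRAMES = 4096  # audio frames per push; illustrative value

def stream_tts_to_player(text, synthesize_speech, push_chunk):
    """Synthesize `text` with a TTS tool and feed the resulting PCM audio
    to a player callback in small chunks, as a live mode would.

    Placeholders (assumptions, not a real API):
      synthesize_speech(text) -> path to a 16-bit mono WAV file
      push_chunk(samples: bytes, sample_rate: int) -> None
    """
    wav_path = synthesize_speech(text)
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        while True:
            chunk = wav.readframes(CHUNK_FRAMES)
            if not chunk:
                break
            # Pushing incrementally lets the face start animating before
            # the whole utterance has finished synthesizing.
            push_chunk(chunk, rate)
```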