Audio2Face to FACS

The facial motion that Audio2Face puts out is really impressive, but it would be hard to use in a video game pipeline because it exports a mesh cache instead of FACS slider data.

I’m curious if there’s any way to process/export the motion to FACS sliders or if anything like that is planned for the future?

I guess if you have a FACS rig for the example face, you could maybe train a neural network by setting random FACS slider values on the rig, then checking the delta between the vertex positions of the rig and the vertex positions of the mesh cache. That might be one way to do it.
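One way to sketch the idea above without a neural network: if the rig is a plain linear blendshape rig (an assumption; real FACS rigs often add correctives and joints), the slider values for one cached frame can be fit by minimizing the vertex delta directly. The names below (`solve_weights`, `jaw_open`, `brow_up`) and the two-vertex mesh are made up for illustration, and coordinate descent is just one simple solver choice.

```python
def solve_weights(neutral, deltas, target, iterations=200):
    """Fit slider values in [0, 1] so neutral + sum(w_i * delta_i) ~= target.

    neutral, target: flat lists of floats (x, y, z per vertex).
    deltas: dict of slider name -> flat list of offsets from the neutral mesh.
    Assumes a purely linear blendshape rig (hypothetical simplification).
    """
    weights = {name: 0.0 for name in deltas}
    # Residual = the part of the cached frame the current sliders do not explain.
    residual = [t - n for t, n in zip(target, neutral)]
    for _ in range(iterations):
        for name, delta in deltas.items():
            norm = sum(d * d for d in delta)
            if norm == 0.0:
                continue  # a shape that moves nothing cannot be solved for
            # Best least-squares change for this slider alone, clamped to [0, 1].
            step = sum(d * r for d, r in zip(delta, residual)) / norm
            new_weight = min(1.0, max(0.0, weights[name] + step))
            change = new_weight - weights[name]
            residual = [r - change * d for r, d in zip(residual, delta)]
            weights[name] = new_weight
    return weights


# Toy data: two vertices, two overlapping shapes (all values invented).
neutral = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
deltas = {
    "jaw_open": [0.0, -1.0, 0.0, 0.0, -1.0, 0.0],
    "brow_up": [0.0, 0.5, 0.0, 0.0, 0.0, 0.0],
}
# Build a "mesh cache" frame from known slider values, then recover them.
true_weights = {"jaw_open": 0.48, "brow_up": 0.2}
cached_frame = list(neutral)
for name, weight in true_weights.items():
    cached_frame = [c + weight * d for c, d in zip(cached_frame, deltas[name])]

solved = solve_weights(neutral, deltas, cached_frame)
print({name: round(value, 4) for name, value in solved.items()})
```

Running this per frame would give one float per slider per frame, which is the kind of channel data being asked for here.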

Hi, thanks for the question and interest in A2F. Currently we do not support blendshapes, but blendshape support is planned and will be available in the future.

@Danlowlows, just curious what usage you are thinking of, and whether you have a specific pipeline in mind?

I am aware of game pipelines that use joint or blendshape approaches, and I would like to know more specifics about the use case you are thinking of. A2F can actually support blendshapes fine; as Ronan mentioned, it is something we are going to add. We have also done some tests with a game rig's joints constrained to the cache surface to create animation that way (specifically for a game context), and it worked very well.

We also have other ideas for how to convert the data you see to whatever FACS rig anyone has. Blendshape support alone can get tricky, as everyone may have a slightly different rig, so we want to make this easy for anyone. Stay tuned for more updates in this area, and let us know if you have other feedback.

Cheers.

@siyuen Hi there! Thanks for the reply.

What I’d like to be able to do is export motion data that can then be used on a game rig. The best format for that, at least for most modern facial rigs for games, is float channels that represent different FACS shapes. Mesh caches are not particularly useful as a data format for games (at least at runtime), for a few reasons…

  1. Vertex motion stores both motion and shape together, which makes it very mesh specific: you can’t port it between characters. One of the nice things about FACS is that it uses generic descriptors like “the jaw is 48% open”, “the middle of the left eyebrow is 20% raised”, etc. Those kinds of descriptors separate the description of the motion from the shape of the face, and leave the specifics of what to do with them to each individual facial rig. That’s a lot more useful for game teams.

  2. Vertex motion is difficult to edit. Say I really like the lip sync from a capture, but want to make the character smile a bit more, or raise their eyebrows more; that’s very hard to do with a mesh cache. With blendshapes, bones, FACS rigs, or really anything driven by float channels, you have controls in place that an animator can use, so it’s a lot easier to make those adjustments. This includes runtime blending of animations, e.g. having the lips run on a separate animation layer from the eyes, so you can dynamically control eye look-at direction, or the emotion in the brow, separately from what the person is saying.

  3. Mesh caches are comparatively heavy, memory-wise, so they aren’t really suitable as a runtime animation file format.
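To put rough numbers on point 3, here is a back-of-the-envelope comparison. The vertex count, channel count, frame rate, and clip length are hypothetical, and both formats are assumed to store uncompressed 32-bit floats, so treat the ratio as an order-of-magnitude guide only.

```python
BYTES_PER_FLOAT = 4  # assumption: uncompressed 32-bit floats


def mesh_cache_bytes(vertex_count, frame_count):
    """Size of a cache storing x, y, z for every vertex on every frame."""
    return vertex_count * 3 * BYTES_PER_FLOAT * frame_count


def channel_bytes(channel_count, frame_count):
    """Size of an animation storing one float per channel per frame."""
    return channel_count * BYTES_PER_FLOAT * frame_count


frames = 30 * 10  # hypothetical clip: 10 seconds at 30 fps
mesh = mesh_cache_bytes(vertex_count=20_000, frame_count=frames)
channels = channel_bytes(channel_count=50, frame_count=frames)

print(f"mesh cache:     {mesh / 1e6:.1f} MB")
print(f"float channels: {channels / 1e3:.1f} KB")
print(f"ratio:          {mesh // channels}x")
```

Real engines compress both formats, so the actual gap will differ, but the per-frame cost of a cache scales with vertex count while channel data scales only with the number of controls.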

I appreciate your point about people having different face rigs, but once the data is in a float channel format, it’s fairly straightforward for a technical animator at a studio to write a script to convert the data if you have access to both the source and target rig. You set each channel on the source rig, one at a time, to 100%, then animate the target rig to match that shape, and then save out a mapping of the values for that shape (e.g. 100% on this channel = this combination of channel values on the other rig).

Once you have that mapping you can process all future captures very quickly. This approach isn’t possible with a mesh cache though.
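As a rough sketch of that mapping step: assuming each source channel contributes linearly to the target channels (which ignores corrective or combination shapes), the saved mapping can be applied per frame like this. The function and the channel names on both rigs are hypothetical.

```python
# Hypothetical mapping, authored once by matching shapes by eye:
# "source channel at 100%" -> "combination of target channel values".
MAPPING = {
    "jawOpen": {"Jaw_Drop": 1.0, "Lips_Part": 0.3},
    "browInnerUp": {"Brow_Raise_L": 0.8, "Brow_Raise_R": 0.8},
}


def retarget_frame(source_frame, mapping):
    """Convert one frame of source channel values to target channel values.

    Assumes contributions add up linearly; source channels that are
    missing from the mapping are skipped.
    """
    target_frame = {}
    for source_name, value in source_frame.items():
        for target_name, amount in mapping.get(source_name, {}).items():
            target_frame[target_name] = (
                target_frame.get(target_name, 0.0) + value * amount
            )
    return target_frame


def retarget_clip(source_frames, mapping):
    """Apply the same mapping to every frame of a capture."""
    return [retarget_frame(frame, mapping) for frame in source_frames]


capture = [
    {"jawOpen": 0.0, "browInnerUp": 0.25},
    {"jawOpen": 0.5, "browInnerUp": 0.25},
]
for frame in retarget_clip(capture, MAPPING):
    print(frame)
```

The expensive part is authoring `MAPPING` once; after that, every new capture is just a loop over frames.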

If A2F supports blendshapes, I think allowing users to export the blendshape channel data would add a huge amount of value. It would also be very helpful to include the basic blendshape rig with no motion on it, as something like an .fbx file, so that users have a reference for what each individual channel is doing and can match the shapes and create a mapping for the data. It wouldn’t need any complex controls or anything like that; just the float channels for each of the shapes.

Hope that all makes sense? Thanks again for taking the time to follow up.

Totally agree, and there are other live puppetry use cases where a text-to-speech to A2F to blendshape to control rig chain is highly desirable and could significantly reduce the time and cost overhead of experimenting with different approaches.

It’s not just game pipelines: ‘blackbox’ mesh-only deformation to drive a performance is problematic for ANY CG pipeline, because you can’t easily edit the results, especially for something that’s as liable to need ‘direction’ as a facial performance :)

This is even an issue with dynamic sims in general and their lack of ‘art direct-ability’, but at least on the ‘simulating physics’ side, the price paid in flexibility is worth it for the otherwise impossible-to-create complexity you gain. With face pipelines, you either need them to be VERY VERY good out of the box or you need the ability to tweak (that is not impossible to set up on top of a cache, but it’s much easier to build onto a rig).

Totally agree. Geometry caches are pretty difficult to handle when you combine motions from several sources. Another good option would be to use Apple ARKit blendshapes, which are common to a lot of pipelines.

Thank you for all the feedback.

Something that might be of interest to people here: we are soon going to release a Blendshape Solve option for Audio2Face. It will come out in version 2021.3.

This will give users a live conversion from the final A2F result to their own custom blendshape rig, side by side (only blendshapes are supported at the moment), so you can see and tune the result live.

Then you can export the blendshape keyframe information from the solve to your own software package.

Stay tuned for this update.
