Open Mouth Problem

Hello :)
I’ve noticed a certain issue in the Audio2Face program, which I’ll call the Open Mouth Problem :) After launching, even without starting audio playback, and during pauses between speeches when the character should close its mouth, it stubbornly remains slightly open, which doesn’t look great. It can be corrected externally, but that raises the question of why such a problem occurs in the first place. Why, if the audio signal level is zero or very low, is the blendshape for the open mouth set to a value greater than zero? Is there a solution to this problem, or does it stem from my lack of knowledge?

Best regards
Chris

Does tweaking Lip Open Offset help?

Hmm, this isn’t a solution… I can’t manually adjust such parameters for every character, because I’m building a fairly complex generic avatar system where everything is supposed to work as automatically as possible. And I assume that if there is no sound, the lip-sync values should be at 0 (I’m not talking about emotions or expressions); I can’t go into the interface and manually tweak something for every character, especially without knowing why it behaves like that. Logically, if the audio is at 0, the mouth should be at zero and that’s it… otherwise, it’s a bug. This isn’t a situation like face-capture calibration, where every face might be slightly different, captured differently by the system, and you have to calibrate in some neutral pose. Here, we have a neutral pose by definition in the model. Therefore, in my opinion, this is a bug… or a problem with the AI model, which can’t zero out its state at audio level 0.
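For context, the “external correction” I mentioned is essentially this kind of gating on the receiving side: if the incoming audio is effectively silent, force the mouth-opening coefficients back to zero before they reach the character. This is only a rough sketch of the idea, written against my own assumptions (a plain dict of ARKit-style names and access to the current audio frame); none of these names come from A2F or the LiveLink plugin.

```python
import numpy as np

SILENCE_RMS = 1e-3  # threshold below which a frame is treated as silence; tune per audio source
MOUTH_OPEN_KEYS = ("JawOpen", "MouthFunnel", "MouthPucker")  # weights to suppress during silence

def gate_mouth_coefficients(coeffs, audio_frame):
    """Zero out mouth-opening blendshape weights while the audio is silent.

    coeffs      -- dict mapping ARKit-style blendshape names to floats in [0, 1]
    audio_frame -- 1-D numpy array of audio samples for the current frame
    """
    rms = float(np.sqrt(np.mean(np.square(audio_frame)))) if audio_frame.size else 0.0
    if rms < SILENCE_RMS:
        for key in MOUTH_OPEN_KEYS:
            if key in coeffs:
                coeffs[key] = 0.0
    return coeffs
```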

Do you have any noise in your audio file by any chance? I can see that Mark’s lips are completely closed when I drag the timeline to where there’s no sound.

It’s the same for me when I play around in the editor. The voice is generated by ElevenLabs in good quality, without any noise, and I can see that in the editor it’s okay. But… at runtime, when transmitting through LiveLink to Unreal Engine, it’s not. Maybe it’s some problem on the LiveLink side?

You’re right. We’re aware of lip issues with the M, B, and P sounds, and our engineering team is currently working on it.

Oh, that’s at least good news: there is a problem, but it’s already been pinpointed, and I’m not hallucinating like some AI :) Thank you for your help and for letting me know that the struggle on this topic continues. Warm regards to you and the whole team… I’m doing these longer tests, and really, your lip sync works superbly; there’s nothing like it in the world… I’m a bit worried, though, that it consumes a lot of computing power, and I think at some point I will have to allocate a separate computer for the calculations and another for generating the characters. This is not good from a business point of view :)

Best regards
Chris

Hello,
I have a gentle question about when we can expect some software updates, because quite a few things have accumulated, and it would be good to have at least a rough idea of when new features might arrive :).

Best regards
Chris

Unfortunately, we do not have an ETA for the next release.

I understand, tough luck, we must wait patiently and tap our feet :)

Chris

Hello,
is the problem with the data-transmission errors via LiveLink at runtime on the sending side (Audio2Face) or on the receiving side (the LiveLink plugin in Unreal Engine)?

I’m not quite sure. Our engineering team is working on this at the moment and hopefully will have a solution soon.

Very creative, Chris :)

Yes, this has come up a few times in the past, and the best solution so far is to create a proper MouthClose shape among the ARKit blendshapes.

The MouthClose shape looks weird when triggered alone, but when combined with JawOpen, it should create a face shape where the jaw looks exactly like JawOpen plus lips that are touching (just as if you keep your lips together and open your jaw as much as you can).

The trick is to create that weird-looking MouthClose shape, which can be achieved by subtracting JawOpen from the final sculpted shape (JawOpen + MouthClose).
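In case anyone wants to script that subtraction instead of sculpting the delta by hand, here is a minimal sketch of the arithmetic, assuming the neutral, JawOpen, and sculpted (jaw open, lips touching) meshes share the same vertex count and order; the function and variable names are just illustrative, not part of any A2F API.

```python
import numpy as np

def build_mouth_close_target(neutral, jaw_open, sculpted_jaw_open_lips_closed):
    """Compute the MouthClose corrective blendshape target.

    All inputs are (N, 3) vertex arrays in the same vertex order:
      neutral                       -- the rest pose
      jaw_open                      -- the JawOpen target
      sculpted_jaw_open_lips_closed -- the sculpt where the jaw is fully open
                                       but the lips stay touching (JawOpen + MouthClose)

    Blendshapes add as deltas from the neutral, so we need a shape whose delta M satisfies
        neutral + (jaw_open - neutral) + M == sculpted_jaw_open_lips_closed
    which gives M = sculpted_jaw_open_lips_closed - jaw_open.
    """
    mouth_close_delta = sculpted_jaw_open_lips_closed - jaw_open
    # Return the absolute target (neutral + delta) so it can be imported like any
    # other blendshape; triggered on its own it will look odd, which is expected.
    return neutral + mouth_close_delta
```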

Hello,
thank you very much for the response :) I’m really focused on achieving perfect lip sync, and unlike facial expressions controlled with an iPhone, where I don’t observe such weird effects with the jaw, here they occur… There’s also the problem with the open mouth, which I try to minimize during speech, and I smoothly turn off the LiveLink source at the end… but it’s like fixing something with tape… The error should be corrected at the source, on the program’s side, which I’m eagerly waiting for… Me and my Genea :)
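The “smoothly turn off” part, for what it’s worth, is just a time-based fade I apply on the receiving side before the weights reach the character; a rough sketch, where the fade duration and the coefficient dictionary are my own assumptions rather than anything exposed by the LiveLink plugin.

```python
def fade_out_coefficients(coeffs, time_since_speech_end, fade_duration=0.25):
    """Linearly ramp incoming blendshape weights to zero after speech ends.

    coeffs                -- dict of blendshape names -> float weights for this frame
    time_since_speech_end -- seconds elapsed since the last audible frame
    fade_duration         -- how long the fade back to the neutral pose should take
    """
    t = min(max(time_since_speech_end / fade_duration, 0.0), 1.0)
    scale = 1.0 - t  # 1.0 right at speech end, 0.0 once the fade completes
    return {name: weight * scale for name, weight in coeffs.items()}
```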

Chris

Glad to hear they are working on a solution. I wish there was a panel on the timeline like iClone has.

No matter how good the AI is, people may want to override the behavior for various reasons.

Like when you need to override a motion from their AccuLips plugin.

Hi :),

This is like another step, adding extra functionality that makes the system more flexible… although in this case, as a devout follower of SOLID principles and design patterns, I believe the main job of A2F should be to correctly provide blendshape values for specific presets like ARKit, FACS (see the “Face the FACS” cheat sheet), etc.; how a particular target looks should be the responsibility of the recipient of the blendshape coefficients. But for now, I’m battling fundamental errors in the transmission of the correct blendshape values.

If we establish that the blendshape data will follow the ARKit standard, and the model works properly with the iPhone’s ARKit, without any errors, then the data sent from A2F MUST also work correctly… that’s the whole point of an interface; if it doesn’t work correctly, there is an error in the transmitted data. If every Metahuman character showed facial-expression errors when working with the iPhone, it would be obvious that there are errors in the morph targets, missing morphs, incorrectly interpreted or retargeted data, etc.… but everything works correctly. So if I swap, in the communication interface (which is LiveLink), the provider of the blendshape coefficient data, without any corrections, I should get correctly working expressions… otherwise, the idea of interfaces loses its meaning for me.
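To make the interface argument concrete, this is the kind of contract I have in mind: the character only ever sees a fixed set of ARKit coefficient names, and any provider (iPhone capture, A2F over LiveLink, anything else) is just an interchangeable source behind it. The names and classes here are purely illustrative sketches of mine, not part of any A2F, ARKit, or Unreal API.

```python
from typing import Dict, Protocol

# A few of the ARKit-style coefficient names used in this thread; the full set
# has 52 entries, omitted here for brevity.
ARKIT_BLENDSHAPES = (
    "JawOpen",
    "MouthClose",
    "MouthSmileLeft",
    "MouthSmileRight",
)

class BlendshapeProvider(Protocol):
    """Anything that can hand over one frame of ARKit-named weights."""

    def current_coefficients(self) -> Dict[str, float]:
        """Return blendshape name -> weight in [0, 1] for the current frame."""
        ...

def apply_to_character(character, provider: BlendshapeProvider) -> None:
    """The character depends only on the ARKit names, never on who produced them."""
    coeffs = provider.current_coefficients()
    for name in ARKIT_BLENDSHAPES:
        # character.set_morph_target is a placeholder for whatever the target
        # engine exposes; the point is that swapping providers needs no changes here.
        character.set_morph_target(name, coeffs.get(name, 0.0))
```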

Chris

I imagine a story like this… I decided to buy a new beautiful car, and I chose an Audi A7… so I go to the dealership with cash in pocket, extremely excited about the purchase. In the showroom, the salesperson shows me the car… inside, it’s comfortable, with large seats, plenty of space, a big trunk, well-organized cockpit… everything is top-notch. Great, now it’s time for a test drive… I get in and close the door… bam! The door doesn’t close? Hmm, strange… I try again, nothing… I take a closer look… the door lacks a lock… surprised, I ask the salesperson what’s going on… and he, all smiles, says that the engineers were so rushed that they forgot about the door lock… but it’s nothing, the car is 99% perfected down to the last detail, and it’s just a little lock in the door… but they provide a hemp rope with which you can tie the door so it won’t open. And there’s a kit available on the Audi website for you to install the lock yourself. It’s very simple… you have to drill holes in a few places, weld the mounting, and everything works… one day’s work. So, shall we write up the invoice? :) You can probably imagine the expression on my face and my eyes bulging out in disbelief :).

It’s a similar situation here… we have an application and a project dedicated to ARKit and Metahuman, and there’s a LiveLink plugin with settings for Metahuman. It seems everything is there, but it doesn’t work completely… it works 90%, and now you’re left to figure it out, messing with targets, etc., racking your brain over why it doesn’t work perfectly. I don’t understand what the issue is.

The solution must work 100%, not 90%, because that missing 10% prevents practical use.

Chris:)

An interesting analogy. What if someone were to point to the banner next to the Audi, or the fine print in the sales contract, that states “this car is close to completion, but test drive today and see all the features for free!”? What would you do? Or would you end up choosing to drive that car off the dealer’s lot despite knowing about the missing door lock, because you loved everything else about the Audi? Or because, by driving it off the lot today, you could already cut your commute in half instead of riding your bike to the bus station and taking the crowded bus for 40 minutes every day?

Alas, I am just another OV user, and I agree with your sentiment to some extent. I am sure the devs for A2F (and other OV apps, I presume) want the app to be perfect, but my unpopular opinion is that software development takes time and has lots of moving parts. Users like us help them get to that point quicker, but it will require patience. Having to figure out workarounds is an option, and perhaps worth it, in the interim. Without such an app, we’d all have to become riggers and animators, or learn how to parse mocap data, to even make realistic facial animation possible.

Hi :),

I must admit I expected just such an interpretation… after all, the software is free and in beta… an eternal beta, so there’s always an excuse for something not working… and there’s also the matter of it being free… in life, nothing is free… for free, you can get punched in a shady district… here, Nvidia enters into a business contract with its users: we provide you with tools, and you buy our devices on which these tools can be run and used… everyone buying Nvidia cards is paying for the teams working on these solutions. People will be more willing to buy if they are sure the software works. Google operates the same way… supposedly the tools are free, but we all pay for them by supplying our data and information, fueling big data and targeted-advertising systems… nothing here is free.

It’s also understood that such software has many threads and needs to meet various user expectations. Some want this, others want that, and there are hundreds of different cases… it’s not possible to cover everything quickly… but :)… since a certain milestone (let’s call it that) has been defined for handling the Metahuman system through LiveLink, it seems most sensible to close that topic in such a way that it can actually be used, rather than leaving it at 90% (or maybe 99%), leaving errors behind, and jumping to the next task…

After all, I have repeatedly praised and admired this system for how it works and for the good results it gives… but suddenly, when it comes time to move from fun and tests to real implementation, it turns out there are errors and problems. That’s why I think a strategy of closing a given stage as “production ready”, like Epic does with Unreal Engine, is better than leaving something unfinished.

Ultimately, I know this is just my rambling that doesn’t have any real impact… but that’s what forums are for, like the ancient Roman Forum where different views clashed. And that’s what it’s all about, to have a discussion :)

Best regards to everyone :) Especially the A2F team who has to listen to such grumbling :)