Could you add an optional, auxiliary time-stamped text input that can be used to cut the audio automatically, so that background noise does not cause false animations?
Some long audio files contain non-speaking parts with some noise in them, and Audio2Face still animates during those parts. Timestamped text could be used to cut or mask those parts of the audio.
With timestamped text, the following manual steps could be skipped:
Cut the audio manually into small segments, feed each segment into Audio2Face, set the parameters for each one, export each one, and finally merge the results together.
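
To make the request more concrete, here is a rough sketch of the masking idea in plain Python. It is not part of Audio2Face and does not use any Audio2Face API. The function name `mask_non_speech` and the timestamp format (a list of `(start, end)` pairs in seconds) are assumptions made up for illustration. Samples outside the timestamped spans are set to silence and the length stays the same, so nothing has to be merged back together afterwards.

```python
from typing import List, Sequence, Tuple


def mask_non_speech(
    samples: Sequence[int],
    sample_rate: int,
    speech_spans: Sequence[Tuple[float, float]],
) -> List[int]:
    """Return a copy of `samples` with everything outside `speech_spans` zeroed.

    Hypothetical helper: `speech_spans` is assumed to hold (start, end)
    times in seconds taken from the timestamped text.
    """
    # Start from silence and copy back only the spoken spans.
    masked = [0] * len(samples)
    for start_s, end_s in speech_spans:
        # Convert seconds to sample indices and clamp to the audio length.
        lo = max(0, int(round(start_s * sample_rate)))
        hi = min(len(samples), int(round(end_s * sample_rate)))
        if lo < hi:
            masked[lo:hi] = samples[lo:hi]
    return masked


if __name__ == "__main__":
    # Toy input: 3 seconds of "noise" at 10 samples per second.
    audio = [1] * 30
    spans = [(0.5, 1.0), (2.0, 2.5)]  # assumed speech timestamps
    cleaned = mask_non_speech(audio, 10, spans)
    print(cleaned)
```

The masked audio could then be given to Audio2Face as a single file, instead of cutting, processing, exporting and merging segments by hand.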