Integrating ElevenLabs API with Audio2Face: Issues with Sample Rate and Buffer Size

Hello everyone,

I am reaching out for some expertise on a technical challenge I am currently facing. I am working on integrating the ElevenLabs API with Audio2Face, but I’m encountering difficulties related to the sample rate and buffer size.

Has anyone here attempted to combine these two tools before? If so, did you experience any issues with sample rate or buffer size? I would greatly appreciate any advice or solutions to overcome these hurdles.

Here are more details on the specific problems I am facing:

  • Inconsistencies in sample rate between the ElevenLabs API and Audio2Face, causing compatibility issues.
  • Difficulties with the buffer size that lead to delays and interruptions in audio processing.
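For the sample-rate mismatch, one common fix is to resample the TTS output to whatever rate your Audio2Face instance is configured for before handing it over. Below is a minimal, dependency-free sketch of linear-interpolation resampling for mono 16-bit PCM; the 16 kHz target and the file name are assumptions for illustration only, not something ElevenLabs or A2F mandates.

```python
import math
import struct
import wave

def resample_pcm16(samples, src_rate, dst_rate):
    """Linearly interpolate mono 16-bit PCM samples to a new sample rate."""
    if src_rate == dst_rate:
        return list(samples)
    ratio = src_rate / dst_rate
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * ratio
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(int(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out

# Example: one second of a 440 Hz tone at 44100 Hz, resampled to 16000 Hz
src_rate, dst_rate = 44100, 16000
tone = [int(20000 * math.sin(2 * math.pi * 440 * t / src_rate))
        for t in range(src_rate)]
resampled = resample_pcm16(tone, src_rate, dst_rate)

# Write a .wav at the target rate (16 kHz is an assumed example rate)
with wave.open("tts_16k.wav", "wb") as wf:
    wf.setnchannels(1)           # mono
    wf.setsampwidth(2)           # 16-bit samples
    wf.setframerate(dst_rate)
    wf.writeframes(struct.pack(f"<{len(resampled)}h", *resampled))
```

For production quality you would normally reach for a proper resampler (e.g. ffmpeg or a DSP library) rather than linear interpolation, but this shows the shape of the fix.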

If anyone has suggestions, similar experiences to share, or can point me towards useful resources, it would be extremely helpful.

Thank you in advance for your help and feedback.

I’m not familiar with ElevenLabs, but I took a quick look and it seems it only exports audio in .mp3 format, right? Audio2Face, however, only works with .wav.

As a test, I created an mp3 using ElevenLabs, converted it to wav, and tested it in A2F; it seems to work as expected. Do you have an audio file we can test on our end?
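If it helps, here is a small sketch of that mp3-to-wav conversion step scripted with ffmpeg. It assumes ffmpeg is installed and on your PATH; the file names and the 16 kHz target rate are placeholder examples, not values required by either tool.

```python
import shutil
import subprocess

def build_ffmpeg_cmd(src_mp3, dst_wav, sample_rate=16000):
    """Build an ffmpeg command that decodes an mp3 to mono 16-bit PCM wav."""
    return [
        "ffmpeg", "-y",
        "-i", src_mp3,
        "-ar", str(sample_rate),   # resample to the rate A2F expects (assumed)
        "-ac", "1",                # down-mix to mono
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        dst_wav,
    ]

cmd = build_ffmpeg_cmd("elevenlabs_output.mp3", "a2f_input.wav")
if shutil.which("ffmpeg"):        # only run if ffmpeg is actually available
    subprocess.run(cmd, check=True)
```

Pinning the sample rate, channel count, and codec at conversion time avoids guessing later about what A2F will receive.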

We want to use the ElevenLabs Text-to-Speech WebSocket API to send a live audio stream.
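One thing to watch with streaming is that a WebSocket typically delivers audio in chunks of varying size, while a streaming consumer usually wants fixed-size buffers. A minimal sketch of that regrouping step, independent of either API (the chunk sizes here are just illustrative):

```python
def rechunk(chunks, buffer_bytes):
    """Regroup variable-size byte chunks into fixed-size buffers.

    Yields buffers of exactly `buffer_bytes`, then whatever remains at the end.
    """
    pending = bytearray()
    for chunk in chunks:
        pending.extend(chunk)
        while len(pending) >= buffer_bytes:
            yield bytes(pending[:buffer_bytes])
            del pending[:buffer_bytes]
    if pending:
        yield bytes(pending)   # final partial buffer

# Example: uneven incoming messages regrouped into 8-byte buffers
incoming = [b"abc", b"defghij", b"klmnopqrstuvwx"]
buffers = list(rechunk(incoming, 8))
```

For 16-bit PCM, keep `buffer_bytes` a multiple of the 2-byte sample width so a buffer boundary never splits a sample in half.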

If you already have the .wav file and would like to send it to Audio2Face using the command line, take a look at “Overview of Streaming Audio Player in Omniverse Audio2Face” on YouTube.