Hello,
I’m currently working on building an interactive virtual assistant with Audio2Face and trying to integrate a text-to-speech (TTS) service into the pipeline. However, I’ve run into an audio format compatibility issue when streaming the TTS output into Audio2Face.
The TTS service I’m using returns the audio as raw PCM chunks, which arrive as binary data, for example: "\x00\x07\x00\x07\x00\x07\x00\x04\x00\x05\x00\x05". To feed this data into the Audio2Face streaming client, I’ve been converting each binary chunk into the required format and streaming it as follows:
import numpy as np

async for chunk in audio_stream:
    # Interpret the raw PCM chunk as signed 16-bit samples
    audio_data = np.frombuffer(chunk, dtype=np.int16)
    # Normalize to float32 in the range [-1.0, 1.0]
    audio_data_float = audio_data.astype(np.float32) / 32767.0
    stream_client_a2f.push_audio_track_stream(audio_data_float, 22050, "/World/audio2face/PlayerStreaming")
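To show the conversion step in isolation, here is a minimal, self-contained snippet using the example chunk from above (this assumes the TTS output is little-endian 16-bit PCM, which is what np.int16 implies on a little-endian machine; the printed values are only for sanity checking):

import numpy as np

# Example PCM chunk from the TTS service (little-endian signed 16-bit samples)
chunk = b"\x00\x07\x00\x07\x00\x07\x00\x04\x00\x05\x00\x05"

samples = np.frombuffer(chunk, dtype=np.int16)        # e.g. [1792 1792 1792 1024 1280 1280]
samples_float = samples.astype(np.float32) / 32767.0  # values roughly in [-1.0, 1.0]
print(samples, samples_float)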
However, this conversion process seems to be causing discontinuities in the streamed audio within Audio2Face. Upon examining the proto file provided by NVIDIA, I noticed that the audio_data field in the PushAudioStreamRequest message is defined as bytes. I attempted to change it to string and regenerate the audio2face_pb2 and audio2face_pb2_grpc files, but unfortunately this didn’t resolve the issue. It appears that the audio player inside Audio2Face only works with bytes.
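For reference, my current understanding of the unmodified proto is that audio_data expects the raw bytes of the samples, so on the client side I would pack the float32 array myself before building the request. The message and field names below come straight from the proto; packing the samples as float32 via tobytes() is only my assumption based on how the sample streaming client appears to work:

import numpy as np
import audio2face_pb2

# audio_data_float is the normalized float32 array from the loop above
audio_bytes = audio_data_float.astype(np.float32).tobytes()  # raw little-endian float32 samples

# PushAudioStreamRequest.audio_data is declared as bytes in the original proto
request = audio2face_pb2.PushAudioStreamRequest(audio_data=audio_bytes)

Is this roughly what the streaming player expects, or does push_audio_track_stream already handle this packing internally?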
Could you please suggest any workarounds for this issue? Any insights or assistance would be highly appreciated.
Thank you.