Audio2Face Streaming Proto File


I’m building an interactive virtual assistant with Audio2Face and trying to integrate a text-to-speech (TTS) service into the pipeline. However, I’ve run into an audio-format compatibility issue when streaming data into Audio2Face.

The TTS service I’m using returns audio as chunks of raw PCM data, which appear as binary strings, for example: "\x00\x07\x00\x07\x00\x07\x00\x04\x00\x05\x00\x05". To feed this data into the Audio2Face streaming client, I’ve been converting these binary chunks into the required format and streaming them as follows:

async for chunk in audio_stream:
    audio_data = np.frombuffer(chunk, dtype=np.int16)
    audio_data_float = audio_data.astype(np.float32) / 32767.0
    stream_client_a2f.push_audio_track_stream(audio_data_float, 22050, "/World/audio2face/PlayerStreaming")

However, this conversion process seems to be causing discontinuities in the audio streamed into Audio2Face. Examining the proto file provided by NVIDIA, I noticed that the audio_data field in the PushAudioStreamRequest message is defined as bytes. I tried changing it to string and regenerating the audio2face_pb2 and audio2face_pb2_grpc files, but unfortunately that didn’t resolve the issue. It appears that the audio player inside Audio2Face only works with bytes.
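For what it’s worth, changing the field type should not be necessary: a float32 NumPy array round-trips through raw bytes losslessly, so a bytes field can carry the samples as-is. A minimal sketch (this assumes the sample client serializes the float array with tobytes() before filling audio_data, which I have not verified against the client source):

```python
import numpy as np

# Hypothetical float32 samples in the [-1.0, 1.0] range expected by Audio2Face.
samples = np.array([0.0, 0.5, -0.5, 1.0], dtype=np.float32)

# Serialize to raw bytes, as a proto `bytes` field would carry them.
payload = samples.tobytes()

# The receiver can reconstruct identical samples from the bytes,
# so there is no need to change the field type to `string`.
restored = np.frombuffer(payload, dtype=np.float32)
assert np.array_equal(samples, restored)
```

If discontinuities remain after a lossless round-trip like this, the problem is more likely in the source data itself (e.g. byte order or sample rate) than in the proto field type.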

Could you please suggest any workarounds for this issue? Any insights or assistance would be much appreciated.

Thank you.



Giving a numpy float array as audio_data should work with the sample client function, push_audio_track_stream(). I would suggest making sure the source audio is properly converted to a numpy array. Your data looks big-endian, but np.frombuffer() may read it as little-endian. Could you check whether the values in audio_data_float are what you expect?

>>> a1 = np.array([1, 0, 1, 0, 1, 0], dtype=np.int16)          # native (little-endian) int16
>>> a1.tobytes()
b'\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00'
>>> a2 = np.array([1, 0, 1, 0, 1, 0], dtype=np.dtype('>i2'))   # big-endian int16
>>> a2.tobytes()
b'\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00'
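If the source really is big-endian, the fix is to tell np.frombuffer() the byte order explicitly via the dtype string. A sketch using the example bytes from the question (assuming they are big-endian 16-bit PCM):

```python
import numpy as np

# The example chunk from the question, read as big-endian 16-bit samples:
# each sample is stored most-significant byte first.
chunk = b'\x00\x07\x00\x07\x00\x07\x00\x04\x00\x05\x00\x05'

# Misread: interpreting big-endian bytes as native little-endian int16
# inflates every sample (0x00 0x07 becomes 0x0700 = 1792 instead of 7).
wrong = np.frombuffer(chunk, dtype=np.int16)

# Correct: declare the byte order ('>i2' = big-endian int16), then convert
# to native order before the float scaling step.
right = np.frombuffer(chunk, dtype='>i2').astype(np.int16)

audio_data_float = right.astype(np.float32) / 32767.0
```

If the values printed from the misread array look implausibly large or noisy compared to the expected waveform, that is a strong hint the byte order is wrong.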