Audio2Face Streaming Proto File


I’m building an interactive virtual assistant with Audio2Face and trying to integrate a text-to-speech (TTS) service into the pipeline. However, I’ve run into an audio-format compatibility issue when streaming data into Audio2Face.

The TTS service I’m using returns audio as chunks of raw PCM data, which appear as binary strings, for example: "\x00\x07\x00\x07\x00\x07\x00\x04\x00\x05\x00\x05". To feed this data into the Audio2Face streaming client, I’ve been converting these binary chunks into the required format and streaming them as follows:

async for chunk in audio_stream:
    audio_data = np.frombuffer(chunk, dtype=np.int16)
    audio_data_float = audio_data.astype(np.float32) / 32767.0
    stream_client_a2f.push_audio_track_stream(audio_data_float, 22050, "/World/audio2face/PlayerStreaming")

However, this conversion process seems to be causing discontinuities in the audio streamed into Audio2Face. Examining the proto file provided by NVIDIA, I noticed that the audio_data field in the PushAudioStreamRequest message is defined as bytes. I tried changing it to string and regenerating the audio2face_pb2 and audio2face_pb2_grpc files, but unfortunately that didn’t resolve the issue. It appears that the audio player inside Audio2Face only works with bytes.
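For what it’s worth, changing the field type should not be necessary: a float32 NumPy array round-trips through raw bytes losslessly, so a bytes field can carry the samples as-is. A minimal sketch (this assumes the sample client serializes the float array with tobytes() before filling audio_data, which I have not verified against the client source):

```python
import numpy as np

# Hypothetical float32 samples in the [-1.0, 1.0] range expected by Audio2Face.
samples = np.array([0.0, 0.5, -0.5, 1.0], dtype=np.float32)

# Serialize to raw bytes, as a proto `bytes` field would carry them.
payload = samples.tobytes()

# The receiver can reconstruct identical samples from the bytes,
# so there is no need to change the field type to `string`.
restored = np.frombuffer(payload, dtype=np.float32)
assert np.array_equal(samples, restored)
```

If discontinuities remain after a lossless round-trip like this, the problem is more likely in the source data itself (e.g. byte order or sample rate) than in the proto field type.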

Could you please suggest any workarounds for this issue? Any insights or assistance would be much appreciated.

Thank you.



Giving a numpy float array as audio_data should work with the sample client function, push_audio_track_stream(). I would suggest making sure the source audio is properly converted to a numpy array. Your data looks big-endian, but np.frombuffer() may read it as little-endian. Could you check whether the values in audio_data_float are what you expect?

>>> a1 = np.array([1, 0, 1, 0, 1, 0], dtype=np.int16)          # native (little-endian) int16
>>> a1.tobytes()
b'\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00'
>>> a2 = np.array([1, 0, 1, 0, 1, 0], dtype=np.dtype('>i2'))   # big-endian int16
>>> a2.tobytes()
b'\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00'
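If the source really is big-endian, the fix is to tell np.frombuffer() the byte order explicitly via the dtype string. A sketch using the example bytes from the question (assuming they are big-endian 16-bit PCM):

```python
import numpy as np

# The example chunk from the question, read as big-endian 16-bit samples:
# each sample is stored most-significant byte first.
chunk = b'\x00\x07\x00\x07\x00\x07\x00\x04\x00\x05\x00\x05'

# Misread: interpreting big-endian bytes as native little-endian int16
# inflates every sample (0x00 0x07 becomes 0x0700 = 1792 instead of 7).
wrong = np.frombuffer(chunk, dtype=np.int16)

# Correct: declare the byte order ('>i2' = big-endian int16), then convert
# to native order before the float scaling step.
right = np.frombuffer(chunk, dtype='>i2').astype(np.int16)

audio_data_float = right.astype(np.float32) / 32767.0
```

If the values printed from the misread array look implausibly large or noisy compared to the expected waveform, that is a strong hint the byte order is wrong.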