On the Noise Problem in Audio2Face Flow (Streaming) Mode

I saw the streaming push example test_client.py in the official examples and imitated it, hoping to push local MP3 or PCM files. I used some Python libraries to arrange the data I read into the same form as in the official example. However, after pushing it to Audio2Face, the playback content was correct, but there was obvious noise and the sound quality was not clear. I hope someone can help me solve this problem.

@2817359766 i am just another user and unsure about the flow mode you mentioned; that said, there were a couple of threads about livelink/streaming setups where noise was reported that might be relevant:

have you tried the .WAV format as well?

I have tried WAV: test_client.py uses a .WAV file, and that works without problems. However, the audio I want to stream now is in PCM or MP3 format, and it cannot be converted to a WAV file in advance. What should I do?
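A minimal sketch of one way to avoid writing a WAV file to disk, assuming the PCM data is headerless 16-bit signed little-endian audio with a known channel count (the helper name `pcm16_bytes_to_float32` is hypothetical): decode the bytes in memory with NumPy and scale them to float32. Since the start marker in the code in this thread carries only a sample rate, the sketch also assumes a single (mono) channel is wanted and averages interleaved channels together.

```python
import numpy as np


def pcm16_bytes_to_float32(pcm_bytes: bytes, channels: int = 1) -> np.ndarray:
    """Convert headerless 16-bit signed little-endian PCM bytes to mono float32.

    Assumption: samples are interleaved (L, R, L, R, ...) when channels > 1.
    """
    # Interpret the raw bytes as little-endian signed 16-bit integers, then copy to float32.
    samples = np.frombuffer(pcm_bytes, dtype="<i2").astype(np.float32)
    # Scale from the int16 range [-32768, 32767] to [-1.0, 1.0).
    samples /= 32768.0
    if channels > 1:
        # Drop any incomplete trailing frame, then average each frame's channels.
        usable = len(samples) - len(samples) % channels
        samples = samples[:usable].reshape(-1, channels).mean(axis=1)
    return samples
```

For MP3, pydub's `AudioSegment.from_file` already decodes in memory (it relies on ffmpeg being installed). When the decoded segment has `sample_width == 2`, something like `pcm16_bytes_to_float32(seg.raw_data, seg.channels)` would give the samples, with `seg.frame_rate` passed along as the sample rate.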

unfortunately, that is a bit out of my depth. perhaps if you were to elaborate on your process in detail, the mods/devs would be able to offer a suitable solution for you.

in the meantime, i recall there was another thread in which mp3 format was brought up and discussed. maybe there are some additional insights there:

This is the official demo, which reads a WAV file and streams it to Audio2Face. I changed it to read my own MP3 file instead, and now there is noise in Audio2Face, similar to the sound of listening to a radio in the 20th century. Could you please help me check where the problem is?
import sys
import time
import audio2face_pb2
import audio2face_pb2_grpc
import grpc
import numpy as np
import soundfile
from pydub import AudioSegment

def push_audio_track_stream(url, audio_data, samplerate, instance_name):
    chunk_size = samplerate // 10  # ADJUST
    sleep_between_chunks = 0.04  # ADJUST
    block_until_playback_is_finished = True  # ADJUST

    with grpc.insecure_channel(url) as channel:
        print("Channel created")
        stub = audio2face_pb2_grpc.Audio2FaceStub(channel)

        def make_generator():
            start_marker = audio2face_pb2.PushAudioRequestStart(
                samplerate=samplerate,
                instance_name=instance_name,
                block_until_playback_is_finished=block_until_playback_is_finished,
            )
            # At first, we send a message with start_marker
            yield audio2face_pb2.PushAudioStreamRequest(start_marker=start_marker)

            # Reinterpret the raw bytes as int16 samples and cast them to float32 (no scaling)
            float32_array = np.frombuffer(audio_data, dtype=np.int16).astype(np.float32)
            float32_byte_string = float32_array.tobytes()

            # Send all of the audio in a single message
            yield audio2face_pb2.PushAudioStreamRequest(audio_data=float32_byte_string)

        request_generator = make_generator()
        print("Sending audio data...")
        response = stub.PushAudioStream(request_generator)
        if response.success:
            print("SUCCESS")
        else:
            print(f"ERROR: {response.message}")
    print("Channel closed")

def main():
    sleep_time = 2.0  # ADJUST
    url = "localhost:50051"  # ADJUST
    instance_name = "/World/audio2face/PlayerStreaming"
    audio_data = AudioSegment.from_file("xfdc.mp3")
    raw_audio_data = audio_data.raw_data
    print(raw_audio_data)

    print(f"Sleeping for {sleep_time} seconds")
    time.sleep(sleep_time)
    push_audio_track_stream(url, raw_audio_data, 16000, instance_name)

if __name__ == "__main__":
    main()
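The code above converts with `np.frombuffer(audio_data, dtype=np.int16).astype(np.float32)`, which keeps the integer magnitudes (up to ±32767) instead of scaling them, while float audio buffers are conventionally expected to lie between -1.0 and 1.0. A minimal sanity check, assuming that range is what Audio2Face expects (the helper names `peak_amplitude` and `looks_normalized` are hypothetical), is to look at the peak amplitude before sending:

```python
import numpy as np


def peak_amplitude(samples: np.ndarray) -> float:
    """Return the largest absolute sample value in a float audio buffer (0.0 if empty)."""
    return float(np.max(np.abs(samples))) if samples.size else 0.0


def looks_normalized(samples: np.ndarray, limit: float = 1.0) -> bool:
    """True if every sample lies within [-limit, limit]."""
    return peak_amplitude(samples) <= limit
```

A buffer produced by casting int16 straight to float32 will usually fail this check, whereas the same samples divided by 32768.0 pass it.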

thanks for sharing! i’ll defer to more knowledgeable users and/or the mods from here 🙂

@2817359766 It seems you just need to scale raw_audio_data so that the 16-bit samples fall between -1.0 and 1.0. Can you try this?

import sys
import time
import audio2face_pb2
import audio2face_pb2_grpc
import grpc
import numpy as np
import soundfile
from pydub import AudioSegment


def push_audio_track_stream(url, audio_data, samplerate, instance_name):
    chunk_size = samplerate // 10  # ADJUST
    sleep_between_chunks = 0.04  # ADJUST
    block_until_playback_is_finished = True  # ADJUST

    with grpc.insecure_channel(url) as channel:
        print("Channel created")
        stub = audio2face_pb2_grpc.Audio2FaceStub(channel)

        def make_generator():
            start_marker = audio2face_pb2.PushAudioRequestStart(
                samplerate=samplerate,
                instance_name=instance_name,
                block_until_playback_is_finished=block_until_playback_is_finished,
            )
            # At first, we send a message with start_marker
            yield audio2face_pb2.PushAudioStreamRequest(start_marker=start_marker)
            # Then we send messages with audio_data
            for i in range(len(audio_data) // chunk_size + 1):
                time.sleep(sleep_between_chunks)
                chunk = audio_data[i * chunk_size : i * chunk_size + chunk_size]                
                yield audio2face_pb2.PushAudioStreamRequest(audio_data=chunk.astype(np.float32).tobytes())

        request_generator = make_generator()
        print("Sending audio data...")
        response = stub.PushAudioStream(request_generator)
        if response.success:
            print("SUCCESS")
        else:
            print(f"ERROR: {response.message}")
    print("Channel closed")


def main():
    sleep_time = 2.0  # ADJUST
    url = "localhost:50051"  # ADJUST
    instance_name = "/World/audio2face/PlayerStreaming"
    audio_data = AudioSegment.from_file("audio_file.wav")
    raw_audio_data = np.array(audio_data.get_array_of_samples(), dtype=np.float32) / 32768.0 # 2 ^ 15

    print(f"Sleeping for {sleep_time} seconds")
    time.sleep(sleep_time)
    push_audio_track_stream(url, raw_audio_data, audio_data.frame_rate, instance_name)


if __name__ == "__main__":
    main()
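In the generator above, `range(len(audio_data) // chunk_size + 1)` sends one empty `audio_data` message at the end whenever the number of samples is an exact multiple of `chunk_size`. Whether Audio2Face cares about an empty chunk is not covered in this thread; as a small sketch, stepping through the array by `chunk_size` avoids producing it at all (the helper name `iter_float32_chunks` is hypothetical):

```python
from typing import Iterator

import numpy as np


def iter_float32_chunks(samples: np.ndarray, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive chunk_size-sample slices of a signal as float32 bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    samples = np.asarray(samples, dtype=np.float32)
    # Stepping by chunk_size never yields an empty trailing slice;
    # the last chunk is simply shorter when the length is not a multiple.
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size].tobytes()
```

Inside `make_generator`, the loop would then become `for chunk in iter_float32_chunks(audio_data, chunk_size):`, followed by the same `time.sleep(sleep_between_chunks)` and `yield audio2face_pb2.PushAudioStreamRequest(audio_data=chunk)`.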

I also found a similar solution yesterday. When converting the audio encoding, I had missed the division by 32768.0. Thank you for your reply. The problem has been resolved.

@Ehsan.HM i am not an audio guy, so pardon my rookie question - what does the division by 2^15 do? is that a standard conversion from audio data to raw audio data?

This normalizes 16-bit audio samples to the range -1.0 to 1.0 (strictly, -1.0 up to 32767/32768). Note that 2^15 = 32768.

  • For 16-bit PCM audio, the range of values is from -32768 to 32767.
  • For signed 8-bit PCM audio, the range of values is from -128 to 127.
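The same idea generalizes: for signed n-bit PCM, dividing by 2^(n-1) maps the samples into -1.0 up to just under 1.0. One caveat worth knowing is that 8-bit WAV files store unsigned samples (0 to 255), so those need to be shifted down by 128 first. A minimal sketch, where the function name `normalize_pcm` and its `unsigned` flag are hypothetical and the caller is assumed to know how the data is stored:

```python
import numpy as np


def normalize_pcm(samples: np.ndarray, bits: int, unsigned: bool = False) -> np.ndarray:
    """Map integer PCM samples to float32 by dividing by 2 ** (bits - 1)."""
    full_scale = float(2 ** (bits - 1))  # 32768 for 16-bit, 128 for 8-bit
    # Work in float64 to avoid precision loss for wider sample types.
    x = np.asarray(samples, dtype=np.float64)
    if unsigned:
        # Shift unsigned 0..2**bits - 1 so that silence sits at zero.
        x = x - full_scale
    return (x / full_scale).astype(np.float32)
```

For 16-bit data, `normalize_pcm(samples, 16)` is equivalent to the `/ 32768.0` used in the reply above.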