Hello, I am using the script below:
import time
from typing import Any

import numpy as np
import torch
import torchaudio

import gi
gi.require_version('Gst', '1.0')
from gstreamer import GstApp, Gst, GObject

GObject.threads_init()
Gst.init(None)

count = 0
frames = []


def on_buffer(sink: GstApp.AppSink) -> Gst.FlowReturn:
    """Callback on 'new-sample' signal."""
    global count
    global frames
    sample = sink.emit("pull-sample")  # Gst.Sample
    if isinstance(sample, Gst.Sample):
        buffer = sample.get_buffer()
        buffer_size = buffer.get_size()
        data = buffer.extract_dup(0, buffer_size)
        # caps fields: format, layout, rate, channels, channel-mask
        if count == 1000:
            print(count)
            # Concatenate the collected chunks into a single 1 x N tensor and write it out.
            frames = torch.stack(frames).view(-1).unsqueeze(0)
            torchaudio.save("test.wav", frames, 44100)
            frames = []
            count = 0
        else:
            # Interpret the raw buffer bytes as 32-bit floats.
            frames.append(torch.from_numpy(np.frombuffer(data, dtype=np.float32)))
            count += 1
        return Gst.FlowReturn.OK
    return Gst.FlowReturn.ERROR


# alsasrc device=hw:MS2109,0, pulsesrc device=alsa_input.usb-MACROSILICON_USB_Video-02.analog-stereo
command = "alsasrc device=hw:MS2109,0 ! appsink emit-signals=True drop=True"
pipeline = Gst.parse_launch(command)
appsink = pipeline.children[0]  # get the AppSink element
appsink.connect("new-sample", on_buffer)
pipeline.set_state(Gst.State.PLAYING)
time.sleep(200)
The goal is to capture audio data with appsink and save it using torchaudio. I have tried both pulsesrc and alsasrc, but when I save the audio it sounds very choppy. My theory is that the sample rate isn't aligned, so my question is: how can I set the sample rate in this pipeline?

alsasrc device=hw:MS2109,0 ! appsink emit-signals=True drop=True

I am also open to any suggestions for improving the process in general; I just need to capture audio data and convert it to PyTorch tensors so I can pass them to a PyTorch model.
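My rough guess is that the rate has to be forced with a caps filter before the appsink, something like the sketch below, but I have not confirmed that this is the right way to do it or that my card accepts it:

# Untested sketch: force 44.1 kHz float samples with a caps filter (my assumption).
command = (
    "alsasrc device=hw:MS2109,0 ! "
    "audioconvert ! audioresample ! "
    "audio/x-raw,format=F32LE,rate=44100,channels=2 ! "
    "appsink emit-signals=True drop=True"
)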
Edit 1: I would like a sample rate of 44.1 kHz. I have edited /etc/asound.conf and added:
pcm.!default {
    type plug
    slave {
        pcm "hw:MS2109,0"
        rate 44100
    }
    hint.description "default Soundcard"
}
in order to change the sample rate, but I have no way of verifying that the pipeline is actually using the correct sample rate.
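I assume I could check the negotiated rate by printing the caps of each sample inside the callback, something like this (untested sketch):

def on_buffer(sink: GstApp.AppSink) -> Gst.FlowReturn:
    sample = sink.emit("pull-sample")
    if isinstance(sample, Gst.Sample):
        # Print the caps that were actually negotiated for this sample.
        structure = sample.get_caps().get_structure(0)
        print(structure.get_value("rate"),
              structure.get_value("format"),
              structure.get_value("channels"))
        return Gst.FlowReturn.OK
    return Gst.FlowReturn.ERROR

Is that the right way to confirm the rate, or is there a better one?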
Edit 2: I have changed the pipeline to:
"pulsesrc device=alsa_input.usb-MACROSILICON_USB_Video-02.analog-stereo ! audioconvert ! tee name=audioTee \
audioTee. ! queue ! wavenc ! filesink location=test2.wav \
audioTee. ! queue ! appsink emit-signals=True drop=True"
so that I can check whether the problem comes from the sound card itself or from the way I handle the captured data afterwards. With filesink the data is written to the file as expected and the sound is crystal clear, but when I save the data with torchaudio via the line torchaudio.save("test.wav", frames, 44100), the generated audio file is very noisy.
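For completeness, since the appsink is no longer the first child of the pipeline once the tee is added, I assume I need to give it a name and look it up instead of using pipeline.children[0]. Roughly like this (the name pytorchSink is just something I made up):

command = (
    "pulsesrc device=alsa_input.usb-MACROSILICON_USB_Video-02.analog-stereo ! "
    "audioconvert ! tee name=audioTee "
    "audioTee. ! queue ! wavenc ! filesink location=test2.wav "
    "audioTee. ! queue ! appsink name=pytorchSink emit-signals=True drop=True"
)
pipeline = Gst.parse_launch(command)
appsink = pipeline.get_by_name("pytorchSink")  # look up the appsink by name
appsink.connect("new-sample", on_buffer)
pipeline.set_state(Gst.State.PLAYING)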