Audio input and output with noise cancellation

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for a new requirement. Include the module name, i.e. which plugin or which sample application, and the function description.)

Hi,

I need help with three things:
1. Read real-time audio into DeepStream.
2. Load a noise cancellation model.
3. Plot the audio as a graph: the input graph and the graph after noise cancellation.

I could not find proper documentation for this.

What kind of real-time audio?

What is the input and output of the model?

What do you mean by this?

I mean reading it live through the mic, the same way we read a live camera feed from RTSP.

A pretrained noise suppression model, if you could guide me to one.
I am not sure what you mean by the model's input and output.

I want to plot a graph of the input audio with noise, and a graph of how it looks after noise cancellation: a before/after comparison.

You can use alsasrc (gstreamer.freedesktop.org) to read the ALSA device.
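
For example, a minimal capture sketch in Python; the device name and the caps here are assumptions, so adjust them to your microphone (you can list capture devices with "arecord -l"):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Capture from the default ALSA device, convert to 16-bit mono PCM,
# and play it back locally. "device=default" and the caps are
# assumptions; adjust them for your hardware.
pipeline = Gst.parse_launch(
    "alsasrc device=default ! audioconvert ! audioresample ! "
    "audio/x-raw,format=S16LE,channels=1,rate=16000 ! autoaudiosink"
)
pipeline.set_state(Gst.State.PLAYING)

bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)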

This is my pipeline string:

pipeline_str = """
uridecodebin uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sonyc_mixed_audio.wav name=source_0 ! queue ! \
mux.sink_0 nvstreammux name=mux batch-size=1 ! \
nvstreamdemux name=demux \
demux.src_0 ! audioconvert ! audioresample ! alsasink async=false
"""

pipeline = Gst.parse_launch(pipeline_str)

This is the error I am getting:

gi.repository.GLib.GError: gst_parse_error: could not link demux to audioconvert0 (3)

You need to describe the model’s input and output data.

E.g., in the sample /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-audio, the model sonyc_audio_classify.onnx is used. The model's input is log mel spectrogram data computed from the original PCM (see the topic "How to feed raw audio into the model by nvinferaudio?"), and the output is softmax data.
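
For illustration, here is a rough sketch of that kind of preprocessing in Python with librosa; the sample rate, FFT size, hop length, and mel-band count are assumed values for illustration, not necessarily the exact parameters nvinferaudio uses internally:

import numpy as np
import librosa

def log_mel_spectrogram(pcm: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    # pcm: 1-D float array of audio samples in [-1, 1].
    # n_fft/hop_length/n_mels are illustrative values only.
    mel = librosa.feature.melspectrogram(
        y=pcm, sr=sample_rate, n_fft=1024, hop_length=512, n_mels=128)
    # Convert power to decibels to get the log-scaled mel spectrogram.
    return librosa.power_to_db(mel)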

Do you mean you want to get the PCM data from the DeepStream pipeline?

There is an alsasrc sample in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-audio.

This is my pipeline:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

# Initialize GStreamer
Gst.init(None)

# Create elements
filesrc = Gst.ElementFactory.make("filesrc", "source")
filesrc.set_property("location", "/opt/nvidia/deepstream/deepstream/samples/streams/sonyc_mixed_audio.wav")

decodebin = Gst.ElementFactory.make("decodebin", "decoder")
audioconvert = Gst.ElementFactory.make("audioconvert", "convert")
audioresample = Gst.ElementFactory.make("audioresample", "resample")
queue = Gst.ElementFactory.make("queue", "queue")

nvstreammux = Gst.ElementFactory.make("nvstreammux", "mux")
nvstreammux.set_property("batch-size", 1)
nvstreammux.set_property("width", 1280)
nvstreammux.set_property("height", 720)
nvstreammux.set_property("live-source", False)
nvstreammux.set_property("gpu-id", 0)

nvinferaudio = Gst.ElementFactory.make("nvinferaudio", "audio_infer")
nvinferaudio.set_property("config-file-path", "/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-audio/configs/config_infer_audio_sonyc2.txt")
nvinferaudio.set_property("batch-size", 1)
nvinferaudio.set_property("gpu-id", 0)

autoaudiosink = Gst.ElementFactory.make("autoaudiosink", "audio_sink")

# Create a GStreamer pipeline
pipeline = Gst.Pipeline.new("audio_pipeline")

# Add elements to the pipeline
pipeline.add(filesrc)
pipeline.add(decodebin)
pipeline.add(audioconvert)
pipeline.add(audioresample)
pipeline.add(queue)
pipeline.add(nvstreammux)
pipeline.add(nvinferaudio)
pipeline.add(autoaudiosink)

# Link elements (decodebin is linked dynamically, the rest statically)
filesrc.link(decodebin)

# Dynamic linking for decodebin: its audio src pad only appears at
# runtime, so the link to audioconvert is made in the pad-added callback
def on_pad_added(decodebin, pad):
    caps = pad.query_caps(None)
    if caps.to_string().startswith("audio/x-raw"):
        pad.link(audioconvert.get_static_pad("sink"))

decodebin.connect("pad-added", on_pad_added)

# Link the remaining elements
audioconvert.link(audioresample)
audioresample.link(queue)

# nvstreammux sink pads are request pads, so request sink_0 explicitly
queue_src_pad = queue.get_static_pad("src")
mux_sink_pad = nvstreammux.get_request_pad("sink_0")
queue_src_pad.link(mux_sink_pad)

nvstreammux.link(nvinferaudio)
nvinferaudio.link(autoaudiosink)

pipeline.set_state(Gst.State.PLAYING)

# Run the pipeline
try:
    bus = pipeline.get_bus()
    msg = bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                                 Gst.MessageType.EOS | Gst.MessageType.ERROR)
    if msg and msg.type == Gst.MessageType.ERROR:
        err, debug = msg.parse_error()
        print(f"Error: {err}, {debug}")
finally:
    # Stop the pipeline
    pipeline.set_state(Gst.State.NULL)

Can you check if this is fine?

This is my infer config file:
################################################################################
# SPDX-FileCopyrightText: Copyright (c) 2020-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NvidiaProprietary
#
# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from NVIDIA CORPORATION or
# its affiliates is strictly prohibited.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8)
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   mean-file, gie-unique-id(Default=0), offsets, gie-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=0
net-scale-factor=1
onnx-file=../../../../../samples/models/SONYC_Audio_Classifier/sonyc_audio_classify.onnx
model-engine-file=../../../../../samples/models/SONYC_Audio_Classifier/sonyc_audio_classify.onnx_b2_gpu0_fp32.engine
labelfile-path=../../../../../samples/models/SONYC_Audio_Classifier/audio_labels.txt
batch-size=2

# 0=FP32, 1=INT8, 2=FP16 mode

network-mode=0
num-detected-classes=31
gie-unique-id=1
output-blob-names=output1
network-type=1
parse-classifier-func-name=NvDsInferParseCustomAudio
custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/libnvds_infer_custom_parser_audio.so

[class-attrs-all]
threshold=0.4

When I run the above code, I am getting this error:
0:00:01.132954932 4189 0x5a876077b670 WARN pulse pulsesink.c:615:gst_pulseringbuffer_open_device:<audio_sink-actual-sink-pulse> error: Failed to connect: Connection refused
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT input.1 1x635x128
1 OUTPUT kFLOAT output1 31

0:00:05.621997161 4189 0x5a876077b670 WARN nvinferaudio gstnvinferaudio.cpp:284:gst_nvinferaudio_start:<audio_infer> error: Failed to create audio transform
0:00:05.622599575 4189 0x5a876077b670 WARN basesrc gstbasesrc.c:3688:gst_base_src_start_complete: pad not activated yet
Error: gst-resource-error-quark: Failed to create audio transform (1), gstnvinferaudio.cpp(284): gst_nvinferaudio_start (): /GstPipeline:audio_pipeline/GstNvInferAudio:audio_infer

(python3:4189): GStreamer-CRITICAL **: 05:56:59.048: gst_object_unref: assertion 'object != NULL' failed

So you are saying that, based on the input audio data, the model will suppress the background or unwanted noise?

I want to implement a way to visualize the input and output audio (spectrograms) to show the effect of noise cancellation.

No. I was just showing you the model we used. It is only a classifier model.

We don't have such a solution. You can get the PCM data out and draw it yourself; DeepStream is an inferencing framework.
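
For the drawing part, a minimal sketch with SciPy and matplotlib, assuming you already have the noisy and denoised PCM as NumPy arrays:

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

def plot_before_after(noisy: np.ndarray, denoised: np.ndarray, rate: int = 16000):
    # Draw the two spectrograms side by side for a before/after comparison.
    fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
    for ax, pcm, title in zip(axes, (noisy, denoised),
                              ("Before noise cancellation",
                               "After noise cancellation")):
        f, t, sxx = signal.spectrogram(pcm, fs=rate)
        # Plot power in decibels; the small epsilon avoids log(0).
        ax.pcolormesh(t, f, 10 * np.log10(sxx + 1e-12), shading="gouraud")
        ax.set_title(title)
        ax.set_xlabel("Time [s]")
    axes[0].set_ylabel("Frequency [Hz]")
    plt.tight_layout()
    plt.show()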

That's right.
How do I get the PCM data?

The PCM data is available from the GstBuffer (see the GstBuffer documentation at gstreamer.freedesktop.org).
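
For example, a rough sketch of a buffer pad probe that copies the PCM out of each GstBuffer; the S16LE sample format is an assumption and must match the caps negotiated on the pad you probe (add a capsfilter upstream to force it):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import numpy as np

def pcm_probe(pad, info):
    # Map the GstBuffer for reading and copy its bytes out
    # as 16-bit PCM samples.
    buf = info.get_buffer()
    ok, map_info = buf.map(Gst.MapFlags.READ)
    if ok:
        samples = np.frombuffer(map_info.data, dtype=np.int16).copy()
        buf.unmap(map_info)
        # "samples" is now a NumPy array you can accumulate and plot.
    return Gst.PadProbeReturn.OK

# Attach the probe to e.g. the src pad of an audioconvert element:
# element.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, pcm_probe)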

Please make sure you are familiar with GStreamer before you start with DeepStream.