Custom Audio Metadata Not Available Downstream of Muxer Despite Example

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
7.1
• TensorRT Version
10.3 (in DS7.1 Docker container)
• NVIDIA GPU Driver Version (valid for GPU only)
565.57.01
• Issue Type (questions, new requirements, bugs)
Question

I’m attempting to add custom metadata upstream of a stream muxer, using the deepstream-gst-metadata-test sample app as an example. The calls to gst_buffer_add_nvds_meta appear to succeed and I’m able to assign the necessary values to the various attributes. I’m doing this using the incoming GstBuffer in the transform_ip function and not in a probe.
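
For reference, here is a simplified sketch of the attach side, following the deepstream-gst-metadata-test pattern (AudioLevelMeta and the meta-type value are placeholder names I made up for this post, not my exact code):

#include <string.h>
#include <gst/gst.h>
#include "gstnvdsmeta.h"

/* Placeholder payload; my real struct is similar. */
typedef struct
{
  gdouble level_db;             /* current audio level in dB */
} AudioLevelMeta;

/* Custom meta type, following the sample's (NVDS_GST_CUSTOM_META + n)
 * convention; the exact offset is arbitrary. */
#define NVDS_GST_AUDIO_LEVEL_META (NVDS_GST_CUSTOM_META + 27)

/* Copy/release callbacks so the payload can be deep-copied downstream. */
static gpointer
audio_level_meta_copy_func (gpointer data, gpointer user_data)
{
  AudioLevelMeta *src = (AudioLevelMeta *) data;
  AudioLevelMeta *dst = (AudioLevelMeta *) g_malloc0 (sizeof (AudioLevelMeta));
  memcpy (dst, src, sizeof (AudioLevelMeta));
  return (gpointer) dst;
}

static void
audio_level_meta_release_func (gpointer data, gpointer user_data)
{
  g_free (data);
}

/* Called from the element's transform_ip on the in-place buffer. */
static void
attach_audio_level (GstBuffer * buf, gdouble level_db)
{
  AudioLevelMeta *payload =
      (AudioLevelMeta *) g_malloc0 (sizeof (AudioLevelMeta));
  NvDsMeta *meta;

  payload->level_db = level_db;
  meta = gst_buffer_add_nvds_meta (buf, payload, NULL,
      audio_level_meta_copy_func, audio_level_meta_release_func);
  meta->meta_type = (GstNvDsMetaType) NVDS_GST_AUDIO_LEVEL_META;
}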

I’m having problems getting this to work in two cases:

  1. If I place a “metadata reader” downstream of the muxer and also downstream of an NvInferAudio plugin (i.e. an audio classifier), the first frame in the batch metadata appears to have an invalid address for its user meta list. If I try to access it, I get a segfault, and debugging in gdb shows that the memory at that address can’t be accessed.

  2. If I remove the NvInferAudio plugin, the “metadata reader” no longer segfaults; however, the frame user meta list is now NULL.

The above leads me to believe that NvInferAudio may be doing something to the metadata, but what confuses me more is that the custom metadata isn’t appearing downstream of the muxer in either case.

Nothing else seems to be failing, and I’ve checked and rechecked my code against the gst metadata example.

The metadata reader pulls from a valid GstBuffer arriving on a connected sink pad in the chain. (Specifically, I’m doing this in the gst-nvdsaudiotemplate element from the DeepStream GST plugins library, reading the submitted input buffer directly, i.e. not from a probe.)
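
To be concrete, the reader side boils down to something like this inside submit_input_buffer (a sketch, not my exact code; I’m assuming the batched audio frames come through as NvDsAudioFrameMeta):

#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "nvdsmeta.h"

static void
read_custom_meta (GstBuffer * inbuf)
{
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (inbuf);
  if (!batch_meta)
    return;

  for (NvDsMetaList * l_frame = batch_meta->frame_meta_list; l_frame != NULL;
      l_frame = l_frame->next) {
    NvDsAudioFrameMeta *frame_meta = (NvDsAudioFrameMeta *) l_frame->data;

    /* This is where I expected my custom meta to surface. With
     * nvinferaudio in the pipeline it is an invalid pointer on the first
     * frame; without nvinferaudio it is simply NULL. */
    for (NvDsUserMetaList * l_user = frame_meta->frame_user_meta_list;
        l_user != NULL; l_user = l_user->next) {
      NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
      g_print ("found user meta of type %d\n",
          (int) user_meta->base_meta.meta_type);
    }
  }
}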

I’m a bit stumped, so if anyone can help, it’d be appreciated.

I can also upload images of the pipelines I’m working with if that helps.

Thanks in advance.

Can you provide the complete pipeline and configurations?

I’ve attached a PNG of the pipeline with the audio classifier included. The GstLevelfilter element is where the custom metadata is attached (as per my original message), and the GstNvDsAudioTemplate element is where I try to extract it.

The config used is below:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[source0]
enable=1
#Type - 2=URI
type=6
uri=../../../../../samples/streams/sonyc_mixed_audio.wav
num-sources=1
gpu-id=0

[source1]
enable=0
#Type - 2=URI
type=6
uri=../../../../../samples/streams/sonyc_mixed_audio.wav
num-sources=1
gpu-id=0

[streammux]
batch-size=1

[sink0]
enable=1
#Type - 1=FakeSink
type=1
sync=1
source-id=0
gpu-id=0
nvbuf-memory-type=0

[audio-classifier]
enable=1
gpu-id=0
model-engine-file=../../../../../samples/models/SONYC_Audio_Classifier/sonyc_audio_classify.onnx_b2_gpu0_fp32.engine
#property
batch-size=1
nvbuf-memory-type=0
audio-transform=melsdb,fft_length=2560,hop_size=692,dsp_window=hann,num_mels=128,sample_rate=44100,p2db_ref=(float)1.0,p2db_min_power=(float)0.0,p2db_top_db=(float)80.0
# Specify the desired input audio rate to nvinferaudio
# input source(s) shall be audio resampled to this rate
# Here, using 44.1kHz
audio-input-rate=44100
audio-framesize=441000
audio-hopsize=110250
config-file=config_infer_audio_sonyc.txt

[tests]
file-loop=1

Thank you.

Are you trying to transfer NvDsMeta through the audio pipeline in the same way as deepstream-gst-metadata-test? An audio pipeline can only use the new nvstreammux, and the new nvstreammux does not support transferring NvDsMeta (or any other metadata); it only generates the new batch meta.
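
To illustrate the difference: a probe upstream of the muxer can still find the custom NvDsMeta on the GstBuffer the same way the sample’s reader does, but downstream of the new nvstreammux only the freshly generated batch meta (gst_buffer_get_nvds_batch_meta()) is present. A rough sketch of such a probe:

#include <gst/gst.h>
#include "gstnvdsmeta.h"

static GstPadProbeReturn
check_gst_meta_probe (GstPad * pad, GstPadProbeInfo * info, gpointer u_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  GQuark dsmeta_quark = g_quark_from_static_string (NVDS_META_STRING);
  gpointer state = NULL;
  GstMeta *gst_meta;

  /* Upstream of nvstreammux this loop finds the meta attached with
   * gst_buffer_add_nvds_meta(); downstream of the new nvstreammux it
   * finds nothing, because the meta is not carried into the batch. */
  while ((gst_meta = gst_buffer_iterate_meta (buf, &state))) {
    if (gst_meta_api_type_has_tag (gst_meta->info->api, dsmeta_quark)) {
      NvDsMeta *nvds_meta = (NvDsMeta *) gst_meta;
      g_print ("found NvDsMeta with meta_type %d\n", nvds_meta->meta_type);
    }
  }
  return GST_PAD_PROBE_OK;
}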

What kind of information do you want to transfer through the new nvstreammux?

Yes, I am, and I didn’t know that the metadata transfer facility wasn’t there.

I’ve been following the Gst-nvstreammux New — DeepStream documentation page, and it makes it sound like I can still pass custom metadata through the muxer.

I also thought that you had to opt in to the new muxer via an environment variable (I’m not setting any such variable in my own project). So I’m a bit confused, and I appreciate you helping me get this straight.

As for the kind of information I’m looking to pass through: in this case, it’s the current audio level in dB. I may eventually want other information about the audio stream that I can’t find exposed by any DeepStream plugin.

Will the audio level change frame by frame?

It’s entirely possible, e.g. in a dynamic environment where there are sudden and short-lived noises like slamming doors or brief conversations.

What is the purpose of transferring the “audio level” through the pipeline? Do you want to control the sound card by passing that information to the audio renderer? Could you transfer the information within the application itself?

The purpose is to analyze the audio as it comes through, similar to an IVA. I’m starting with audio levels so I can do some math on them and make some determinations, and this is meant to (eventually) run alongside or in conjunction with inferencing.

Currently this is not supported by the new nvstreammux, which is used for audio batching.

That is unfortunate. Are there any plans to support this workflow? If so, when might it be released?

Would a workaround be to include the audio in a video container, e.g. MP4, and use it as a video source instead of an audio source? Is there a way to extract audio buffers downstream from a streammux’d video?