Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.1
• TensorRT Version: 10.3 (in DS7.1 Docker container)
• NVIDIA GPU Driver Version (valid for GPU only): 565.57.01
• Issue Type (questions, new requirements, bugs): Question
After muxing, is it possible to convert a frame from an audio/x-raw(memory:NVMM) buffer back to a plain audio/x-raw buffer?
I’m trying to use some GStreamer plugins that don’t use NVMM, which is likely why the links fail when I place them downstream of the muxer. A capsfilter isn’t going to work (I’ve tried).
I realize this might not be possible but I wanted to check.
If this isn’t possible, how have other people tackled the issue of wanting to use plain GStreamer plugins as part of a batched/muxed pipeline? Tee’ing the buffers upstream (pre-muxing) doesn’t seem like a great idea, as I can’t see a way to easily correlate that data with the data I’d get from other processing done downstream of the muxer.
The “audio/x-raw(memory:NVMM)” buffer is the batched audio frames buffer; it can only be handled by nvinferaudio.
Do you want to use other GStreamer plugins to process the audio as preprocessing for inferencing? Or do you just want to do some processing that has nothing to do with inferencing? The method will be different for the different purposes.
The processing is meant to be done in parallel with inferencing; e.g. we want to run our own ruleset on the audio of the same frame the inferencing runs on and be able to correlate the two, so that later on we can take the results of each “path” and compare them, knowing that each result corresponds to the same frame.
You may use a tee after the audio decoder or audio source, with one branch for inferencing and the other branch running your own ruleset. The audio frames can be aligned by timestamp.
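For illustration, a minimal sketch of that layout, assuming a single uridecodebin source and the nvstreammux/nvinferaudio audio path from the DeepStream audio sample. The URI, the config-file path, and the appsink handling are placeholders, not a tested pipeline:

```python
#!/usr/bin/env python3
# Sketch: tee the decoded audio so one branch feeds the DeepStream audio path
# (nvstreammux + nvinferaudio) and the other stays as plain audio/x-raw for a
# custom ruleset. Paths and URIs are placeholders; per the DeepStream audio
# sample, audio batching needs the new streammux (USE_NEW_NVSTREAMMUX=yes) and
# a capsfilter matching the model's expected rate/format may be required
# before the muxer.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

pipeline = Gst.parse_launch(
    # Inference branch tail: batched NVMM audio into nvinferaudio.
    "nvstreammux name=m batch-size=1 ! "
    "nvinferaudio config-file-path=/path/to/audio_infer_config.txt ! "
    "fakesink name=infer_sink "
    # Source, decode, and the tee that splits the two branches.
    "uridecodebin uri=file:///path/to/audio.wav ! "
    "audioconvert ! audioresample ! tee name=t "
    "t. ! queue ! m.sink_0 "
    # Ruleset branch: plain audio/x-raw delivered to the application.
    "t. ! queue ! appsink name=rules_sink emit-signals=true sync=false"
)

def on_new_sample(sink):
    sample = sink.emit("pull-sample")
    buf = sample.get_buffer()
    # buf.pts is what you would later use to correlate with inference results.
    print("ruleset branch buffer, PTS:", buf.pts)
    return Gst.FlowReturn.OK

pipeline.get_by_name("rules_sink").connect("new-sample", on_new_sample)

pipeline.set_state(Gst.State.PLAYING)
loop = GLib.MainLoop()
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)
```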
For the branch with our own ruleset, how well will that scale without a muxer (i.e. without the ability to batch multiple sources)?
Will the PTS in each branch properly line up? I’ll be examining an NvDsAudioFrameMeta struct in the inferencing branch, and probably a native GStreamer struct in the other.
Will the timestamps in each branch be perfectly in sync? Or is it possible they could be out by, say, a couple nanoseconds?
You need to align the PTS yourself. The timestamp is attached to the GstBuffer, and it is not affected by whether native GStreamer memory or NV memory is attached to the buffer. The app can use the timestamp to sync its processing to the audio data in the GstBuffer; the timestamp is reliable.
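As a rough illustration of that alignment, the sketch below keeps two dictionaries keyed by PTS and reports a frame once both branches have produced a result for it. The probe attachment points (the infer_sink/rules_sink names from the earlier sketch) are assumptions; if you probe downstream of the muxer you would take the per-frame timestamp from NvDsAudioFrameMeta rather than the batched buffer’s PTS, and a real app would also want a tolerance window and cleanup for unmatched entries:

```python
# Sketch: correlate the two branches purely by GstBuffer PTS.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

infer_results = {}  # pts -> result from the inferencing branch
rule_results = {}   # pts -> result from the ruleset branch

def correlate(pts):
    # Report a frame only once both branches have produced a result for it.
    if pts in infer_results and pts in rule_results:
        print(f"pts={pts}: infer={infer_results.pop(pts)}, rules={rule_results.pop(pts)}")

def infer_probe(pad, info):
    buf = info.get_buffer()
    infer_results[buf.pts] = "inference result placeholder"
    correlate(buf.pts)
    return Gst.PadProbeReturn.OK

def rules_probe(pad, info):
    buf = info.get_buffer()
    rule_results[buf.pts] = "ruleset result placeholder"
    correlate(buf.pts)
    return Gst.PadProbeReturn.OK

# Attach to the sink pads of the final element in each branch, e.g.:
# pipeline.get_by_name("infer_sink").get_static_pad("sink").add_probe(
#     Gst.PadProbeType.BUFFER, infer_probe)
# pipeline.get_by_name("rules_sink").get_static_pad("sink").add_probe(
#     Gst.PadProbeType.BUFFER, rules_probe)
```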
By “scale” I mean adding many (dozens, hundreds, etc.) concurrent audio streams. Would that mean I have to attach a tee to each audio source, with one branch going to an instance of the custom ruleset and the other branch directed to the muxer (see the sketch at the end of this post)?
And thank you for your response concerning the PTS.
One last question: are there any plans to develop an “audio convert” counterpart to Gst-nvvideoconvert, i.e. something that can convert an NVMM-based (batched) audio buffer to a raw (batched) audio buffer? I’m just curious.
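For reference, a rough sketch of the “tee per source” topology asked about above: each source gets its own tee, one branch feeding a sink pad of the shared muxer for batched inference and the other feeding a per-source ruleset branch. The source URIs, batch size, and inference configuration are placeholders:

```python
# Sketch: one tee per source; branch A goes to the shared nvstreammux for
# batched inferencing, branch B goes to a per-source plain-GStreamer branch
# for the custom ruleset. URIs and the inference config are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

uris = ["file:///path/to/a.wav", "file:///path/to/b.wav"]  # placeholder sources

desc = (
    f"nvstreammux name=m batch-size={len(uris)} ! "
    "nvinferaudio config-file-path=/path/to/audio_infer_config.txt ! fakesink "
)
for i, uri in enumerate(uris):
    desc += (
        f"uridecodebin uri={uri} ! audioconvert ! audioresample ! tee name=t{i} "
        f"t{i}. ! queue ! m.sink_{i} "
        f"t{i}. ! queue ! appsink name=rules_sink_{i} emit-signals=true sync=false "
    )

pipeline = Gst.parse_launch(desc)
# From here, attach a "new-sample" handler or pad probe to each rules_sink_<i>
# and run a GLib main loop, as in the earlier sketches.
```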