How to merge the output classes of the deepstream-audio sample app with the OSD output of deepstream-app (object detection)?

I ran the deepstream-audio sample app (sound classification) successfully; it works fine and prints the output classes/labels to the terminal. Both deepstream-audio and deepstream-app read the same RTSP input, but deepstream-audio prints classes to the terminal, while deepstream-app draws the image and bounding boxes on the OSD.
I want to merge the results of these two apps into the single OSD of deepstream-app. How can I do that?
Thank you very much.

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 6.2
• JetPack Version (valid for Jetson only) 5.1.1
• TensorRT Version 8.6
• Issue Type (questions, new requirements, bugs) questions

Can you provide a detailed description of the OSD? For example, what is the background, and where do you want to draw the classes/labels?

Thanks for your reply.
I just want to draw the classes/labels classified by deepstream-audio in the top-right corner (yellow rectangle); the background is still the OSD output of deepstream-app.
It should look like the image below.

OK. You can refer to our deepstream_test1_app.c to implement this. That demo draws the "Person = " and "Vehicle = " counts on the video.


Since deepstream-app is open source, you can just get the classes/labels and draw them on the video.
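As a rough illustration of the "draw them on the video" step, here is a minimal sketch in plain C of preparing the text and its position for the top-right corner. It does not call any DeepStream API: `AudioOverlay` and `make_audio_overlay` are hypothetical names, and the average glyph width is an assumption, since the real rendered width depends on the font. In the OSD probe, the resulting values would be copied into the `display_text`, `x_offset` and `y_offset` fields of `NvOSD_TextParams`, in the same way deepstream_test1_app.c fills them for its counters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DISPLAY_LEN 64  /* fixed text buffer, same idea as in deepstream_test1_app.c */

/* Hypothetical plain struct holding what the OSD text needs. */
typedef struct {
    char text[MAX_DISPLAY_LEN];
    unsigned int x_offset;  /* pixels from the left edge of the frame */
    unsigned int y_offset;  /* pixels from the top edge of the frame */
} AudioOverlay;

/* Format the audio label and place it near the top-right corner.
 * approx_char_px is an ASSUMED average glyph width in pixels, used only
 * to estimate how wide the rendered text will be. */
static AudioOverlay make_audio_overlay(const char *label,
                                       unsigned int frame_width,
                                       unsigned int approx_char_px,
                                       unsigned int margin_px)
{
    AudioOverlay ov;
    assert(label != NULL);

    /* snprintf always NUL-terminates and truncates overly long labels. */
    snprintf(ov.text, sizeof ov.text, "Audio: %s", label);

    unsigned int text_px = (unsigned int)strlen(ov.text) * approx_char_px;

    /* Right-align with a margin; fall back to the left edge if the
     * estimated text width does not fit into the frame. */
    ov.x_offset = (text_px + margin_px < frame_width)
                      ? frame_width - text_px - margin_px
                      : 0;
    ov.y_offset = margin_px;
    return ov;
}
```

For example, `make_audio_overlay("speech", 1280, 10, 12)` yields the text `Audio: speech` at x = 1138, y = 12 under these assumed glyph and margin sizes.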

Thank you for your support. I can see that the osd_sink_pad_buffer_probe function draws text on the video, so that part is OK. But deepstream-audio (in the /opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/deepstream-audio/ folder) and deepstream-app are two separate pipelines. The output labels/classes of deepstream-audio are currently printed to the terminal. Is there any way to pass the labels/classes from deepstream-audio to the osd_sink_pad_buffer_probe function of deepstream-app? The final goal is to run deepstream-app and deepstream-audio on the same input RTSP source and show both results in one OSD.

So your scenario is media that contains both video and audio, and during video playback the audio inference results should be displayed on the video. Is that right?

Yes, that’s right. My idea is to implement both the audio pipeline (using nvinferaudio) and the video pipeline (using nvinfer) in one source file and use multithreading in C/C++ (one thread for the audio pipeline and one thread for the video pipeline). Then I would attach the inference results of the audio pipeline to the video using osd_sink_pad_buffer_probe as above. Is that OK? Do you have any recommendations? Thank you.

Yes. The pipeline is like below:

             -> audio_dec->nvstreammux->......

Because the audio and video are processed in different threads, you need to handle the synchronization of audio and video, and how the labels are passed from the audio branch to the video branch.
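One possible way to pass labels between the two threads is a small mutex-protected history of timestamped labels: the audio side pushes each label with its timestamp, and the video OSD probe looks up the newest label that is not later than the current frame. The sketch below is plain C with pthreads and does not use any DeepStream API. `LabelStore` and its functions are hypothetical names, and it assumes that the audio and video timestamps come from the same clock (for example, buffer PTS values within one pipeline), which you would need to verify for your setup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define LABEL_LEN 64
#define HISTORY   32   /* number of most recent labels kept */

typedef struct {
    uint64_t pts_ns;            /* timestamp of the audio result */
    char     label[LABEL_LEN];  /* class name from the audio classifier */
} AudioLabel;

typedef struct {
    pthread_mutex_t lock;
    AudioLabel      items[HISTORY];  /* ring buffer of recent labels */
    unsigned int    count;           /* total labels pushed so far */
} LabelStore;

static void label_store_init(LabelStore *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->count = 0;
}

/* Called from the audio thread whenever a new label is available. */
static void label_store_push(LabelStore *s, uint64_t pts_ns, const char *label)
{
    pthread_mutex_lock(&s->lock);
    AudioLabel *slot = &s->items[s->count % HISTORY];  /* overwrite oldest */
    slot->pts_ns = pts_ns;
    strncpy(slot->label, label, LABEL_LEN - 1);
    slot->label[LABEL_LEN - 1] = '\0';
    s->count++;
    pthread_mutex_unlock(&s->lock);
}

/* Called from the video thread (e.g. inside the OSD probe).
 * Copies the newest label with pts <= video_pts_ns into out.
 * Returns 1 if such a label exists, 0 otherwise. */
static int label_store_lookup(LabelStore *s, uint64_t video_pts_ns,
                              char *out, size_t out_len)
{
    int found = 0;
    uint64_t best = 0;
    assert(out != NULL && out_len > 0);

    pthread_mutex_lock(&s->lock);
    unsigned int n = s->count < HISTORY ? s->count : HISTORY;
    for (unsigned int i = 0; i < n; i++) {
        const AudioLabel *a = &s->items[i];
        if (a->pts_ns <= video_pts_ns && (!found || a->pts_ns >= best)) {
            best = a->pts_ns;
            strncpy(out, a->label, out_len - 1);
            out[out_len - 1] = '\0';
            found = 1;
        }
    }
    pthread_mutex_unlock(&s->lock);
    return found;
}
```

The lock is held only while copying a short string, so the video probe is not blocked for long. Keeping a short history instead of a single "latest label" lets the video side pick the label that matches the frame time even when audio inference runs ahead of the video.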

You can also refer to our open source demo for x86: /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-avsync.
