• Hardware Platform (Jetson / GPU)
Jetson Orin 4012, NVIDIA Jetson Orin NX Bundle, 8x 2GHz, 16GB DDR5
• DeepStream Version
Container: deepstream:7.0-triton-multiarch
• JetPack Version (valid for Jetson only)
see Container: deepstream:7.0-triton-multiarch
• TensorRT Version
see Container: deepstream:7.0-triton-multiarch
• NVIDIA GPU Driver Version (valid for GPU only)
Container: deepstream:7.0-triton-multiarch
• Issue Type( questions, new requirements, bugs)
Question
We want to answer the following question for a combined video and audio feed, using a GStreamer pipeline built around the nvinfer plugin:
Given a video with an arbitrary number of people in it, is one of them speaking, and if so, which one?
Currently we apply a heuristic to the output of the following pipeline, which uses the NVIDIA FacialLandmarks net:
gst-launch-1.0 v4l2src device=/dev/video0 !
nvvideoconvert src-crop=0:0:1920:1080 !
m.sink_0 nvstreammux name=m batch-size=1 live-source=1 width=1280 height=920 !
nvinfer config-file-path=ai_pipeline/configs/facedetect.yml !
nvinfer config-file-path=ai_pipeline/configs/landmarks.yml !
fakesink
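To illustrate the kind of landmark-based heuristic meant here, a minimal sketch in plain Python follows. It is only an illustration under assumptions: the names `mouth_opening` and `SpeakerHeuristic`, the window length, and the threshold are hypothetical, the four lip points are assumed to have already been picked out of the landmark output, and a stable per-person ID (for example from a tracker, which the pipeline above does not contain) is assumed to be available. The idea is that a speaking person's mouth opening fluctuates more over a short window than a silent person's.

```python
from collections import deque
from math import dist
from statistics import pstdev


def mouth_opening(upper_lip, lower_lip, left_corner, right_corner):
    """Vertical lip gap divided by mouth width, so the value is scale-invariant."""
    width = dist(left_corner, right_corner)
    return dist(upper_lip, lower_lip) / width if width > 0 else 0.0


class SpeakerHeuristic:
    """Hypothetical helper: picks the person whose mouth opening varies most."""

    def __init__(self, window=15, threshold=0.02):
        self.window = window        # number of recent frames considered (assumed value)
        self.threshold = threshold  # minimum variation to count as speaking (assumed value)
        self.history = {}           # person ID -> recent mouth-opening values

    def update(self, person_id, opening):
        """Record one per-frame mouth-opening value for a person."""
        self.history.setdefault(person_id, deque(maxlen=self.window)).append(opening)

    def active_speaker(self):
        """Return the ID with the largest variation above the threshold, or None."""
        best_id, best_score = None, self.threshold
        for person_id, values in self.history.items():
            if len(values) < self.window:
                continue  # not enough frames yet for this person
            score = pstdev(values)
            if score > best_score:
                best_id, best_score = person_id, score
        return best_id
```

A heuristic like this ignores the audio track entirely, so it cannot distinguish speaking from chewing or yawning, and it depends on the landmark quality for small or turned faces.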
However, we are not satisfied with the results. Is there already a solution for this problem somewhere in the NVIDIA Model Zoo?