Query on back-to-back detection

  • Hardware: Jetson AGX Orin
  • DeepStream 7.0
  • JetPack 6.0 (both 6.0+b106 and 6.0+b87 installed), L4T 36.3.0
  • TensorRT 8.6.2
  • NVRM version: NVIDIA UNIX Open Kernel Module for aarch64 540.3.0

Hello, I am trying to build a pipeline like the following:

person detection → face detection → face recognition → face swap

In this pipeline my first two custom models are detectors, and both of them work in full-frame mode.

So I cannot use deepstream-app directly, because there the secondary GIE works on the output of the primary GIE, which does not fit my case since both of my custom detectors infer on the full frame.

I then used the back-to-back detector example and it worked for me, but it did not show the keypoints (kps).

Next I modified deepstream-app to use only the face detector and show the kps, and that also worked.

Now I want to know: if I want to use both detectors in full-frame mode and also show the kps, should I modify deepstream-app or the back-to-back detector example? Please also tell me where I should make the changes.

Which model will output the keypoints in this case? How do you output the keypoints now?

Both are OK.

Depends on your models. Please tell us your models’ relationship in detail.

My SCRFD-based face detection model, which works on the full frame, has 9 output layers in three groups. Each group has the following three layers: confidence scores, bboxes, and 5 kps. Each group works at a different resolution, with 12,800, 3,200, and 800 outputs respectively.
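For reference, a minimal sketch of how such a layout could be walked in a custom gst-nvinfer parser. The layer ordering and the distance-based decode below are assumptions from the common SCRFD export convention, and kps handling is left out, since NvDsInferObjectDetectionInfo only carries boxes:

```cpp
/* Sketch only: custom bbox parser for the 3x3 output layout described above.
 * Layer order and decode convention are assumptions; verify against your engine.
 * Keypoints need a separate channel (custom meta or the instance-mask trick
 * discussed later in this thread). */
#include <vector>
#include "nvdsinfer_custom_impl.h"

extern "C" bool NvDsInferParseCustomSCRFD (
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
  const int strides[3] = {8, 16, 32};   /* 12,800 / 3,200 / 800 anchors at 640x640 */
  const float kThreshold = 0.5f;        /* pre-cluster threshold, hardcoded for the sketch */

  for (int g = 0; g < 3; g++) {
    /* Assumed binding order: [score_8, bbox_8, kps_8, score_16, bbox_16, kps_16, ...].
     * outputLayersInfo[g * 3 + 2] would hold the 10 kps values per anchor. */
    const float *score = (const float *) outputLayersInfo[g * 3 + 0].buffer;
    const float *bbox  = (const float *) outputLayersInfo[g * 3 + 1].buffer;

    int stride = strides[g];
    int fmW = networkInfo.width / stride;
    int numAnchors = fmW * (networkInfo.height / stride) * 2;  /* 2 anchors per cell */

    for (int i = 0; i < numAnchors; i++) {
      if (score[i] < kThreshold)
        continue;
      float cx = (float) ((i / 2) % fmW) * stride;   /* anchor centre */
      float cy = (float) ((i / 2) / fmW) * stride;

      NvDsInferObjectDetectionInfo obj = {};
      obj.classId = 0;                                /* single class: face */
      obj.detectionConfidence = score[i];
      obj.left   = cx - bbox[i * 4 + 0] * stride;     /* distances l, t, r, b */
      obj.top    = cy - bbox[i * 4 + 1] * stride;
      obj.width  = (bbox[i * 4 + 0] + bbox[i * 4 + 2]) * stride;
      obj.height = (bbox[i * 4 + 1] + bbox[i * 4 + 3]) * stride;
      objectList.push_back (obj);
    }
  }
  return true;
}
```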

Currently I am able to show the kps in the following two ways:

  1. Modifying deepstream-app with a custom parser, configuring my detection model as an instance segmentation model, and modifying the “gie_processing_done_buf_prob” function in deepstream_app.c; I referred to the following ticket for that: Object detection pre-trained model inference issue in deepstream - #50 by aniketrohara
  2. Using the back-to-back detectors sample app with the secondary model in full-frame mode.

But with the 1st method I am not able to use both of my detection models, i.e. person detection and face detection, since both of them use the full frame as the input image.

And with the 2nd method I don't know how to add the further models shown in the pipeline above.

Okay, but which one will be easier and more appropriate?

How many models in total do you want to add to the pipeline? You have told us that you have two models, the person detection + keypoints model and the face detection + keypoints model. Will any more models be added?

Depends on your requirements. Which functions do you need?

Both applications are fully open source. You can customize them according to your requirements.

I will have the following pipeline:
person detection → face detection → face recognition → face swap

And these are the outputs of the models that will be used:
person detection → yolov8 → bboxes and confidence score
face detection → scrfd → bboxes, confidence score and 5 kps
face recognition → 512-dimensional embeddings
face swap → 2 input layers (1st: target layer with image input, 2nd: source layer that takes embeddings from the recognition model) → output is an image with swapped faces.

Okay, so can you guide me for both cases: what modifications will be needed and where? Also, is there any reference example with such models in the pipeline, as described above?

Which models are PGIEs? Which models are SGIEs? An SGIE must be based on some PGIE, so which PGIE do the SGIEs work with?

The person detection and face detection models are PGIEs, and the face recognition and face swap models are SGIEs, where both SGIEs operate on the output objects of the face detection model only.
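To make that concrete, the placement could be expressed roughly like this across the four gst-nvinfer config files (the ids are placeholders, and every model-specific key is omitted):

```
# person detector config (full frame)
[property]
gie-unique-id=1
process-mode=1

# face detector config (also full frame, so it runs as a second primary)
[property]
gie-unique-id=2
process-mode=1

# face recognition config (secondary, operates on the face detector's objects)
[property]
gie-unique-id=3
process-mode=2
operate-on-gie-id=2

# face swap config (secondary, also operates on the face detector's objects)
[property]
gie-unique-id=4
process-mode=2
operate-on-gie-id=2
```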

@aniketrohara Do you need to connect the face to the person (which person the face belongs to)?

Assume that in this image my face detection model detected all the faces and the recognition model gave me all the embeddings. Now let's say I want to swap all the faces with the face of “ROSS” (3rd from left); then I will give his embeddings to the non-image input layer of my swap model and all the other faces to the image input layer of the model.

This is what I want; currently there is no need to connect the person and the face.

All my models are independent of person detection; they depend on face detection. I want to do person detection just for visualization purposes in my application.

Your pipeline is OK.

The deepstream-app and the back-to-back sample app are both open source. You can modify either of them to adapt to your pipeline.

Okay, so can you provide any reference for adding more models and their custom parsing to the back-to-back detector sample, beyond the existing 2 detectors?

There are many DeepStream samples in the DeepStream SDK. For example, there are three models in deepstream-test2, and deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub also involves several models in its pipeline.

Hello, let me clarify my query properly. I want to know where I should make modifications in the source code of deepstream-app and the back-to-back sample app so that:

  1. if I use deepstream-app, I can use 2 detection models consecutively (both operating on the full input frame);
  2. if I use the back-to-back sample app, I can obtain the kps and show them in the output video, which I am currently unable to do.

For the back-to-back sample, you can just add the PGIEs and SGIEs directly into the pipeline in main() of deepstream_reference_apps/back-to-back-detectors/back_to_back_detectors.c at master · NVIDIA-AI-IOT/deepstream_reference_apps (github.com). It is just an ordinary GStreamer pipeline built with GStreamer APIs (GStreamer: open source multimedia framework). The only thing you need to do is configure the gst-nvinfer configuration files correctly for your models.
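For example, a rough sketch of that addition. The existing element variable names are taken from the sample and may differ in your version, and the config file names are placeholders:

```cpp
/* Sketch only: extend main() in back_to_back_detectors.c with the two SGIEs. */
GstElement *face_recog = gst_element_factory_make ("nvinfer", "face-recognition");
GstElement *face_swap  = gst_element_factory_make ("nvinfer", "face-swap");

g_object_set (G_OBJECT (face_recog), "config-file-path",
    "face_recognition_sgie_config.txt", NULL);
g_object_set (G_OBJECT (face_swap), "config-file-path",
    "face_swap_sgie_config.txt", NULL);

gst_bin_add_many (GST_BIN (pipeline), face_recog, face_swap, NULL);

/* Original order: ... -> primary_detector -> secondary_detector -> nvvidconv -> nvosd.
 * Insert the SGIEs after the second detector, before the converter/OSD. */
if (!gst_element_link_many (secondary_detector, face_recog, face_swap,
        nvvidconv, nvosd, NULL)) {
  g_printerr ("Failed to link the SGIEs into the pipeline.\n");
  return -1;
}
```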

For deepstream-app, it will be more complicated. The create_primary_gie_bin() function in /opt/nvidia/deepstream/deepstream/sources/apps/apps-common/src/deepstream_primary_gie_bin.c encapsulates creating the PGIE bin from its configuration. create_primary_gie_bin() is called in create_common_elements() in /opt/nvidia/deepstream/deepstream/sources/sample_apps/deepstream-app/deepstream_app.c. You need to modify the configuration parsing logic in /opt/nvidia/deepstream/deepstream/sources/apps/apps-common/src/deepstream_config_file_parser.c and create_common_elements() to read multiple PGIE configurations and call create_primary_gie_bin() to create more PGIEs. The element linking logic should be modified accordingly too.
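As a very rough illustration of the create_common_elements() side, the extra step could look like this. The field and variable names below are illustrative rather than the exact deepstream-app symbols, and the new NvDsGieConfig is assumed to be filled in by the extended config parser:

```cpp
/* Illustrative only: create and link a second full-frame PGIE bin.
 * second_pgie_config is assumed to be populated by the extended logic in
 * deepstream_config_file_parser.c; real member paths in NvDsPipeline may differ. */
static NvDsPrimaryGieBin second_pgie_bin;
static NvDsGieConfig second_pgie_config;

if (!create_primary_gie_bin (&second_pgie_config, &second_pgie_bin))
  goto done;

gst_bin_add (GST_BIN (pipeline->pipeline), second_pgie_bin.bin);

/* Link it right after the existing primary GIE bin so that both detectors
 * still see the full frame (both are configured with process-mode=1). */
NVGSTDS_LINK_ELEMENT (pipeline->common_elements.primary_gie_bin.bin,
    second_pgie_bin.bin);
```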

Actually, I am able to run the back-to-back sample app with both of my models as detectors on the full frame. But the kps returned by the model are not visible in the output video. So what should I modify in the back-to-back sample app to make the kps visible?

Take the DeepStream facial landmarks sample deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com) as an example: we put the keypoints into customized object metadata after the faciallandmark model (deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_app.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)), and then draw the keypoints with display metadata (deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_app.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)).
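In the same spirit, a condensed sketch of the drawing step in a buffer probe. get_face_kps() is a hypothetical placeholder for however the 5 kps were attached per object (custom user meta, the instance-mask buffer, etc.):

```cpp
/* Sketch only: draw per-object keypoints as OSD circles inside a buffer probe. */
#include <gst/gst.h>
#include "gstnvdsmeta.h"

static gboolean get_face_kps (NvDsObjectMeta *obj_meta, float kps[10]); /* hypothetical */

static GstPadProbeReturn
osd_sink_pad_buffer_probe (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;

    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj != NULL;
        l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;

      float kps[10];                      /* x0,y0, ... ,x4,y4 in frame coordinates */
      if (!get_face_kps (obj_meta, kps))
        continue;

      /* One display meta per object: 5 circles, one per keypoint. */
      NvDsDisplayMeta *dmeta = nvds_acquire_display_meta_from_pool (batch_meta);
      for (int k = 0; k < 5; k++) {
        NvOSD_CircleParams *c = &dmeta->circle_params[dmeta->num_circles++];
        c->xc = (guint) kps[2 * k];
        c->yc = (guint) kps[2 * k + 1];
        c->radius = 3;
        c->circle_color.red = 1.0;
        c->circle_color.green = 0.0;
        c->circle_color.blue = 0.0;
        c->circle_color.alpha = 1.0;
      }
      nvds_add_display_meta_to_frame (frame_meta, dmeta);
    }
  }
  return GST_PAD_PROBE_OK;
}
```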

Hey, I actually added the logic to attach the kps to the OSD in the static GstPadProbeReturn nvvidconv_sink_pad_buffer_probe function and changed the secondary model's config file to instance segmentation instead of a detector network, and it worked for me. Now I am able to get the kps too. Thanks for the help!
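For anyone hitting the same issue, the config-side change mentioned above looks roughly like this in the secondary (face detector) gst-nvinfer config. The parser function and library names are placeholders for your own custom parser:

```
# Switch the face detector from detector to instance segmentation so the parser
# has a per-object buffer available to carry the kps.
[property]
network-type=3
output-instance-mask=1
parse-bbox-instance-mask-func-name=NvDsInferParseCustomSCRFDMask
custom-lib-path=libnvds_scrfd_parser.so
```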