Query on back-to-back detection

  • Hardware: Jetson AGX Orin
  • DeepStream 7.0
  • JetPack 6.0 (6.0+b106 and 6.0+b87 both installed), L4T 36.3.0
  • TensorRT 8.6.2
  • NVRM version: NVIDIA UNIX Open Kernel Module for aarch64 540.3.0

Hello, I am trying to build a pipeline like the following one:

person detection → face detection → face recognition → face swap

In this pipeline my first two custom models are detectors, and both of them work in full-frame mode.

So I cannot use deepstream-app directly, because there the secondary model works on the output of the primary, which will not work for me since both of my custom detectors infer on the full frame.

So I used the back-to-back detector example, and it worked for me, but it did not show the keypoints (kps).

Then I modified deepstream-app to use only the face detector and show the kps, and that worked for me.

Now I want to know: if I want to use both detectors in full-frame mode and also show the kps, should I modify deepstream-app or the back-to-back detector example? Please also tell me where I should make the changes.

Which model will output key points in this case? How did you output the key points now?

Both are OK.

Depends on your models. Please tell us your models’ relationship in detail.

My SCRFD-based face detection model, which works on the full frame, has 9 output layers in total, in three groups. Each group has three layers: confidence scores, bboxes, and 5 kps. Each group works at a different scale, with output sizes of 12800, 3200, and 800.
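For context, decoding output groups like this usually means mapping each prediction back to an anchor centre on the stride-8/16/32 grids. Below is a minimal NumPy sketch of decoding one such group; the 640×640 input size, 2 anchors per cell, and distance-style box/kps encoding are assumptions based on the public SCRFD design, not confirmed by this thread, so adjust to your exported model:

```python
import numpy as np

def decode_scrfd_level(scores, bboxes, kps, stride, input_size=640, conf_thresh=0.5):
    """Decode one SCRFD output group (scores, bboxes, 5 kps) at one stride.

    Assumptions (not from the thread): 640x640 input, 2 anchors per grid
    cell, and boxes/keypoints regressed as distances in stride units.
    """
    h = w = input_size // stride
    # anchor centres: every cell hosts 2 anchors at the same centre
    xv, yv = np.meshgrid(np.arange(w), np.arange(h))
    centers = np.stack([xv, yv], axis=-1).reshape(-1, 2)
    centers = np.repeat(centers, 2, axis=0).astype(np.float32) * stride

    keep = scores.reshape(-1) >= conf_thresh
    centers = centers[keep]
    bboxes = bboxes.reshape(-1, 4)[keep]
    kps = kps.reshape(-1, 10)[keep]

    # box regression = (left, top, right, bottom) distances from the centre
    x1 = centers[:, 0] - bboxes[:, 0] * stride
    y1 = centers[:, 1] - bboxes[:, 1] * stride
    x2 = centers[:, 0] + bboxes[:, 2] * stride
    y2 = centers[:, 1] + bboxes[:, 3] * stride
    boxes = np.stack([x1, y1, x2, y2], axis=-1)

    # 5 keypoints = (dx, dy) offsets from the centre, also in stride units
    pts = kps.reshape(-1, 5, 2) * stride + centers[:, None, :]
    return scores.reshape(-1)[keep], boxes, pts
```

At stride 8 this grid yields 80×80×2 = 12800 predictions, matching the first group size above; the same logic in your custom parser is what fills the kps that the app then draws.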

Currently I am able to show the kps in the following two ways:

  1. Modifying deepstream-app together with a custom parser, treating my detection model as an instance segmentation model; I referred to the following ticket to modify the “gie_processing_done_buf_prob” function in deepstream_app.c: Object detection pre-trained model inference issue in deepstream - #50 by aniketrohara
  2. Using the back-to-back detectors sample app with the secondary model in full-frame mode.
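For the record, the second approach comes down to one setting in the second detector's nvinfer config. A minimal sketch, with model paths and other keys omitted (`process-mode` and `gie-unique-id` are standard gst-nvinfer config keys):

```ini
[property]
# second detector in the back-to-back pipeline
gie-unique-id=2
# 1 = infer on the full frame (primary mode), even though this nvinfer
# sits downstream of the first detector; 2 would infer on its objects
process-mode=1
```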

But with the 1st method I am not able to use both of my detection models (person detection and face detection), since both use the full frame as input.

And with the 2nd method I don't know how to add the further models shown in the pipeline above.

Okay, but still, which will be easier and more appropriate?

In total, how many models do you want to add to the pipeline? You have told us that you have two models: the person detection+keypoints model and the face detection+keypoints model. Will any more models be added?

Depends on your requirements. Which functions do you need?

Both applications are completely open source. You can customize them according to your requirements.

I will have the following pipeline:
person detection → face detection → face recognition → face swap

and the following are the outputs of the models that will be used:
person detection → yolov8 → bboxes and confidence score
face detection → scrfd → bboxes, confidence score and 5 kps
face recognition → 512 dimensional embeddings
face swap → 2 input layers (the 1st target layer takes the image, the 2nd source layer takes the embeddings from the recognition model) → output is an image with swapped faces.
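Assuming nvinfer is used for all four models, the chain above could be sketched with standard nvinfer config keys roughly as follows (one config file per model; paths and model-specific keys omitted). Note this is only an illustration of the gie-id wiring: the two-input face-swap model will likely need custom preprocessing (e.g. nvdspreprocess or a custom lib) to feed the embedding tensor, which plain nvinfer configs do not cover:

```ini
# config_person.txt - person detection (YOLOv8), full frame
[property]
gie-unique-id=1
process-mode=1

# config_face.txt - face detection (SCRFD), also full frame
[property]
gie-unique-id=2
process-mode=1

# config_recognition.txt - face recognition, on face objects only
[property]
gie-unique-id=3
process-mode=2
operate-on-gie-id=2

# config_swap.txt - face swap, also on face objects
[property]
gie-unique-id=4
process-mode=2
operate-on-gie-id=2
```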

Okay, so can you guide me for both cases: what modifications will be needed and where? And is there any reference example with such models in a pipeline, as described above?

Which models are PGIEs? Which models are SGIEs? Each SGIE must be based on some PGIE, so which PGIE do the SGIEs work with?

The person detection and face detection models are PGIEs, and the face recognition and face swap models are SGIEs; both SGIEs operate on the output objects of face detection only.

@aniketrohara Do you need to connect the face to the person (i.e., which person the face belongs to)?

Assume that in this image my face detection model detected all the faces and the recognition model gave me all the embeddings. Now, let's say I want to swap all the faces with the face of “ROSS” (3rd from left); then I will give his embeddings to the non-image input layer of my swap model, and all the other faces to the image input layer.

This is what I want; currently there is no need to connect the person and the face.

All my models are independent of person detection; they depend on face detection. I want to do person detection just for visualization purposes in my application.
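The selection step described in the “ROSS” example (deciding which detected faces go to the swap model's image input, and which embedding goes to its embedding input) can be sketched outside DeepStream in plain NumPy. The function name, the 0.4 similarity threshold, and the assumption of L2-normalised 512-d embeddings are illustrative, not from the thread:

```python
import numpy as np

def pick_source_embedding(embeddings, reference, thresh=0.4):
    """Split detected faces for the swap step.

    embeddings: (N, 512) array, one row per detected face.
    reference:  (512,) embedding of the person to swap IN (e.g. "ROSS").
    Returns the indices of all OTHER faces (their crops feed the swap
    model's image input) plus the normalised reference embedding (which
    feeds its embedding input). Threshold is an illustrative value.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    ref = reference / np.linalg.norm(reference)
    sims = emb @ ref                       # cosine similarity per face
    targets = np.where(sims < thresh)[0]   # faces that are NOT the reference person
    return targets, ref
```

In the actual pipeline this logic would live in a probe on the recognition SGIE's output, reading the embeddings from the attached tensor meta.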