InsightFace custom model pipelining in DeepStream for face recognition

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) : RTX A6000
• DeepStream Version : 8.0
• JetPack Version (valid for Jetson only) : not Jetson; Ubuntu 24, dGPU
• TensorRT Version : 10.9.0.34
• NVIDIA GPU Driver Version (valid for GPU only) 570.195.03

I want to create an embedding-based face recognition pipeline using the InsightFace buffalo_l models.
In raw Python, the five models within buffalo_l handle face detection, landmark identification, embedding generation, and embedding matching (recognition).
But to port this entire pipeline to DeepStream, I may have to convert each of these five models to a TensorRT engine individually, since buffalo_l cannot be used in DeepStream directly.

The five models are:

  1. 1k3d68.onnx

  2. 2d106det.onnx

  3. det_10g.onnx

  4. genderage.onnx

  5. w600k_r50.onnx

If I convert each of them, then I would have to make the primary detector model, det_10g, the PGIE (which will also require a custom bounding-box parser to begin with) and each of the remaining models an SGIE. But this feels hectic. Is there a proper methodology to implement the pipeline?
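If I do go the per-engine route, my rough understanding is that the det_10g PGIE config would look something like the sketch below. This is only a sketch: the property keys are standard Gst-nvinfer options, but the engine filename, parser function name, and custom-lib path are placeholders for a parser I would still have to write (det_10g's raw outputs are SCRFD-style and are not understood by the default parser):

```ini
# pgie_det10g_config.txt -- hypothetical Gst-nvinfer config for det_10g as PGIE
[property]
gpu-id=0
onnx-file=det_10g.onnx
model-engine-file=det_10g.onnx_b1_gpu0_fp16.engine
network-mode=2
# det_10g expects 640x640 RGB input
infer-dims=3;640;640
# detector network
network-type=0
num-detected-classes=1
# custom bbox parser for the SCRFD-style outputs (placeholder names)
parse-bbox-func-name=NvDsInferParseCustomSCRFD
custom-lib-path=./libnvdsinfer_custom_scrfd.so

[class-attrs-all]
pre-cluster-threshold=0.5
```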

The model pipeline depends on the models' features and the relationships between the models, not on DeepStream.

Can you describe the inputs and outputs of each model and the relationship between the models?

Thank you for your response. Here’s a breakdown of the models and their relationships in the InsightFace buffalo_l pipeline:

Pipeline Flow:

  1. det_10g.onnx (Face Detection)

    • Input: RGB image (640x640)

    • Output: Bounding boxes and confidence scores for detected faces

    • Role: Primary detector - locates face regions in the input frame

  2. 1k3d68.onnx (3D Landmark Detection) - But I don't need this, as I am using a 2D workflow.

  3. 2d106det.onnx (2D Landmark Detection)

    • Input: Cropped face image from det_10g output (192x192)

    • Output: 106 2D facial landmark points

    • Dependency: Requires bounding boxes from det_10g

    • Purpose: Used for face alignment before embedding generation

  4. genderage.onnx (Gender & Age Estimation) - This one is also not used for now

    • Input: Aligned/cropped face (96x96)

    • Output: Gender classification and age estimation

    • Dependency: Works on aligned faces from landmark detection

  5. w600k_r50.onnx (Face Recognition/Embedding)

    • Input: Aligned face image (112x112) - alignment done using landmarks from 2d106det

    • Output: 512-dimensional embedding vector

    • Dependency: Requires face alignment using landmarks from 2d106det

Key Relationships:

  • det_10g must run first as PGIE

  • 2d106det needs det_10g’s bounding boxes for face alignment

  • w600k_r50 needs aligned faces (using 2d106det landmarks) for embedding generation

  • The embedding vector from w600k_r50 is then compared against a database for recognition
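For the last bullet, the database comparison is a plain cosine-similarity search over stored embeddings; this step lives outside the inference elements (e.g. in a probe reading the tensor output). A minimal numpy sketch of what I mean (gallery layout, names, and the 0.35 threshold are illustrative assumptions, not InsightFace defaults):

```python
import numpy as np

def match_embedding(query: np.ndarray, gallery: np.ndarray, names: list,
                    threshold: float = 0.35):
    """Return (name, score) of the best gallery match by cosine similarity,
    or (None, score) when the best score falls below the threshold."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q                      # cosine similarities, shape (N,)
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None, float(scores[best])
    return names[best], float(scores[best])

# toy example with 512-d embeddings (w600k_r50 output size)
rng = np.random.default_rng(0)
gallery = rng.normal(size=(3, 512))
names = ["alice", "bob", "carol"]
query = gallery[1] + 0.01 * rng.normal(size=512)   # perturbed copy of "bob"
print(match_embedding(query, gallery, names))
```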

My main concern is handling the face-alignment preprocessing between 2d106det and w600k_r50, and whether I need a custom nvdspreprocess library or probe functions for this transformation in the DeepStream pipeline.

What is the alignment processing algorithm’s input and output?

The face alignment algorithm takes the following:

Input:

  1. Original face bounding box (from det_10g face detector)

  2. 106 2D facial landmark points (from 2d106det - 2d landmark model) - specifically the 5 key points: left eye, right eye, nose tip, left mouth corner, right mouth corner

  3. Cropped face image region

Processing: The alignment performs an affine transformation to normalize the face to a canonical pose:

  1. Calculates the transformation matrix based on the detected landmarks and target landmark positions (standard face template)

  2. Applies similarity transform (rotation, scale, translation) to align eyes horizontally and center the face

  3. Warps the face image to the target size (112x112 for w600k_r50)

Output:

  • Aligned and normalized face image (112x112 RGB) suitable for the embedding model (w600k_r50)

Implementation Note: In the original InsightFace implementation, this is handled by their face_align.norm_crop() function which uses cv2.warpAffine. In DeepStream, I’m unsure whether:

  1. This transformation should be handled in a custom nvdspreprocess configuration

  2. I need to write a custom probe function to perform the affine transformation between the 2d106det and w600k_r50 models

  3. Or if there’s a recommended DeepStream-native approach for this.
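For reference, the transform in step 1 above can be computed with the standard Umeyama similarity-estimation algorithm. Below is a numpy-only sketch of that estimation; the 5-point destination template holds the values commonly published for ArcFace 112x112 alignment (please verify against your copy of face_align.py), and the actual image warp would still be done with cv2.warpAffine or an equivalent GPU-side warp:

```python
import numpy as np

# Destination 5-point template on a 112x112 canvas (commonly published
# ArcFace values; verify against insightface's face_align.py).
ARCFACE_DST = np.array([
    [38.2946, 51.6963],   # left eye
    [73.5318, 51.5014],   # right eye
    [56.0252, 71.7366],   # nose tip
    [41.5493, 92.3655],   # left mouth corner
    [70.7299, 92.2041],   # right mouth corner
], dtype=np.float64)

def similarity_transform(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 similarity transform (rotation + uniform scale +
    translation) mapping src points onto dst points (Umeyama)."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    src_c = src - src_mean
    dst_c = dst - dst_mean
    cov = dst_c.T @ src_c / src.shape[0]
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, d])              # guards against reflections
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / src.shape[0]
    scale = np.trace(np.diag(S) @ D) / var_src
    M = np.empty((2, 3))
    M[:, :2] = scale * R
    M[:, 2] = dst_mean - scale * R @ src_mean
    return M   # pass to cv2.warpAffine(face_img, M, (112, 112))
```

With the 5 key points extracted from the 2d106det output as `src`, `similarity_transform(src, ARCFACE_DST)` yields the matrix for the 112x112 warp that w600k_r50 expects.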

For the affine preprocessing, please refer to deepstream_tao_apps/apps/tao_others/deepstream_custom_preprocessing_app at master · NVIDIA-AI-IOT/deepstream_tao_apps to check whether it meets your requirements.

For the pipeline, the relationships between the models are clear. You can also refer to the sample deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps, which demonstrates how to transfer body-pose points between SGIEs.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.