Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): RTX A6000
• DeepStream Version: 8.0
• JetPack Version (valid for Jetson only): N/A (dGPU, Ubuntu 24)
• TensorRT Version: 10.9.0.34
• NVIDIA GPU Driver Version (valid for GPU only): 570.195.03
I want to create an embedding-based face recognition pipeline using the InsightFace buffalo_l models.
In raw Python, the five models within buffalo_l handle face detection, landmark identification, embedding generation, and embedding matching (recognition).
But to move this pipeline to DeepStream, I would have to convert each of these five models into a TensorRT engine individually, since buffalo_l cannot be used in DeepStream directly.
The five models are:
1k3d68.onnx
2d106det.onnx
det_10g.onnx
genderage.onnx
w600k_r50.onnx
If I convert each of them, I would have to configure the primary detector, det_10g, as the PGIE (which also requires a custom bounding-box parser to begin with) and each of the remaining models as an SGIE. But this feels hectic. Is there a proper methodology for implementing the pipeline?
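For context, the PGIE side of what I have in mind might start from a config sketch like the one below. The [property] keys are standard Gst-nvinfer fields, but the preprocessing values (net-scale-factor/offsets for the usual (x - 127.5)/128 normalization), the engine filename, and especially the parser function and library names are my assumptions — the SCRFD output parser is something I would have to implement myself:

```ini
# Hypothetical PGIE config sketch for det_10g (SCRFD) -- untested.
# parse-bbox-func-name / custom-lib-path refer to a custom parser
# that must be written; nothing like it ships with DeepStream.
[property]
gpu-id=0
onnx-file=det_10g.onnx
model-engine-file=det_10g.onnx_b1_gpu0_fp16.engine
network-mode=2
batch-size=1
network-type=0
num-detected-classes=1
infer-dims=3;640;640
net-scale-factor=0.0078125
offsets=127.5;127.5;127.5
parse-bbox-func-name=NvDsInferParseCustomSCRFD
custom-lib-path=libnvdsinfer_custom_parser_scrfd.so
```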
Thank you for your response. Here’s a breakdown of the models and their relationships in the InsightFace buffalo_l pipeline:
Pipeline Flow:
det_10g.onnx (Face Detection)
Input: RGB image (640x640)
Output: Bounding boxes and confidence scores for detected faces
Role: Primary detector - locates face regions in the input frame
1k3d68.onnx (3D Landmark Detection) - I don't need this one, as I am using the 2D workflow.
2d106det.onnx (2D Landmark Detection)
Input: Cropped face image from det_10g output (192x192)
Output: 106 2D facial landmark points
Dependency: Requires bounding boxes from det_10g
Purpose: Used for face alignment before embedding generation
genderage.onnx (Gender & Age Estimation) - This one is also not used as of now
Input: Aligned/cropped face (96x96)
Output: Gender classification and age estimation
Dependency: Works on aligned faces from landmark detection
w600k_r50.onnx (Face Recognition/Embedding)
Input: Aligned face image (112x112) - alignment done using landmarks from 2d106det
Output: 512-dimensional embedding vector
Dependency: Requires face alignment using landmarks from 2d106det
Key Relationships:
det_10g must run first as PGIE
2d106det needs det_10g’s bounding boxes for face alignment
w600k_r50 needs aligned faces (using 2d106det landmarks) for embedding generation
The embedding vector from w600k_r50 is then compared against a database for recognition
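That final matching step is not DeepStream-specific — it would typically live in a probe function or a downstream application. A minimal sketch, assuming the w600k_r50 embeddings are L2-normalized so that cosine similarity reduces to a dot product (the 0.35 threshold below is purely illustrative and would need tuning):

```python
import numpy as np

def match_embedding(query, gallery, threshold=0.35):
    """Compare one embedding against a gallery of known embeddings
    using cosine similarity.

    Assumes `query` and each row of `gallery` are L2-normalized, so
    cosine similarity is just a dot product. Returns (best_index,
    score), or (None, score) when the best score is below threshold.
    """
    query = np.asarray(query, dtype=np.float32)
    gallery = np.asarray(gallery, dtype=np.float32)
    scores = gallery @ query              # (N,) cosine similarities
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None, float(scores[best])  # unknown face
    return best, float(scores[best])
```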
My main concern is handling the face-alignment preprocessing between 2d106det and w600k_r50, and whether I need a custom nvdspreprocess library or probe functions for this transformation in the DeepStream pipeline.
Inputs:
Original face bounding box (from the det_10g face detector)
106 2D facial landmark points (from 2d106det, the 2D landmark model) - specifically the 5 key points: left eye, right eye, nose tip, left mouth corner, right mouth corner
Cropped face image region
Processing: The alignment performs an affine transformation to normalize the face to a canonical pose:
Calculates the transformation matrix based on the detected landmarks and target landmark positions (standard face template)
Applies similarity transform (rotation, scale, translation) to align eyes horizontally and center the face
Warps the face image to the target size (112x112 for w600k_r50)
Output:
Aligned and normalized face image (112x112 RGB) suitable for the embedding model (w600k_r50)
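The transformation-matrix step described above can be sketched in plain NumPy as a least-squares (Umeyama) similarity fit. The 5-point destination template below is the widely used ArcFace 112x112 reference from the InsightFace code; the actual image warp would still be done with cv2.warpAffine (or its DeepStream-side equivalent), which I've left as a comment:

```python
import numpy as np

# Standard ArcFace 5-point template for a 112x112 crop
# (left eye, right eye, nose, left/right mouth corners).
ARCFACE_DST = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041],
], dtype=np.float64)

def estimate_similarity(src, dst):
    """Least-squares (Umeyama) similarity transform -- rotation,
    uniform scale, translation -- mapping src points onto dst.
    Returns a 2x3 matrix in the form cv2.warpAffine expects."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = src.shape[0]
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / n
    U, S, Vt = np.linalg.svd(cov)
    d = np.ones(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[1] = -1                       # avoid reflections
    R = U @ np.diag(d) @ Vt
    scale = (S * d).sum() / ((src_c ** 2).sum() / n)
    M = np.empty((2, 3))
    M[:, :2] = scale * R
    M[:, 2] = dst_mean - scale * R @ src_mean
    return M

# Usage sketch (OpenCV, outside this snippet):
#   M = estimate_similarity(face_5pts, ARCFACE_DST)
#   aligned = cv2.warpAffine(frame, M, (112, 112))
```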
Implementation Note: In the original InsightFace implementation, this is handled by the face_align.norm_crop() function, which uses cv2.warpAffine. In DeepStream, I'm unsure whether:
This transformation should be handled in a custom nvdspreprocess configuration,
I need to write a custom probe function to perform the affine transformation between the 2d106det and w600k_r50 models,
Or there's a recommended DeepStream-native approach for this.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.