Issues with Face Recognition

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) - GPU
• DeepStream Version - 7
• JetPack Version (valid for Jetson only) - NA
• TensorRT Version - 8.6.1
• NVIDIA GPU Driver Version (valid for GPU only) - 535.216.01
• Issue Type( questions, new requirements, bugs) - questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I am referencing a previous topic that is related to this one: Facing issues with face detection using Deepstream SDK

From the previous topic we were able to get the bounding boxes; the post-processor returns the tensor and the SGIE is able to read it.
Our extended requirement is to generate an embedding for each detected face, compare it with the authorized face embeddings stored in the DB, and flag whether the person is authorized.
First, I used ResNet50 in the SGIE, but every face in the video produces the same embedding, so the cosine similarity between the vectors always comes out as 1.
Second, I tried ArcFace (I have the weights and architecture files) and attempted to convert it to ONNX and TensorRT, but I ran into issues. So I would like to ask whether there is a better model to use in the SGIE for embedding generation, both for enrolling the labelled pictures and for comparing against them in a real-time video to recognize the person.
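For reference, the comparison step we intend to run on the embeddings looks roughly like this (a minimal sketch; is_authorized, live_embedding and stored_embeddings are placeholder names, and the 0.7 threshold is only an example):

import numpy as np

def is_authorized(live_embedding, stored_embeddings, threshold=0.7):
    # live_embedding: 1-D array produced by the SGIE for one face.
    # stored_embeddings: dict mapping person name -> 1-D reference embedding.
    live = live_embedding / np.linalg.norm(live_embedding)
    best_name, best_score = None, -1.0
    for name, ref in stored_embeddings.items():
        ref = ref / np.linalg.norm(ref)
        score = float(np.dot(live, ref))  # cosine similarity of unit vectors
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score > threshold else (None, best_score)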

ResNet50 config file:

[property]
gpu-id=0
gie-unique-id=2
model-engine-file=/home/dstream/Documents/FR_DEMO/models/facenet_resnet_50/facenet_resnet_fp16.engine
batch-size=1
process-mode=2
network-type=1
network-mode=2
operate-on-gie-id=-1
operate-on-class-ids=0
infer-dims=3;160;160
net-scale-factor=0.00392157
offsets=123.675;116.28;103.53
model-color-format=0
# Disable bbox parsing since this is a feature extractor
custom-network-config=0
#parse-bbox=0
network-mode=0
#input-tensor-meta=1
output-tensor-meta=1
uff-input-blob-name=input.1
output-blob-names=1199

@junshengy let me know if you need any further details here.

Please fix the issues in the sgie configuration file.

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
## 0=Detector, 1=Classifier, 2=Semantic Segmentation, 3=Instance Segmentation, 100=Other
network-type=100

This seems to be a problem with the model. The SGIE should output a different tensor for each face.

DeepStream does not provide a face recognition model. You can find open-source ArcFace ONNX models on Hugging Face.

If you want to embed the output tensor into user meta, this example can be used as a reference.

Yes, we have already incorporated the above to add the output tensor to user meta. I am facing issues with the ONNX model that was converted from PyTorch; the PyTorch model works as expected, but the ONNX model does not.
Here is the model: minchul/cvlface_arcface_ir101_webface4m · Hugging Face

We also tried the ONNX export following this blog: Face recognition: OnnX to TensorRT conversion of Arcface Model | by Kavika Roy | DataToBiz | Medium
but I was not successful.

I am getting NaN values as the embedding, or a shape mismatch error. I would appreciate any suggestion for getting the desired embedding from the SGIE. I am attaching the reference scripts that I tried here.

PyTorch to ONNX conversion script:

import sys
import torch
import torch.onnx

wrapper_folder = "/home/dstream/Documents/FR_DEMO/models/cvlface_arcface_ir101_webface4m"
sys.path.append(wrapper_folder)

# Load the PyTorch model
from wrapper import CVLFaceRecognitionModel, ModelConfig

config = ModelConfig()
model = CVLFaceRecognitionModel(config)
model.eval()

# Define a dummy input with the correct shape and range
dummy_input = torch.randn(1, 3, 112, 112)  # Shape: (batch_size, channels, height, width)

# Export the model to ONNX
onnx_path = "/home/dstream/Documents/FR_DEMO/models/cvlface_arcface_ir101_webface4m/pretrained_model/arcface222_onnx.onnx"
torch.onnx.export(
    model, dummy_input, onnx_path,
    input_names=["input_image"],   # Custom input tensor name
    output_names=["features"],     # Custom output tensor name
    dynamic_axes={"input_image": {0: "batch_size"}, "features": {0: "batch_size"}},  # Allow dynamic batch size
    opset_version=11,
    verbose=True
)

print(f"✅ Model exported to ONNX: {onnx_path}")
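To check whether the NaN/shape problems come from the export itself, a quick parity check between the PyTorch model and the exported ONNX graph can be run (a minimal sketch that reuses model and onnx_path from the script above, assumes the wrapper returns a single tensor, and requires onnxruntime to be installed):

import numpy as np
import onnxruntime as ort

# Same dummy input shape as used for the export above.
x = torch.randn(1, 3, 112, 112)

with torch.no_grad():
    torch_out = model(x).cpu().numpy()

sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"input_image": x.numpy()})[0]

print("torch output shape:", torch_out.shape, "onnx output shape:", onnx_out.shape)
print("contains NaN:", np.isnan(onnx_out).any())
print("max abs diff:", np.abs(torch_out - onnx_out).max())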

Thanks for your support.

This may be a problem when exporting the .pt model to ONNX.
This usually requires modifying the model's operators (dynamic batch / dynamic shape and so on). I am not familiar with this model. You can try to contact the author on GitHub.

Hi @junshengy,

We have resolved the issue and are now able to get embeddings from the ArcFace model. I've also created a pipeline outside of DeepStream to store the generated embeddings for training (to be used during recognition).

I've integrated the recognition model (SGIE) into the DeepStream pipeline, but when trying to extract the embeddings from the SGIE metadata, I am getting None from obj_meta.obj_user_meta_list. Based on my understanding, when the SGIE (recognition) follows the PGIE (detection), with process-mode=1 and output-tensor-meta=1 set on the SGIE, the generated embeddings should be stored in the object's user meta list. However, when I attempt to extract the embeddings from obj_meta.obj_user_meta_list, I'm encountering the following issue:

Here is the log from my terminal showing the issue:

Entering PGIE filter function
Processing frame 0
Objects detected in frame 0
Bounding Box: {'top': 342, 'left': 707, 'width': 244, 'height': 334}
Face Recognition started
Getting the Face Features
l_user_meta is None
Face feature : None
Entering PGIE filter function
Processing frame 0
Objects detected in frame 0
Bounding Box: {'top': 343, 'left': 711, 'width': 241, 'height': 324}
Face Recognition started
Getting the Face Features
l_user_meta is None
Face feature : None

Could you please guide me on where to correctly extract the embeddings generated by the SGIE model within the DeepStream pipeline? Additionally, would using the embeddings generated outside of DeepStream for recognition work, or is it essential to extract them from the pipeline itself?

Here’s the SGIE config file I’m using:
sgie_config_webface.txt (797 Bytes)

And here’s the probe I attached to the SGIE for embedding extraction:

def sgie_feature_extract_probe(pad, info, data):
    """
    Probe to extract facial feature from user-meta data and perform recognition.

    Args:
        pad: GstPad.
        info: GstPadProbeInfo.
        data: Dictionary of stored face embeddings (loaded from the pickle file).
    """
    print("Face Recognition started")
    loaded_faces = data  # Dictionary of embeddings from the pickle file
    threshold = 0.7  # Similarity threshold
    #output_dir = data[2]  # Directory to save embeddings (optional)

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        l_obj = frame_meta.obj_meta_list
        frame_number = frame_meta.frame_num
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break

            # Extract the face embedding
            if obj_meta is None:
                print("object meta is None")

            face_feature = get_face_feature(obj_meta)
            print("Face feature : ", face_feature)
            if face_feature is not None:
                # Perform similarity scoring
                best_match = None
                best_score = -1
                for key, value in loaded_faces.items():
                    score = np.dot(face_feature, value.T)[0][0]  # Cosine similarity
                    if score > best_score:
                        best_score = score
                        best_match = key

                # Display the result if the score exceeds the threshold
                if best_score > threshold:
                    display_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
                    display_meta.num_labels = 1
                    py_nvosd_text_params = display_meta.text_params[0]

                    # Set the display text (name of the matched face)
                    py_nvosd_text_params.display_text = best_match

                    # Set the position of the text
                    py_nvosd_text_params.x_offset = int(obj_meta.rect_params.left)
                    py_nvosd_text_params.y_offset = int(obj_meta.rect_params.top + obj_meta.rect_params.height)

                    # Set font properties
                    py_nvosd_text_params.font_params.font_name = "Serif"
                    py_nvosd_text_params.font_params.font_size = 20
                    py_nvosd_text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)  # White color

                    # Set text background color
                    py_nvosd_text_params.set_bg_clr = 1
                    py_nvosd_text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)  # Black background

                    # Add the display meta to the frame
                    pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)

            try:
                l_obj = l_obj.next
            except StopIteration:
                break

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK

def get_face_feature(obj_meta):
    """Get face feature from user-meta data.

    Args:
        obj_meta (NvDsObjectMeta): Object metadata.
    Returns:
        np.array: Normalized face feature.
    """
    print("Getting the Face Features")
    l_user_meta = obj_meta.obj_user_meta_list
    if l_user_meta is None:
        print("l_user_meta is None")
    #print(f"l_user_meta: {l_user_meta}")
    while l_user_meta:
        try:
            user_meta = pyds.NvDsUserMeta.cast(l_user_meta.data)
        except StopIteration:
            break
        if user_meta and user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
            try:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            except StopIteration:
                break

            layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)
            output = []
            for i in range(512):  # Assuming the embedding size is 512
                output.append(pyds.get_detections(layer.buffer, i))
            print("output : ", output)
            res = np.reshape(output, (1, -1))
            print("result")
            norm = np.linalg.norm(res)
            normal_array = res / norm  # Normalize the embedding
            print("Normal array: ", normal_array)
            return normal_array

        try:
            l_user_meta = l_user_meta.next
        except StopIteration:
            break

    return None
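For completeness, a probe like the one above is attached to the SGIE's src pad in the application roughly as follows (a short sketch; sgie and loaded_faces are names assumed from my app file rather than shown here):

sgie_src_pad = sgie.get_static_pad("src")
if sgie_src_pad:
    sgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, sgie_feature_extract_probe, loaded_faces)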

process-mode should be 2. If it is 1, the SGIE processes the entire frame instead of the face object, and the output tensor will only be found in frame_meta.frame_user_meta_list.
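In other words, with process-mode=1 the tensor meta would have to be read from the frame rather than the object, roughly like this (a minimal sketch using the same pyds calls as the probes above):

l_user = frame_meta.frame_user_meta_list
while l_user is not None:
    user_meta = pyds.NvDsUserMeta.cast(l_user.data)
    if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
        tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
        layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)
        embedding = [pyds.get_detections(layer.buffer, i) for i in range(512)]
    try:
        l_user = l_user.next
    except StopIteration:
        break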

Do you mean using DeepStream only to detect the bounding box of the face, and then using another program for recognition?

If so, you can access frame_meta in the PGIE's src pad probe function and extract the bbox, then send it to your target program via IPC/socket.

No, I didn't mean that. For recognition, we need to compare the generated embeddings with the stored embeddings. To obtain these stored embeddings, do we need to create another DeepStream pipeline that stores the generated embeddings of the images in the dataset, or can we also use embeddings generated by the same .engine model outside of DeepStream?

This does not guarantee that all platforms generate the same embeddings; they are float tensors.

As in the example mentioned above, you can consider sending the generated embedding to your server/application over nvmsgconv + nvmsgbroker.
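A rough sketch of what that could look like in a Python pipeline (element and property names follow the DeepStream message broker samples; the config path, protocol library and connection string below are placeholders):

msgconv = Gst.ElementFactory.make("nvmsgconv", "msgconv")
msgconv.set_property("config", "msgconv_config.txt")  # placeholder path
msgbroker = Gst.ElementFactory.make("nvmsgbroker", "msgbroker")
msgbroker.set_property("proto-lib", "/opt/nvidia/deepstream/deepstream/lib/libnvds_kafka_proto.so")  # placeholder
msgbroker.set_property("conn-str", "localhost;9092;face-events")  # placeholder host;port;topic
pipeline.add(msgconv)
pipeline.add(msgbroker)
# Link these after the element where the user meta is attached, e.g. via a tee alongside the display branch.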

Hi @junshengy,

As per your suggestion, I changed process-mode = 1 to process-mode = 2 and attempted to extract embeddings from object_meta. However, I encountered unexpected results.

Issue Details:

  1. No Valid Embeddings from object_meta
    After implementing the change, I received the output: “No valid embeddings generated.”

Script Used for Extracting Embeddings:

def extract_embeddings(obj_meta):
    """
    Extract embeddings from object-level metadata only.
    Returns normalized embedding or None.
    """
    normalized_embedding = None

    # Process object-level metadata
    l_user_meta = obj_meta.obj_user_meta_list
    if l_user_meta is None:
        print("l_user_meta is None")
    while l_user_meta:
        try:
            user_meta = pyds.NvDsUserMeta.cast(l_user_meta.data)
            if user_meta and user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)
                output = [pyds.get_detections(layer.buffer, i) for i in range(512)]
                res = np.reshape(output, (1, -1))

                norm = np.linalg.norm(res)
                if norm != 0:
                    normalized_embedding = res / norm
                    return normalized_embedding
        except Exception as e:
            print(f"Error in object metadata: {e}")
        l_user_meta = l_user_meta.next

    return None

def sgie_feature_extract_probe(pad, info, u_data):
    """
    Probe to extract features and save embeddings.
    This function is used in the training pipeline.
    """
    print("Entering into sgie training probe")

    unique_id = truncate_filename(u_data)
    print(f"Unique ID for saving: {unique_id}")

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if not batch_meta:
        print("No batch_meta found.")
        return Gst.PadProbeReturn.OK

    l_frame = batch_meta.frame_meta_list
    embedding_saved = False  # Flag to track if we've saved an embedding

    while l_frame and not embedding_saved:  # Process frames until we find and save an embedding
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            print(f"Processing frame {frame_meta.frame_num}")

            # Try to get embeddings from objects
            l_obj = frame_meta.obj_meta_list
            while l_obj and not embedding_saved:
                try:
                    obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
                    if obj_meta is None:
                        print("object meta is None")
                    print(f"Processing object {obj_meta.object_id}, class {obj_meta.class_id}")

                    # Extract embeddings from object-level metadata
                    normalized_embedding = extract_embeddings(obj_meta)

                    if normalized_embedding is not None:
                        print("Successfully extracted normalized embedding from object-level metadata")
                        print(f"Shape: {normalized_embedding.shape}")

                        # Save the embedding
                        filepath = '/home/dstream/Documents/FR_DEMO/Training_pipeline/embeddings.pkl'
                        save_embeddings(normalized_embedding, unique_id, filepath)
                        embedding_saved = True
                        break  # Exit the object loop after saving

                except StopIteration:
                    break

                l_obj = l_obj.next

        except StopIteration:
            break

        l_frame = l_frame.next

    if not embedding_saved:
        print("No valid embeddings found in any frame for saving")

    return Gst.PadProbeReturn.OK
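(For reference, the save_embeddings helper used above is not shown; a minimal pickle-based version consistent with how loaded_faces is read back in the recognition probe could look like the sketch below. This is an assumption about the helper, not the exact code used.)

import os
import pickle

def save_embeddings(embedding, name, filepath):
    # Load the existing dictionary (if any), add/overwrite this entry, and write it back.
    embeddings = {}
    if os.path.exists(filepath):
        with open(filepath, "rb") as f:
            embeddings = pickle.load(f)
    embeddings[name] = embedding
    with open(filepath, "wb") as f:
        pickle.dump(embeddings, f)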

I also attempted to extract embeddings from frame_meta as a trial. However, in this case, embeddings were generated for the entire frame rather than just the detected objects. This led to embeddings being created even when no object was detected, resulting in inconsistent recognition results.

Additionally, I observed inconsistencies in person recognition across different videos:

  • In one video, the person is correctly recognized.
  • In another video, the same person is not recognized, leading to inconsistent identification.

How can I troubleshoot this issue? Is the problem coming from the script I am using to extract the embeddings?

I also tried extracting metadata from both the frame meta user list and the object meta user list. In the recognition pipeline, the tensor comes from the object user meta list, whereas in the training pipeline, the output tensor comes from the frame user meta list.

1. Each frame may contain multiple faces. In general, this model should take a face object as input rather than the whole frame.
Of course, I don't know your SGIE model, so I need you to confirm whether it takes a frame or an object as input.

2. This Python code just extracts NVDSINFER_TENSOR_OUTPUT_META; I think there is no problem with it.
3. If the PGIE is working properly, the problem may be in the configuration or model of the SGIE. If you can input the entire frame, then you don't need the PGIE to detect faces.

What is the training pipeline? DeepStream is not used for training.

@junshengy ,

My processing video has multiple faces, so I need to concentrate on the detected objects, not on whole frames.

Let me share my config files and app file as well:

Pgie config file :

cofig_infer_yolov8_forum.txt (944 Bytes)

Sgie config file:

sgie_config_webface.txt (798 Bytes)

I am using this Sgie model :

w600k_mbf.zip (12.0 MB)

app file : (in .txt format)

app.txt (5.7 KB)

#infer-dims=3;160;160
net-scale-factor=0.0078125
offsets=127.5;127.5;127.5

input-tensor-meta=1

What is this? Do this normalization and mean subtraction match what your SGIE model expects as input? I don't know much about this model; please debug it yourself.
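For context, nvinfer preprocesses each pixel as y = net-scale-factor * (x - offset) per channel. A quick check of what the values above do to 8-bit pixel values (a sketch; it assumes the w600k_mbf model expects the usual InsightFace-style normalization of (x - 127.5) / 128):

net_scale_factor = 0.0078125  # = 1/128
offset = 127.5

for pixel in (0, 127.5, 255):
    print(pixel, "->", net_scale_factor * (pixel - offset))
# 0 -> -0.996..., 127.5 -> 0.0, 255 -> 0.996...  (roughly the [-1, 1] range ArcFace-style models expect)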