Issues with Face Recognition

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) - GPU
• DeepStream Version - 7
• JetPack Version (valid for Jetson only) - NA
• TensorRT Version - 8.6.1
• NVIDIA GPU Driver Version (valid for GPU only) - 535.216.01
• Issue Type( questions, new requirements, bugs) - questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I am referencing this related topic: Facing issues with face detection using Deepstream SDK

From the previous topic, we were able to get the bounding boxes; the post-processor returns the tensor and the SGIE is able to read it.
Our extended requirement is to generate an embedding for each detected face, compare it with the authorized face embeddings from the DB, and flag whether the person is authorized.
First, I used ResNet50 in the SGIE, but every face in the video generates the same embedding, so I end up getting 1 when I compute the cosine similarity between the vectors.
Second, I tried ArcFace (I obtained the weights and architecture files) and attempted to convert it to ONNX and TensorRT, but ran into issues. So I would like to check whether there is a better model to use in the SGIE for embedding generation, both for enrolling the labelled pictures and for comparing against them in the real-time video to recognize the person.
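
For reference, the comparison step we have in mind is a plain cosine-similarity match against the stored authorized embeddings; a minimal sketch (the function name, variable names and the 0.7 threshold here are just placeholders):

import numpy as np

def match_embedding(live_emb, authorized_embs, threshold=0.7):
    """Return (best_id, best_score) if the live embedding matches an authorized
    embedding above the threshold, otherwise (None, best_score).

    live_emb       : 1-D numpy array produced by the SGIE.
    authorized_embs: dict mapping person id -> stored 1-D embedding.
    """
    live_emb = live_emb / np.linalg.norm(live_emb)          # L2-normalize
    best_id, best_score = None, -1.0
    for person_id, stored in authorized_embs.items():
        stored = stored / np.linalg.norm(stored)
        score = float(np.dot(live_emb, stored))             # cosine similarity
        if score > best_score:
            best_id, best_score = person_id, score
    return (best_id, best_score) if best_score > threshold else (None, best_score)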

ResNet50 config file:

[property]
gpu-id=0
gie-unique-id=2
model-engine-file=/home/dstream/Documents/FR_DEMO/models/facenet_resnet_50/facenet_resnet_fp16.engine
batch-size=1
process-mode=2
network-type=1
network-mode=2
operate-on-gie-id=-1
operate-on-class-ids=0
infer-dims=3;160;160
net-scale-factor=0.00392157
offsets=123.675;116.28;103.53
model-color-format=0
# Disable bbox parsing since this is a feature extractor
custom-network-config=0
#parse-bbox=0
network-mode=0
#input-tensor-meta=1
output-tensor-meta=1
uff-input-blob-name=input.1
output-blob-names=1199

@junshengy let me know if you need any further details here.

Please fix the issues in the sgie configuration file.

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
## 0=Detector, 1=Classifier, 2=Semantic Segmentation, 3=Instance Segmentation, 100=Other
network-type=100

This seems to be a problem with the model. The SGIE should output a different tensor for each face.

DeepStream does not provide a face recognition model. You can find open-source ArcFace ONNX models on Hugging Face.

If you want to embed the output tensor into user meta, this example can be used as a reference.

Yes, we have already incorporated the above example to add the output tensor to user meta. I am facing issues with the ONNX model that was converted from PyTorch: the PyTorch model works as expected, but the ONNX model does not.
Here is the model: minchul/cvlface_arcface_ir101_webface4m · Hugging Face

We also tried the ONNX export following this blog: Face recognition: OnnX to TensorRT conversion of Arcface Model | by Kavika Roy | DataToBiz | Medium,
but that was not successful either.

I am getting NaN values as the embedding, or a shape-mismatch error. I would appreciate any suggestions on how to get the desired embedding from the SGIE. I am attaching the reference scripts that I tried here.

PyTorch to ONNX conversion script:

import sys

import torch
import torch.onnx

wrapper_folder = "/home/dstream/Documents/FR_DEMO/models/cvlface_arcface_ir101_webface4m"
sys.path.append(wrapper_folder)

# Load the PyTorch model
from wrapper import CVLFaceRecognitionModel, ModelConfig

config = ModelConfig()
model = CVLFaceRecognitionModel(config)
model.eval()

# Define a dummy input with the correct shape and range
dummy_input = torch.randn(1, 3, 112, 112)  # Shape: (batch_size, channels, height, width)

# Export the model to ONNX
onnx_path = "/home/dstream/Documents/FR_DEMO/models/cvlface_arcface_ir101_webface4m/pretrained_model/arcface222_onnx.onnx"
torch.onnx.export(
    model, dummy_input, onnx_path,
    input_names=["input_image"],   # Custom input tensor name
    output_names=["features"],     # Custom output tensor name
    dynamic_axes={"input_image": {0: "batch_size"}, "features": {0: "batch_size"}},  # Allow dynamic batch size
    opset_version=11,
    verbose=True
)

print(f"✅ Model exported to ONNX: {onnx_path}")

Thanks for your support.

This may be a problem with the PyTorch-to-ONNX export.
This usually requires modifying the model's operators (dynamic batch / dynamic shape and so on). I am not familiar with this model; you can try contacting the author on GitHub.
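
One quick way to confirm whether the export itself is the problem is to compare the PyTorch output with the ONNX Runtime output on the same dummy input right after export; a minimal sketch, reusing model, dummy_input and onnx_path from the export script above and assuming onnxruntime is installed:

import numpy as np
import onnxruntime as ort
import torch

with torch.no_grad():
    torch_out = model(dummy_input)              # 'model' and 'dummy_input' from the export script
    if isinstance(torch_out, (tuple, list)):    # some wrappers return a tuple of outputs
        torch_out = torch_out[0]
    torch_out = torch_out.cpu().numpy()

sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
onnx_out = sess.run(["features"], {"input_image": dummy_input.numpy()})[0]

print("shapes             :", torch_out.shape, onnx_out.shape)
print("NaN in ONNX output :", np.isnan(onnx_out).any())
print("max abs difference :", np.abs(torch_out - onnx_out).max())

If the shapes differ or the ONNX output already contains NaN here, the issue is in the export rather than in DeepStream or TensorRT.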

Hi @junshengy,

We have resolved the issue and are now able to get embeddings from the ArcFace model. I have also created a pipeline outside of DeepStream to store the generated embeddings for training (for use during recognition).

I have integrated the recognition model (SGIE) into the DeepStream pipeline, but when trying to extract the embeddings from the SGIE metadata, I am getting None from obj_meta.obj_user_meta_list. Based on my understanding, when the SGIE (recognition) follows the PGIE (detection), and with process-mode=1 and output-tensor-meta=1 set on the SGIE, the generated embeddings should be stored in the object's user meta list. However, when I attempt to extract the embeddings from obj_meta.obj_user_meta_list, I encounter the following issue:

Here is the log from my terminal showing the issue:

Entering PGIE filter function
Processing frame 0
Objects detected in frame 0
Bounding Box: {'top': 342, 'left': 707, 'width': 244, 'height': 334}
Face Recognition started
Getting the Face Features
l_user_meta is None
Face feature : None
Entering PGIE filter function
Processing frame 0
Objects detected in frame 0
Bounding Box: {'top': 343, 'left': 711, 'width': 241, 'height': 324}
Face Recognition started
Getting the Face Features
l_user_meta is None
Face feature : None

Could you please guide me on where to correctly extract the embeddings generated by the SGIE model within the DeepStream pipeline? Additionally, would using the embeddings generated outside of DeepStream for recognition work, or is it essential to extract them from the pipeline itself?

Here’s the SGIE config file I’m using:
sgie_config_webface.txt (797 Bytes)

And here’s the probe I attached to the SGIE for embedding extraction:

def sgie_feature_extract_probe(pad, info, data):
    """
    Probe to extract facial feature from user-meta data and perform recognition.

    Args:
        pad: GstPad.
        info: GstPadProbeInfo.
        data: Tuple containing (loaded_faces, threshold, output_dir).
    """
    print("Face Recognition started")
    loaded_faces = data  # Dictionary of embeddings from the pickle file
    threshold = 0.7  # Similarity threshold
    #output_dir = data[2]  # Directory to save embeddings (optional)

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        l_obj = frame_meta.obj_meta_list
        frame_number = frame_meta.frame_num
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break

            # Extract the face embedding
            if obj_meta is None:
                print("object meta is None")

            face_feature = get_face_feature(obj_meta)
            print("Face feature : ", face_feature)
            if face_feature is not None:
                # Perform similarity scoring
                best_match = None
                best_score = -1
                for key, value in loaded_faces.items():
                    score = np.dot(face_feature, value.T)[0][0]  # Cosine similarity
                    if score > best_score:
                        best_score = score
                        best_match = key

                # Display the result if the score exceeds the threshold
                if best_score > threshold:
                    display_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
                    display_meta.num_labels = 1
                    py_nvosd_text_params = display_meta.text_params[0]

                    # Set the display text (name of the matched face)
                    py_nvosd_text_params.display_text = best_match

                    # Set the position of the text
                    py_nvosd_text_params.x_offset = int(obj_meta.rect_params.left)
                    py_nvosd_text_params.y_offset = int(obj_meta.rect_params.top + obj_meta.rect_params.height)

                    # Set font properties
                    py_nvosd_text_params.font_params.font_name = "Serif"
                    py_nvosd_text_params.font_params.font_size = 20
                    py_nvosd_text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)  # White color

                    # Set text background color
                    py_nvosd_text_params.set_bg_clr = 1
                    py_nvosd_text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)  # Black background

                    # Add the display meta to the frame
                    pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)

            try:
                l_obj = l_obj.next
            except StopIteration:
                break

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK

def get_face_feature(obj_meta):
    """Get face feature from user-meta data.

    Args:
        obj_meta (NvDsObjectMeta): Object metadata.
    Returns:
        np.array: Normalized face feature.
    """
    print("Getting the Face Features")
    l_user_meta = obj_meta.obj_user_meta_list
    if l_user_meta is None:
        print("l_user_meta is None")
    #print(f"l_user_meta: {l_user_meta}")
    while l_user_meta:
        try:
            user_meta = pyds.NvDsUserMeta.cast(l_user_meta.data)
        except StopIteration:
            break
        if user_meta and user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
            try:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            except StopIteration:
                break

            layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)
            output = []
            for i in range(512):  # Assuming the embedding size is 512
                output.append(pyds.get_detections(layer.buffer, i))
            print("output : ", output)
            res = np.reshape(output, (1, -1))
            print("result")
            norm = np.linalg.norm(res)
            normal_array = res / norm  # Normalize the embedding
            print("Normal array: ", normal_array)
            return normal_array

        try:
            l_user_meta = l_user_meta.next
        except StopIteration:
            break

    return None

process-mode should be 2. If it is 1, inference runs on the entire frame instead of on each face object, and the output tensor will only be found in the frame's user meta list (frame_meta.frame_user_meta_list).
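
To make the difference concrete, here is a minimal sketch of a helper that looks up the tensor meta in either place (the attribute names are the standard pyds ones; error handling is trimmed):

import pyds

def find_tensor_meta(meta, from_frame=False):
    """Return the first NvDsInferTensorMeta attached to an object or a frame.

    meta      : NvDsObjectMeta (process-mode=2) or NvDsFrameMeta (process-mode=1).
    from_frame: True when the SGIE ran on full frames (process-mode=1).
    """
    l_user = meta.frame_user_meta_list if from_frame else meta.obj_user_meta_list
    while l_user is not None:
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
            return pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
        try:
            l_user = l_user.next
        except StopIteration:
            break
    return None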

Do you mean using DeepStream only to detect the bounding box of the face, and then using another program for recognition?

If so, you can access frame_meta in the PGIE's src pad probe function, extract the bbox, and send it to your target program via IPC/socket.

No, I didn't mean that. For recognition, we need to compare the generated embeddings with the stored embeddings. To obtain these stored embeddings, do we need to create another DeepStream pipeline that stores the embeddings generated for the images in the dataset, or can we also use embeddings generated by the same .engine model outside of DeepStream?

That is not guaranteed to produce identical embeddings on all platforms, since they are float tensors.

In the example mentioned above, you can consider sending the generated embedding to your server/application over nvmsgconv+nvmsgbroker
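
For reference, a minimal sketch of what that message branch could look like (the element and property names are the standard nvmsgconv/nvmsgbroker ones, but the config path, protocol library and connection string below are placeholders, and the embedding itself still needs to be packed into NvDsEventMsgMeta as in the deepstream-test4 sample):

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

def add_msg_branch(pipeline, upstream_element):
    """Attach a message branch: upstream -> tee -> queue -> nvmsgconv -> nvmsgbroker."""
    tee = Gst.ElementFactory.make("tee", "msg-tee")
    queue = Gst.ElementFactory.make("queue", "msg-queue")
    msgconv = Gst.ElementFactory.make("nvmsgconv", "nvmsg-converter")
    msgbroker = Gst.ElementFactory.make("nvmsgbroker", "nvmsg-broker")

    # Placeholder paths/values -- replace with your own msgconv config and broker setup
    msgconv.set_property("config", "msgconv_config.txt")
    msgconv.set_property("payload-type", 0)   # DeepStream JSON payload
    msgbroker.set_property("proto-lib", "/opt/nvidia/deepstream/deepstream/lib/libnvds_kafka_proto.so")
    msgbroker.set_property("conn-str", "localhost;9092")
    msgbroker.set_property("topic", "face-embeddings")
    msgbroker.set_property("sync", False)

    for element in (tee, queue, msgconv, msgbroker):
        pipeline.add(element)
    upstream_element.link(tee)
    tee.link(queue)
    queue.link(msgconv)
    msgconv.link(msgbroker)
    return tee  # link the tee's second branch to the rest of the display pipeline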

Hi @junshengy,

As per your suggestion, I changed process-mode = 1 to process-mode = 2 and attempted to extract embeddings from object_meta. However, I encountered unexpected results.

Issue Details:

  1. No Valid Embeddings from object_meta
    After implementing the change, I received the output: "No valid embeddings generated."

Script Used for Extracting Embeddings:

def extract_embeddings(obj_meta):
    """
    Extract embeddings from object-level metadata only.
    Returns normalized embedding or None.
    """
    normalized_embedding = None

    # Process object-level metadata
    l_user_meta = obj_meta.obj_user_meta_list
    if l_user_meta is None:
        print("Luser meta is None")
    while l_user_meta:
        try:
            user_meta = pyds.NvDsUserMeta.cast(l_user_meta.data)
            if user_meta and user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)
                output = [pyds.get_detections(layer.buffer, i) for i in range(512)]
                res = np.reshape(output, (1, -1))

                norm = np.linalg.norm(res)
                if norm != 0:
                    normalized_embedding = res / norm
                    return normalized_embedding
        except Exception as e:
            print(f"Error in object metadata: {e}")
        l_user_meta = l_user_meta.next

    return None

def sgie_feature_extract_probe(pad, info, u_data):
    """
    Probe to extract features and save embeddings.
    This function is used in the training pipeline.
    """
    print("Entering into sgie training probe")

    unique_id = truncate_filename(u_data)
    print(f"Unique ID for saving: {unique_id}")

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if not batch_meta:
        print("No batch_meta found.")
        return Gst.PadProbeReturn.OK

    l_frame = batch_meta.frame_meta_list
    embedding_saved = False  # Flag to track if we've saved an embedding

    while l_frame and not embedding_saved:  # Process frames until we find and save an embedding
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            print(f"Processing frame {frame_meta.frame_num}")

            # Try to get embeddings from objects
            l_obj = frame_meta.obj_meta_list
            while l_obj and not embedding_saved:
                try:
                    obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
                    if obj_meta is None:
                        print("object meta is None")
                    print(f"Processing object {obj_meta.object_id}, class {obj_meta.class_id}")

                    # Extract embeddings from object-level metadata
                    normalized_embedding = extract_embeddings(obj_meta)

                    if normalized_embedding is not None:
                        print("Successfully extracted normalized embedding from object-level metadata")
                        print(f"Shape: {normalized_embedding.shape}")

                        # Save the embedding
                        filepath = '/home/dstream/Documents/FR_DEMO/Training_pipeline/embeddings.pkl'
                        save_embeddings(normalized_embedding, unique_id, filepath)
                        embedding_saved = True
                        break  # Exit the object loop after saving

                except StopIteration:
                    break

                l_obj = l_obj.next

        except StopIteration:
            break

        l_frame = l_frame.next

    if not embedding_saved:
        print("No valid embeddings found in any frame for saving")

    return Gst.PadProbeReturn.OK
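
For completeness, truncate_filename and save_embeddings are helpers that are not shown here; a minimal pickle-based sketch of what save_embeddings might look like (this is a hypothetical version, not the exact helper used):

import os
import pickle

def save_embeddings(embedding, unique_id, filepath):
    """Store or refresh the {unique_id: embedding} entry in a pickle file (hypothetical helper)."""
    embeddings = {}
    if os.path.exists(filepath):
        with open(filepath, "rb") as f:
            embeddings = pickle.load(f)
    embeddings[unique_id] = embedding
    with open(filepath, "wb") as f:
        pickle.dump(embeddings, f)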

I also attempted to extract embeddings from frame_meta as a trial. However, in that case embeddings were generated for the entire frame rather than for the detected objects, so embeddings were created even when no object was detected, which led to inconsistent recognition results.

Additionally, I observed inconsistencies in person recognition across different videos:

  • In one video, the person is correctly recognized.
  • In another video, the same person is not recognized, leading to inconsistent identification.

How can I troubleshoot this issue? Is the problem coming from the script I am using to extract the embeddings?

I also tried extracting metadata from both the frame meta user list and the object meta user list. In the recognition pipeline, the tensor comes from the object user meta list, whereas in the training pipeline, the output tensor comes from the frame user meta list.

1. Each frame may contain multiple faces. In general, this model should take a face object as input instead of the whole frame.
Of course, I don't know your SGIE model, so I need you to confirm whether the input should be a frame or an object.

2. This Python code just extracts NVDSINFER_TENSOR_OUTPUT_META; I think there is no problem with it.
3. If the PGIE is working properly, the problem may be in the configuration or the model of the SGIE. If your model can take the entire frame as input, then you don't need the PGIE to detect faces.

What is the training pipeline? DeepStream is not used for training.

@junshengy ,

My processing video has multiple faces, so I need to work with the detected objects, not the whole frames.

Let me share my config files and app file as well:

Pgie config file :

cofig_infer_yolov8_forum.txt (944 Bytes)

Sgie config file:

sgie_config_webface.txt (798 Bytes)

I am using this Sgie model :

w600k_mbf.zip (12.0 MB)

app file : (in .txt format)

app.txt (5.7 KB)

#infer-dims=3;160;160
net-scale-factor=0.0078125
offsets=127.5;127.5;127.5

input-tensor-meta=1

What is this? Does this normalization and mean subtraction match what your SGIE model expects as input? I don't know much about this model; please debug it yourself.
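
For reference, nvinfer preprocesses each pixel as y = net-scale-factor * (x - offset) per channel, so the values above map input pixels to roughly [-1, 1], which is the usual ArcFace-style normalization; a quick check:

import numpy as np

net_scale_factor = 0.0078125            # 1/128, as in the config above
offsets = np.array([127.5, 127.5, 127.5])

pixels = np.array([0.0, 127.5, 255.0])  # sample per-channel pixel values
print(net_scale_factor * (pixels - offsets))
# -> [-0.99609375  0.          0.99609375], i.e. inputs scaled to roughly [-1, 1]

Whether this matches what the w600k_mbf model was trained with is something to verify against the model's own preprocessing code.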

Hi @junshengy,

Thanks for your continued input. I debugged the code and found the issue: the SGIE is not receiving the detection results from the PGIE.

Step 1: I attached a probe function to the src pad of the PGIE to check whether the PGIE model is producing detections. Here, I did get output.

Step 2: I attached a probe function to the SGIE src pad to get the embeddings generated by the SGIE engine. During this process, I found that the SGIE is not receiving the detected objects' metadata to generate embeddings. To investigate further, I proceeded with step 3.

Step 3: I attached a probe function to the sink pad of the SGIE to check whether the detection results were being passed through. This confirmed that the SGIE is not receiving the results.
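
For reference, the step-3 check was done with a simple sink-pad probe that just counts the objects reaching the SGIE; a simplified sketch (assuming the same Gst/pyds imports as the other probes):

def sgie_sink_pad_probe(pad, info, u_data):
    """Debug probe: count the detected objects arriving at the SGIE sink pad."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        print(f"frame {frame_meta.frame_num}: {frame_meta.num_obj_meta} objects reaching the SGIE")
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# Attached with:
# sgie.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, sgie_sink_pad_probe, 0)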

How can I solve this? How will the pgie results be passed to the sgie?

I am sharing my config files. Please let me know if I need to add or change any properties to pass the pgie results to the sgie.

pgie config file :

cofig_infer_yolov8_forum.txt (938 Bytes)

sgie config file :

sgie_config_webface.txt (799 Bytes)

I link the elements of pipeline in the following order :

streammux.link(pgie)
pgie.link(tracker)
tracker.link(sgie)
sgie.link(tiler)

tiler.link(nvvidconv)
nvvidconv.link(nvosd)
nvosd.link(sink)

#!/usr/bin/env python3

################################################################################
# SPDX-FileCopyrightText: Copyright (c) 2019-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

import sys
sys.path.append('../')
import platform
import configparser

import gi
gi.require_version('Gst', '1.0')
from gi.repository import GLib, Gst
from common.platform_info import PlatformInfo
from common.bus_call import bus_call
import numpy as np
import pyds

MUXER_BATCH_TIMEOUT_USEC = 33000

def layer_tensor_to_ndarray(layer: pyds.NvDsInferLayerInfo) -> np.ndarray:
    import ctypes
    if layer.dataType == pyds.NvDsInferDataType.FLOAT:
        # print(f"int_addr {type(pyds.get_ptr(layer.buffer))}") <class 'int'>
        addr = pyds.get_ptr(layer.buffer)
        # print(f"addr {type(addr)}") <class 'ctypes.c_int'>
        data_ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_float))
        num_dims = layer.inferDims.numDims
        shape = []
        for i in range(num_dims):
            shape.append(layer.inferDims.d[i])
        # print(f"{shape}")
        layer_array = np.ctypeslib.as_array(data_ptr, shape=shape)
        # print(f"layer_array {type(layer_array)}")
        layer_ny = np.frombuffer(layer_array, dtype=np.float32)
        print(f"boxes {type(layer_ny)}")
        return layer_ny
    return None

def sgie_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return

    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            # Note that l_frame.data needs a cast to pyds.NvDsFrameMeta
            # The casting is done by pyds.NvDsFrameMeta.cast()
            # The casting also keeps ownership of the underlying memory
            # in the C code, so the Python garbage collector will leave
            # it alone.
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        l_obj=frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                # Casting l_obj.data to pyds.NvDsObjectMeta
                obj_meta=pyds.NvDsObjectMeta.cast(l_obj.data)
                l_user_meta = obj_meta.obj_user_meta_list
                # Extract object level meta data from NvDsAnalyticsObjInfo
                while l_user_meta:
                    try:
                        user_meta = pyds.NvDsUserMeta.cast(l_user_meta.data)
                    except StopIteration:
                        break
                    if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                        tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                        layers_info = []
                        for i in range(tensor_meta.num_output_layers):
                            layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
                            layers_info.append(layer)
                            print(f"layer {layer.layerName} shape {layer.inferDims.d[0]}x{layer.inferDims.d[1]}x{layer.inferDims.d[2]}")
                            layer_tensor_to_ndarray(layer)
                    try:
                        l_user_meta = l_user_meta.next
                    except StopIteration:
                        break
            except StopIteration:
                break
            try: 
                l_obj=l_obj.next
            except StopIteration:
                break
        try:
            l_frame=l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK

def main(args):
    # Check input arguments
    if(len(args)<2):
        sys.stderr.write("usage: %s <h264_elementary_stream>\n" % args[0])
        sys.exit(1)

    platform_info = PlatformInfo()
    # Standard GStreamer initialization

    Gst.init(None)

    # Create gstreamer elements
    # Create Pipeline element that will form a connection of other elements
    print("Creating Pipeline \n ")
    pipeline = Gst.Pipeline()

    if not pipeline:
        sys.stderr.write(" Unable to create Pipeline \n")

    # Source element for reading from the file
    print("Creating Source \n ")
    source = Gst.ElementFactory.make("filesrc", "file-source")
    if not source:
        sys.stderr.write(" Unable to create Source \n")

    # Since the data format in the input file is elementary h264 stream,
    # we need a h264parser
    print("Creating H264Parser \n")
    h264parser = Gst.ElementFactory.make("h264parse", "h264-parser")
    if not h264parser:
        sys.stderr.write(" Unable to create h264 parser \n")

    # Use nvdec_h264 for hardware accelerated decode on GPU
    print("Creating Decoder \n")
    decoder = Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")
    if not decoder:
        sys.stderr.write(" Unable to create Nvv4l2 Decoder \n")

    # Create nvstreammux instance to form batches from one or more sources.
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
    if not streammux:
        sys.stderr.write(" Unable to create NvStreamMux \n")

    # Use nvinfer to run inferencing on decoder's output,
    # behaviour of inferencing is set through config file
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    if not pgie:
        sys.stderr.write(" Unable to create pgie \n")

    sgie1 = Gst.ElementFactory.make("nvinfer", "secondary1-nvinference-engine")
    if not sgie1:
        sys.stderr.write(" Unable to make sgie1 \n")

    sink = Gst.ElementFactory.make("fakesink", "fakesink")

    print("Playing file %s " %args[1])
    source.set_property('location', args[1])
    streammux.set_property('width', 1920)
    streammux.set_property('height', 1080)
    streammux.set_property('batch-size', 1)
    streammux.set_property('batched-push-timeout', MUXER_BATCH_TIMEOUT_USEC)

    #Set properties of pgie and sgie
    pgie.set_property('config-file-path', "pgie.txt")
    sgie1.set_property('config-file-path', "sgie.txt")

    print("Adding elements to Pipeline \n")
    pipeline.add(source)
    pipeline.add(h264parser)
    pipeline.add(decoder)
    pipeline.add(streammux)
    pipeline.add(pgie)
    pipeline.add(sgie1)
    pipeline.add(sink)

    # we link the elements together
    # file-source -> h264-parser -> nvv4l2-decoder ->
    # streammux -> pgie -> sgie -> fakesink
    print("Linking elements in the Pipeline \n")
    source.link(h264parser)
    h264parser.link(decoder)

    sinkpad = streammux.request_pad_simple("sink_0")
    if not sinkpad:
        sys.stderr.write(" Unable to get the sink pad of streammux \n")
    srcpad = decoder.get_static_pad("src")
    if not srcpad:
        sys.stderr.write(" Unable to get source pad of decoder \n")
    srcpad.link(sinkpad)
    streammux.link(pgie)
    pgie.link(sgie1)
    sgie1.link(sink)

    # create an event loop and feed GStreamer bus messages to it
    loop = GLib.MainLoop()

    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect ("message", bus_call, loop)

    # Let's add a probe to get informed of the generated metadata. We add the probe
    # to the src pad of the sgie element, since by that time the buffer carries the
    # inference output tensor meta.
    sgiesrcpad = sgie1.get_static_pad("src")
    if not sgiesrcpad:
        sys.stderr.write(" Unable to get src pad of sgie \n")
    sgiesrcpad.add_probe(Gst.PadProbeType.BUFFER, sgie_src_pad_buffer_probe, 0)

    print("Starting pipeline \n")
    # start playback and listen to events
    pipeline.set_state(Gst.State.PLAYING)
    try:
      loop.run()
    except:
      pass

    # cleanup
    pipeline.set_state(Gst.State.NULL)

if __name__ == '__main__':
    sys.exit(main(sys.argv))

sgie

[property]
gpu-id=0
gie-unique-id=2
onnx-file=webface_r50_dynamic_simplify_cleanup.onnx
model-engine-file=webface_r50_dynamic_simplify_cleanup.onnx_b1_gpu0_fp16.engine
batch-size=1
process-mode=2
network-type=100
network-mode=2
operate-on-gie-id=1
operate-on-class-ids=0
#infer-dims=3;160;160
classifier-async-mode=0
model-color-format=0
output-tensor-meta=1

I can get the right result.

layer 683 shape 512x0x0
boxes <class 'numpy.ndarray'>
layer 683 shape 512x0x0

Refer to this FAQ

Hi @junshengy,

I will try the approach you suggested. However, while retrieving the frames from the PGIE probe in order to save them, I am encountering the following error:

"Currently we only support RGBA or RGB color format"

To overcome this issue, I added a capsfilter and tried running the pipeline again. But now, I am getting the following error:

0:00:09.779035433 412429 0x5606ffcf93f0 INFO nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:/home/dstream/Documents/Deep_Stream_App/configs/cofig_infer_yolov8_forum.txt successfully
Decodebin child added: source
Decodebin child added: decodebin0
Decodebin child added: qtdemux0
Decodebin child added: multiqueue0
Decodebin child added: h264parse0
Decodebin child added: capsfilter0
Decodebin child added: aacparse0
Decodebin child added: avdec_aac0
Decodebin child added: nvv4l2decoder0
New pad callback
Pad type: video/x-raw
New pad callback
Pad type: audio/x-raw
Error: gst-stream-error-quark: Internal data stream error. (1)
Debug info: ../gst/isomp4/qtdemux.c(6760): gst_qtdemux_loop (): /GstPipeline:pipeline0/GstBin:source-bin-00/GstURIDecodeBin:uri-decode-bin/GstDecodeBin:decodebin0/GstQTDemux:qtdemux0:
streaming stopped, reason not-negotiated (-4)
[NvMultiObjectTracker] De-initialized
[8]- Killed python3 app.py

Pipeline Code:

import sys
import gi
gi.require_version('Gst', '1.0')
gi.require_version('GLib', '2.0')
from gi.repository import Gst, GLib, GObject
from data_loading import *
from custom_probe_2 import *

def decodebin_child_added(child_proxy, Object, name, user_data):
    print(f"Decodebin child added: {name}")
    if "decodebin" in name:
        Object.connect("child-added", decodebin_child_added, user_data)

    if "source" in name:
        source_element = child_proxy.get_by_name("source")
        if source_element and source_element.find_property('drop-on-latency') is not None:
            source_element.set_property("drop-on-latency", True)

def cb_newpad(decodebin, decoder_src_pad, data):
    print("New pad callback")
    caps = decoder_src_pad.get_current_caps()
    if not caps:
        caps = decoder_src_pad.query_caps(None)
    
    gststruct = caps.get_structure(0)
    gstname = gststruct.get_name()
    source_bin = data

    print(f"Pad type: {gstname}")
    if "video" in gstname:
        bin_ghost_pad = source_bin.get_static_pad("src")
        if not bin_ghost_pad.set_target(decoder_src_pad):
            sys.stderr.write("Failed to link decoder src pad to source bin ghost pad\n")

def create_source_bin(index, uri):
    bin_name = f"source-bin-{index:02d}"
    nbin = Gst.Bin.new(bin_name)
    if not nbin:
        sys.stderr.write("Unable to create source bin\n")
        return None

    uri_decode_bin = Gst.ElementFactory.make("uridecodebin", "uri-decode-bin")
    if not uri_decode_bin:
        sys.stderr.write("Unable to create uridecodebin\n")
        return None

    uri_decode_bin.set_property("uri", uri)
    uri_decode_bin.connect("pad-added", cb_newpad, nbin)
    uri_decode_bin.connect("child-added", decodebin_child_added, nbin)

    Gst.Bin.add(nbin, uri_decode_bin)
    
    if not nbin.add_pad(Gst.GhostPad.new_no_target("src", Gst.PadDirection.SRC)):
        sys.stderr.write("Failed to add ghost pad in source bin\n")
        return None

    return nbin

def main(cfg):
    Gst.init(None)
    print("Creating DeepStream Face Detection Pipeline")

    # Create Pipeline
    pipeline = Gst.Pipeline()

    if not pipeline:
        print("Error: Unable to create pipeline")
        sys.exit(1)
    else:
        print("Pipeline created successfully")
    
    # Create Stream Muxer
    streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
    pipeline.add(streammux)
    set_property(cfg, streammux, "streammux")

    # Create and Add Source Bin
    sources = cfg['source']
    source_bin = create_source_bin(0, list(sources.values())[0])
    pipeline.add(source_bin)

    # Link Source to Stream Muxer
    sinkpad = streammux.get_request_pad("sink_0")
    srcpad = source_bin.get_static_pad("src")

    if sinkpad is None or srcpad is None:
        print("Error: Source or Streammux pad not found!")
    else:
        print(">>> Linking Source Bin to StreamMuxer")
        srcpad.link(sinkpad)

    # Create Primary Inference (Face Detection)
    caps1 = Gst.Caps.from_string("video/x-raw(memory:NVMM), format=RGBA")
    filter1 = Gst.ElementFactory.make("capsfilter", "filter1")
    filter1.set_property("caps", caps1)
    pipeline.add(filter1)

    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    pipeline.add(pgie)
    set_property(cfg, pgie, "pgie")

    tracker = Gst.ElementFactory.make("nvtracker", "tracker")
    pipeline.add(tracker)
    set_tracker_properties(tracker, cfg['tracker']['config-file-path'])

    # Create Tiler
    tiler = Gst.ElementFactory.make("nvmultistreamtiler", "nvtiler")
    pipeline.add(tiler)
    tiler.set_property("rows", 1)
    tiler.set_property("columns", 1)
    tiler.set_property("width", 1920)
    tiler.set_property("height", 1080)

    # Create Video Converter
    nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
    pipeline.add(nvvidconv)

    # Create On-Screen Display
    nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
    nvosd.set_property("process-mode", 0)  # Default mode (draw bounding boxes)
    nvosd.set_property("display-text", 1)
    pipeline.add(nvosd)

    # Create Sink
    sink = Gst.ElementFactory.make("nveglglessink", "file-sink")
    pipeline.add(sink)
    sink.set_property("sync", 0)

    print(">>> After creating elements linking of elements is started")

    streammux.link(filter1)
    filter1.link(pgie)

    pgie.link(tracker)
    tracker.link(tiler)
    tiler.link(nvvidconv)
    nvvidconv.link(nvosd)
    nvosd.link(sink)

    pgie_src_pad = pgie.get_static_pad("src")
    if pgie_src_pad:
        pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, pgie_sink_pad_buffer_probe)

    loop = GLib.MainLoop()

    # Bus Message Handling
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, loop)

    # Start Pipeline
    pipeline.set_state(Gst.State.PLAYING)
    
    try:
        loop.run()
    except Exception as e:
        print(f"Pipeline error: {e}")
    finally:
        pipeline.set_state(Gst.State.NULL)

def bus_call(bus, message, loop):
    t = message.type
    if t == Gst.MessageType.EOS:
        print("End-of-stream")
        loop.quit()
    elif t == Gst.MessageType.ERROR:
        err, debug = message.parse_error()
        print(f"Error: {err}")
        print(f"Debug info: {debug}")
        loop.quit()
    return True

if __name__ == '__main__':
    cfg = parse_args(cfg_path="/home/dstream/Documents/Detection_Deep_Stream_App/paths/paths.toml")
    main(cfg)

My Question:

Is there anything wrong in my pipeline, especially related to the capsfilter or the overall configuration that might cause the stream error gst-stream-error-quark: Internal data stream error (1)?

Please refer to the code provided above to modify your code
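
One likely cause of the not-negotiated error is forcing RGBA on the streammux output without a converter in between; a minimal sketch of one way to restructure that part of the pipeline (element variable names follow the code above, and the exact placement may need adjusting for your app):

# Convert to RGBA with nvvideoconvert before forcing the RGBA caps, instead of
# linking streammux directly to the capsfilter.
conv_pre = Gst.ElementFactory.make("nvvideoconvert", "convertor-pre")
caps_rgba = Gst.ElementFactory.make("capsfilter", "filter-rgba")
caps_rgba.set_property("caps", Gst.Caps.from_string("video/x-raw(memory:NVMM), format=RGBA"))
pipeline.add(conv_pre)
pipeline.add(caps_rgba)

# streammux -> nvvideoconvert -> capsfilter(RGBA) -> pgie -> tracker -> tiler -> ...
streammux.link(conv_pre)
conv_pre.link(caps_rgba)
caps_rgba.link(pgie)

# Also make sure cb_newpad only links video pads; the audio pad exposed by the
# demuxer can simply be ignored (return early when "audio" is in gstname).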