DeepStream

Hi @fanzh and @gff1038m, can you please help me in solving this issue?
I am developing a face recognition application with DeepStream.
My system configuration:

  • GPU: NVIDIA A6000
  • TensorRT Version: 8.6.1
  • CUDA Version: 12.2
  • DeepStream Version: 7.0

I use the FaceDetectIR model from the NGC catalog to detect faces. With that model I ran into a performance issue: faces are detected in some frames, while in other frames no faces are detected. So I changed to the YOLOv8n face detection model and converted it into a .engine model using this code:
from ultralytics import YOLO

model = YOLO('yolov8n-face.pt')
model.export(format='engine', device=0, half=False)  # Full precision
When I use this converted engine with DeepStream, I get the error "failed to build the engine".

So I tried to load the engine with the trtexec command, but then I got a deserialization error.

So, to resolve this issue, I converted the model to .onnx first and then to .engine. The deserialization issue is solved, but no faces are detected in the frames.
My YOLO model config file is:
yolov8_infer_config.txt (835 Bytes)

I also tried keeping the network mode as 100 for the YOLO model (a sketch of the kind of config I mean is below).
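For reference, the config I am describing is an nvinfer config along these lines; this is only a rough sketch with placeholder paths, parser function name, and library name, not the exact contents of the attached file:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
onnx-file=yolov8n-face.onnx
model-engine-file=yolov8n-face.engine
infer-dims=3;640;640
batch-size=1
network-mode=0
num-detected-classes=1
gie-unique-id=1
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV8
custom-lib-path=./libnvds_infercustomparser_yolov8.so

[class-attrs-all]
pre-cluster-threshold=0.25

The two lines that matter most for a custom model are parse-bbox-func-name and custom-lib-path, which must point to the parser function and the shared library built from the code below.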

My custom parser code used for parsing the YOLOv8n face detection model output is:
#include <algorithm>
#include <cmath>
#include <cstring>
#include <iostream>
#include <vector>
#include "nvdsinfer_custom_impl.h"

static const int NUM_CLASSES_YOLO = 1; // Face detection
static const float CONFIDENCE_THRESHOLD = 0.5;

extern "C" bool NvDsInferParseCustomYoloV8(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
return false;
}

const NvDsInferLayerInfo& layer = outputLayersInfo[0];

// Validate output dimensions
if (layer.inferDims.numDims != 2) {
    std::cerr << "ERROR: Expected 2 dimensions in output tensor, but got "
              << layer.inferDims.numDims << " (Shape: ";
    for (int i = 0; i < layer.inferDims.numDims; i++) {
        std::cerr << layer.inferDims.d[i] << " ";
    }
    std::cerr << ")" << std::endl;
    return false;
}

if (!layer.buffer) {
    std::cerr << "ERROR: Layer buffer is null!" << std::endl;
    return false;
}

//int batch_size = layer.inferDims.d[0];  // Number of batches
int num_boxes = layer.inferDims.d[0]; // 20 (anchors per feature map point)
int num_features = layer.inferDims.d[1]; // 8400 (grid points)

std::cout << "Processing output with dimensions: " 
          << num_boxes << "x" << num_features << std::endl;

if (num_features < 6) { // Must have at least x, y, w, h, and confidence
    std::cerr << "ERROR: Feature vector too small " << std::endl;
    return false;
}

const float* output = (const float*)layer.buffer;

for (int i = 0; i < num_boxes; i++) {
    float x = output[i * num_features];
    float y = output[i * num_features + 1];
    float w = output[i * num_features + 2];
    float h = output[i * num_features + 3];
    float confidence = output[i * num_features + 4];

    if (confidence < CONFIDENCE_THRESHOLD) continue;  // Ignore low-confidence detections

    NvDsInferParseObjectInfo obj{0};

    // Normalize the bbox coordinates if necessary
    float scale_x = networkInfo.width / 1.0f;
    float scale_y = networkInfo.height / 1.0f;

    if (x <= 1.0 && y <= 1.0 && w <= 1.0 && h <= 1.0) {
        x *= scale_x;
        y *= scale_y;
        w *= scale_x;
        h *= scale_y;
    }

    obj.left = x - w / 2;
    obj.top = y - h / 2;
    obj.width = w;
    obj.height = h;

    // Ensure bbox stays inside image bounds
    obj.left = std::max(0.0f, std::min(obj.left, (float)networkInfo.width - 1));
    obj.top = std::max(0.0f, std::min(obj.top, (float)networkInfo.height - 1));
    obj.width = std::min(obj.width, (float)networkInfo.width - obj.left);
    obj.height = std::min(obj.height, (float)networkInfo.height - obj.top);

    obj.detectionConfidence = confidence;
    obj.classId = (num_features > 5) ? (int)output[i * num_features + 5] : 0;  // Class ID

    if (obj.width > 0 && obj.height > 0) {
        objectList.push_back(obj);
    }
}

return true;

}

CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV8);

Note: My YOLOv8n face detection model is a dynamic-shape model, so while converting it for DeepStream we need to set the shapes; I set the shape as 3x640x640 for the model I am using. Help me in cracking this.

What was the export command used for ONNX in Ultralytics?

Also check this regarding deserialization.

This is what I used to convert to the ONNX model:

from ultralytics import YOLO

# Load YOLO model
model_path = "/home/dstream/Documents/Deep_Stream_App/models/yolov8n/yolov8n-face.pt"
model = YOLO(model_path)

# Convert to ONNX and explicitly save it with the desired name
model.export(format='onnx', dynamic=True, opset=11)

print("Model has been successfully converted")

And for converting to .engine I use the trtexec command "trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine_intro.engine".

You probably should be setting the shapes during trtexec conversion.
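For example, something along these lines; this is only a sketch, and the input tensor name (images is the usual Ultralytics export default) and the file names must match your actual ONNX model:

trtexec --onnx=yolov8n-face.onnx --saveEngine=yolov8n-face.engine --minShapes=images:1x3x640x640 --optShapes=images:1x3x640x640 --maxShapes=images:1x3x640x640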

Yes, right.

I am currently working on face recognition with DeepStream; a custom parser is used for parsing the YOLOv8n model outputs, along with a custom probe to display metadata. However, I am facing an issue where no objects are detected in my frames, despite expecting detections from the model.

Problem Description:

When I run with this custom parser and probe, I observe the following lines in my output logs:

Entering PGIE filter function
Processing frame 0
No objects detected in frame 0
Entering PGIE filter function
Processing frame 0
No objects detected in frame 0

I am using the following custom parser for parsing the YOLOv8n model’s output:

#include <algorithm>
#include <iostream>

#include "nvdsinfer_custom_impl.h"

#include "utils.h"

#define NMS_THRESH 0.45

extern "C" bool
NvDsInferParseYoloFace(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferInstanceMaskInfo>& objectList);

static std::vector<NvDsInferInstanceMaskInfo>
nonMaximumSuppression(std::vector<NvDsInferInstanceMaskInfo> binfo)
{
auto overlap1D = [](float x1min, float x1max, float x2min, float x2max) -> float {
if (x1min > x2min) {
std::swap(x1min, x2min);
std::swap(x1max, x2max);
}
return x1max < x2min ? 0 : std::min(x1max, x2max) - x2min;
};

auto computeIoU = [&overlap1D](NvDsInferInstanceMaskInfo& bbox1, NvDsInferInstanceMaskInfo& bbox2) -> float {
float overlapX = overlap1D(bbox1.left, bbox1.left + bbox1.width, bbox2.left, bbox2.left + bbox2.width);
float overlapY = overlap1D(bbox1.top, bbox1.top + bbox1.height, bbox2.top, bbox2.top + bbox2.height);
float area1 = (bbox1.width) * (bbox1.height);
float area2 = (bbox2.width) * (bbox2.height);
float overlap2D = overlapX * overlapY;
float u = area1 + area2 - overlap2D;
return u == 0 ? 0 : overlap2D / u;
};

std::stable_sort(binfo.begin(), binfo.end(), [](const NvDsInferInstanceMaskInfo& b1, const NvDsInferInstanceMaskInfo& b2) {
return b1.detectionConfidence > b2.detectionConfidence;
});

std::vector<NvDsInferInstanceMaskInfo> out;
for (auto i : binfo) {
bool keep = true;
for (auto j : out) {
if (keep) {
float overlap = computeIoU(i, j);
keep = overlap <= NMS_THRESH;
}
else {
break;
}
}
if (keep) {
out.push_back(i);
}
}
return out;
}

static std::vector<NvDsInferInstanceMaskInfo>
nmsAllClasses(std::vector<NvDsInferInstanceMaskInfo>& binfo)
{
std::vector<NvDsInferInstanceMaskInfo> result = nonMaximumSuppression(binfo);
return result;
}

static void
addFaceProposal(const float* landmarks, const uint& landmarksSizeRaw, const uint& netW, const uint& netH, const uint& b,
NvDsInferInstanceMaskInfo& bbi)
{
uint landmarksSize = landmarksSizeRaw == 10 ? landmarksSizeRaw + 5 : landmarksSizeRaw;
bbi.mask = new float[landmarksSize];
for (uint p = 0; p < landmarksSize / 3; ++p) {
if (landmarksSizeRaw == 10) {
bbi.mask[p * 3 + 0] = clamp(landmarks[b * landmarksSizeRaw + p * 2 + 0], 0, netW);
bbi.mask[p * 3 + 1] = clamp(landmarks[b * landmarksSizeRaw + p * 2 + 1], 0, netH);
bbi.mask[p * 3 + 2] = 1.0;
}
else {
bbi.mask[p * 3 + 0] = clamp(landmarks[b * landmarksSize + p * 3 + 0], 0, netW);
bbi.mask[p * 3 + 1] = clamp(landmarks[b * landmarksSize + p * 3 + 1], 0, netH);
bbi.mask[p * 3 + 2] = landmarks[b * landmarksSize + p * 3 + 2];
}
}
bbi.mask_width = netW;
bbi.mask_height = netH;
bbi.mask_size = sizeof(float) * landmarksSize;
}

static NvDsInferInstanceMaskInfo
convertBBox(const float& bx1, const float& by1, const float& bx2, const float& by2, const uint& netW, const uint& netH)
{
NvDsInferInstanceMaskInfo b;

float x1 = bx1;
float y1 = by1;
float x2 = bx2;
float y2 = by2;

x1 = clamp(x1, 0, netW);
y1 = clamp(y1, 0, netH);
x2 = clamp(x2, 0, netW);
y2 = clamp(y2, 0, netH);

b.left = x1;
b.width = clamp(x2 - x1, 0, netW);
b.top = y1;
b.height = clamp(y2 - y1, 0, netH);

return b;
}

static void
addBBoxProposal(const float bx1, const float by1, const float bx2, const float by2, const uint& netW, const uint& netH,
const int maxIndex, const float maxProb, NvDsInferInstanceMaskInfo& bbi)
{
bbi = convertBBox(bx1, by1, bx2, by2, netW, netH);

if (bbi.width < 1 || bbi.height < 1) {
return;
}

bbi.detectionConfidence = maxProb;
bbi.classId = maxIndex;
}

static std::vector<NvDsInferInstanceMaskInfo>
decodeTensorYoloFace(const float* boxes, const float* scores, const float* landmarks, const uint& outputSize,
const uint& landmarksSize, const uint& netW, const uint& netH, const std::vector<float>& preclusterThreshold)
{
std::vector<NvDsInferInstanceMaskInfo> binfo;

for (uint b = 0; b < outputSize; ++b) {
float maxProb = scores[b];

if (maxProb < preclusterThreshold[0]) {
  continue;
}

float bxc = boxes[b * 4 + 0];
float byc = boxes[b * 4 + 1];
float bw = boxes[b * 4 + 2];
float bh = boxes[b * 4 + 3];

float bx1 = bxc - bw / 2;
float by1 = byc - bh / 2;
float bx2 = bx1 + bw;
float by2 = by1 + bh;

NvDsInferInstanceMaskInfo bbi;

addBBoxProposal(bx1, by1, bx2, by2, netW, netH, 0, maxProb, bbi);
addFaceProposal(landmarks, landmarksSize, netW, netH, b, bbi);

binfo.push_back(bbi);

}

return binfo;
}

static bool
NvDsInferParseCustomYoloFace(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferInstanceMaskInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
return false;
}

const NvDsInferLayerInfo& boxes = outputLayersInfo[0];
const NvDsInferLayerInfo& scores = outputLayersInfo[1];
const NvDsInferLayerInfo& landmarks = outputLayersInfo[2];

const uint outputSize = boxes.inferDims.d[0];
const uint landmarksSize = landmarks.inferDims.d[1];

std::vector<NvDsInferInstanceMaskInfo> objects = decodeTensorYoloFace((const float*) (boxes.buffer),
(const float*) (scores.buffer), (const float*) (landmarks.buffer), outputSize, landmarksSize, networkInfo.width,
networkInfo.height, detectionParams.perClassPreclusterThreshold);

objectList.clear();
objectList = nmsAllClasses(objects);

return true;
}

extern "C" bool
NvDsInferParseYoloFace(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferInstanceMaskInfo>& objectList)
{
return NvDsInferParseCustomYoloFace(outputLayersInfo, networkInfo, detectionParams, objectList);
}

CHECK_CUSTOM_INSTANCE_MASK_PARSE_FUNC_PROTOTYPE(NvDsInferParseYoloFace);

To extract and display the object detection metadata, I have implemented a custom probe function as follows:

def pgie_sink_pad_buffer_probe(pad, info):

    print(">>> Entering PGIE filter function")
    """Extracts bounding boxes using an alternative list pointer approach."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("No GstBuffer received.")
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if not batch_meta:
        print("No batch_meta found.")
        return Gst.PadProbeReturn.OK

    l_frame = batch_meta.frame_meta_list
    while l_frame:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)

        # DEBUG: Print frame metadata
        print(f"Processing frame {frame_meta.batch_id}")

        l_obj = frame_meta.obj_meta_list
        if l_obj is None:
            print(f"No objects detected in frame {frame_meta.batch_id}")
        else:
            print(f"Objects detected in frame {frame_meta.batch_id}")

        # Alternative way to loop through objects (handling possible list pointers)
        while l_obj:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
                rect_params = obj_meta.rect_params

                bbox = {
                    "top": max(0, int(rect_params.top)),
                    "left": max(0, int(rect_params.left)),
                    "width": max(0, int(rect_params.width)),
                    "height": max(0, int(rect_params.height))
                }
                print(f"Bounding Box: {bbox}")

                l_obj = l_obj.next  # Move to the next object
            except Exception as e:
                print(f"Error processing object: {e}")
                break

        l_frame = l_frame.next  # Move to the next frame

    return Gst.PadProbeReturn.OK

Observations:

  • I have confirmed that my custom probe is being triggered correctly (>>> Entering PGIE filter function is printed).
  • However, the object list appears empty in all frames: "No objects detected in frame 0".
  • No bounding box data is available in the output.

I am seeking your help in troubleshooting this issue, specifically in identifying any potential problems with the custom parser implementation and with retrieving the metadata or adding the parsed output to the metadata.

As far as I know, DeepStream automatically adds the parsed output to the metadata. If that's the case, why is my output not being added?
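For reference, this is a minimal sketch of the kind of check I can add on a pad downstream of nvinfer (for example the PGIE src pad) to confirm whether any object meta is being attached at all; it assumes pyds and Gst are imported as in my probe above:

def count_objects_probe(pad, info):
    # Minimal check: report how many objects were attached per frame.
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list if batch_meta else None
    while l_frame:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # num_obj_meta is the object count DeepStream filled in for this frame
        print(f"Frame {frame_meta.frame_num}: {frame_meta.num_obj_meta} objects")
        l_frame = l_frame.next

    return Gst.PadProbeReturn.OK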