DeepStream 6.2, dGPU
I'm observing variation in the bboxes output by my PGIE (a face detection model) for the exact same image.
My pipeline is:
appsrc->jpegparse->nvv4l2decoder->streammux->pgie->tracker->sgie->sink
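A rough stand-alone equivalent of this pipeline, built with gst_parse_launch() and looping the identical JPEG, would look something like the sketch below (multifilesrc stands in for my appsrc; face.jpg, pgie_config.txt, sgie_config.txt and tracker_config.yml are placeholder names):
#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);
    GError *err = nullptr;
    /* Push the same JPEG repeatedly so any run-to-run bbox variation shows up. */
    GstElement *pipeline = gst_parse_launch(
        "multifilesrc location=face.jpg loop=true caps=\"image/jpeg,framerate=5/1\" "
        "! jpegparse ! nvv4l2decoder "
        "! m.sink_0 nvstreammux name=m batch-size=1 width=640 height=480 "
        "! nvinfer config-file-path=pgie_config.txt "
        "! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so "
        "ll-config-file=tracker_config.yml "
        "! nvinfer config-file-path=sgie_config.txt "
        "! fakesink", &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err ? err->message : "unknown error");
        return -1;
    }
    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    GstBus *bus = gst_element_get_bus(pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
    if (msg)
        gst_message_unref(msg);
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(bus);
    gst_object_unref(pipeline);
    return 0;
}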
My tracker config is:
%YAML:1.0
NvDCF:
# [General]
useUniqueID: 1 # Use 64-bit long Unique ID when assigning tracker ID. Default is [true]
maxTargetsPerStream: 99 # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity
# [Feature Extraction]
useColorNames: 1 # Use ColorNames feature
useHog: 1 # Use Histogram-of-Oriented-Gradient (HOG) feature
useHighPrecisionFeature: 1 # Use high-precision in feature extraction. Default is [true]
# [DCF]
filterLr: 0.15 # learning rate for DCF filter in exponential moving average. Valid Range: [0.0, 1.0]
filterChannelWeightsLr: 0.22 # learning rate for the channel weights among feature channels. Valid Range: [0.0, 1.0]
gaussianSigma: 0.75 # Standard deviation for Gaussian for desired response when creating DCF filter [pixels]
featureImgSizeLevel: 3 # Size of a feature image. Valid range: {1, 2, 3, 4, 5}, from the smallest to the largest
SearchRegionPaddingScale: 1 # Search region size. Determines how large the search region should be scaled from the target bbox. Valid range: {1, 2, 3}, from the smallest to the largest
# [MOT] [False Alarm Handling]
maxShadowTrackingAge: 30 # Max length of shadow tracking (the shadow tracking age is incremented when (1) there's detector input yet no match or (2) tracker confidence is lower than minTrackerConfidence). Once reached, the tracker will be terminated.
probationAge: 3 # Once the tracker age (incremented at every frame) reaches this, the tracker is considered to be valid
earlyTerminationAge: 1 # Early termination age (in terms of shadow tracking age) during the probation period. If reached during the probation period, the tracker will be terminated prematurely.
# [Tracker Creation Policy] [Target Candidacy]
minDetectorConfidence: -1 # If the confidence of a detector bbox is lower than this, then it won't be considered for tracking
minTrackerConfidence: 0.7 # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]
minTargetBboxSize: 10 # If the width or height of the bbox size gets smaller than this threshold, the target will be terminated.
minDetectorBboxVisibilityTobeTracked: 0.0 # If the detector-provided bbox's visibility (i.e., IOU with image) is lower than this, it won't be considered.
minVisibiilty4Tracking: 0.0 # If the visibility of the tracked object (i.e., IOU with image) is lower than this, it will be terminated immediately, assuming it is going out of scene.
# [Tracker Termination Policy]
targetDuplicateRunInterval: 5 # The interval in which the duplicate target detection removal is carried out. A Negative value indicates indefinite interval. Unit: [frames]
minIou4TargetDuplicate: 0.9 # If the IOU of two target bboxes are higher than this, the newer target tracker will be terminated.
# [Data Association] Matching method
useGlobalMatching: 0 # If true, enable a global matching algorithm (i.e., Hungarian method). Otherwise, a greedy algorithm will be used.
# [Data Association] Thresholds in matching scores to be considered as a valid candidate for matching
minMatchingScore4Overall: 0.0 # Min total score
minMatchingScore4SizeSimilarity: 0.5 # Min bbox size similarity score
minMatchingScore4Iou: 0.1 # Min IOU score
minMatchingScore4VisualSimilarity: 0.2 # Min visual similarity score
minTrackingConfidenceDuringInactive: 0.7 # Min tracking confidence during INACTIVE period. If tracking confidence is higher than this, then tracker will still output results until next detection
# [Data Association] Weights for each matching score term
matchingScoreWeight4VisualSimilarity: 0.8 # Weight for the visual similarity (in terms of correlation response ratio)
matchingScoreWeight4SizeSimilarity: 0.0 # Weight for the Size-similarity score
matchingScoreWeight4Iou: 0.1 # Weight for the IOU score
matchingScoreWeight4Age: 0.1 # Weight for the tracker age
# [State Estimator]
useTrackSmoothing: 1 # Use a state estimator
stateEstimatorType: 1 # The type of state estimator among { moving_avg:1, kalman_filter:2 }
# [State Estimator] [MovingAvgEstimator]
trackExponentialSmoothingLr_loc: 0.5 # Learning rate for new location
trackExponentialSmoothingLr_scale: 0.3 # Learning rate for new scale
trackExponentialSmoothingLr_velocity: 0.05 # Learning rate for new velocity
# [State Estimator] [Kalman Filter]
kfProcessNoiseVar4Loc: 0.1 # Process noise variance for location in Kalman filter
kfProcessNoiseVar4Scale: 0.04 # Process noise variance for scale in Kalman filter
kfProcessNoiseVar4Vel: 0.04 # Process noise variance for velocity in Kalman filter
kfMeasurementNoiseVar4Trk: 9 # Measurement noise variance for tracker's detection in Kalman filter
kfMeasurementNoiseVar4Det: 9 # Measurement noise variance for detector's detection in Kalman filter
# [Past-frame Data]
useBufferedOutput: 0 # Enable storing of past-frame data in a buffer and report it back
# [Instance-awareness]
useInstanceAwareness: 0 # Use instance-awareness for multi-object tracking
lambda_ia: 2 # Regularization factor for each instance
maxInstanceNum_ia: 4 # The number of nearby object instances to use for instance-awareness
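For context, my understanding of the moving-average state estimator enabled above (useTrackSmoothing: 1, stateEstimatorType: 1) is a simple exponential moving average per bbox parameter; a rough sketch using the trackExponentialSmoothingLr_loc value from this config (the exact NvDCF internals may differ):
#include <cstdio>

int main()
{
    /* Exponential moving average, which the "ExponentialSmoothing" learning
       rates above appear to control:
       smoothed = (1 - lr) * previous + lr * new_measurement */
    const float lr_loc = 0.5f;                              /* trackExponentialSmoothingLr_loc */
    float reported_x = 100.0f;                              /* previous estimate of the bbox x */
    const float detections[] = { 104.0f, 98.0f, 103.0f };   /* hypothetical raw detections */
    for (float det_x : detections)
    {
        reported_x = (1.0f - lr_loc) * reported_x + lr_loc * det_x;
        std::printf("raw detection %.1f -> reported %.2f\n", det_x, reported_x);
    }
    return 0;
}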
My PGIE config is:
[property]
gpu-id=0
#net-scale-factor=0.0039215697906911373
net-scale-factor=1.0
model-color-format=0
uff-input-order=0
onnx-file=model_480x640.onnx
model-engine-file=model_480x640.onnx_b1_gpu0_fp32.engine
labelfile-path=centerface_labels.txt
batch-size=1
network-mode=2
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
#output-tensor-meta=1
#drop-frame-interval=5
output-blob-names=537;538;539;540
#input-object-min-width=50
#input-object-min-height=50
parse-bbox-func-name=NvDsInferParseCustomCenterNetFace
custom-lib-path=libnvds_infercustomparser_centernet.so
My SGIE config is:
[property]
gpu-id=0
gie-unique-id=2
model-engine-file=glintr100.onnx_b1_gpu0_fp16.engine
onnx-file=glintr100.onnx
# batch-size=1
net-scale-factor=0.0078125
offsets=127.5;127.5;127.5
#force-implicit-batch-dim=1
model-color-format=0
network-mode=2
process-mode=2
network-type=100
output-tensor-meta=1
symmetric-padding=1
classifier-async-mode=0
operate-on-gie-id=1
operate-on-class-ids=0
tensor-meta-pool-size=200
#input-object-min-width=80
#input-object-min-height=80
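As I understand it, nvinfer preprocesses each pixel as y = net-scale-factor * (x - offset), so the PGIE above sees raw 0-255 values (scale 1.0, no offsets), while the SGIE applies the usual ArcFace normalization (x - 127.5) / 128 (0.0078125 = 1/128). A small check of that arithmetic:
#include <cstdio>

int main()
{
    const float pgie_scale  = 1.0f;        /* net-scale-factor in the PGIE config */
    const float pgie_offset = 0.0f;        /* no offsets set in the PGIE config */
    const float sgie_scale  = 0.0078125f;  /* net-scale-factor in the SGIE config (= 1/128) */
    const float sgie_offset = 127.5f;      /* offsets in the SGIE config */
    const float pixels[] = { 0.0f, 127.5f, 255.0f };
    for (float x : pixels)
    {
        float pgie_y = pgie_scale * (x - pgie_offset);  /* passes through unchanged */
        float sgie_y = sgie_scale * (x - sgie_offset);  /* mapped to roughly [-1, 1] */
        std::printf("pixel %6.1f -> PGIE %8.3f, SGIE %8.4f\n", x, pgie_y, sgie_y);
    }
    return 0;
}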
SGIE ONNX model link: insightface arcface
I'm attaching the PGIE model ONNX file; it is the CenterFace model.
centerface.zip (6.7 MB)
Custom parser:
/*
* Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* This custom post-processing parser is for the CenterNet face detection model */
#include <cstring>
#include <iostream>
#include "nvdsinfer_custom_impl.h"
#include <cassert>
#include <cmath>
#include <tuple>
#include <memory>
#include <vector>
#include <algorithm> /* std::sort */
#include <opencv2/opencv.hpp>
#define CLIP(a, min, max) (MAX(MIN(a, max), min))
/* C-linkage to prevent name-mangling */
extern "C" bool NvDsInferParseCustomTfSSD(std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
NvDsInferNetworkInfo const &networkInfo,
NvDsInferParseDetectionParams const &detectionParams,
std::vector<NvDsInferObjectDetectionInfo> &objectList);
/* This is a sample bbox parsing function for the CenterNet face detection ONNX model */
struct FrcnnParams
{
int inputHeight;
int inputWidth;
int outputClassSize;
float visualizeThreshold;
int postNmsTopN;
int outputBboxSize;
std::vector<float> classifierRegressorStd;
};
struct FaceInfo
{
float x1;
float y1;
float x2;
float y2;
float score;
float landmarks[10];
};
/* NMS for centernet */
static void nms(std::vector<FaceInfo> &input, std::vector<FaceInfo> &output, float nmsthreshold)
{
std::sort(input.begin(), input.end(),
[](const FaceInfo &a, const FaceInfo &b) {
return a.score > b.score;
});
int box_num = input.size();
std::vector<int> merged(box_num, 0);
for (int i = 0; i < box_num; i++)
{
if (merged[i])
continue;
output.push_back(input[i]);
float h0 = input[i].y2 - input[i].y1 + 1;
float w0 = input[i].x2 - input[i].x1 + 1;
float area0 = h0 * w0;
for (int j = i + 1; j < box_num; j++)
{
if (merged[j])
continue;
float inner_x0 = input[i].x1 > input[j].x1 ? input[i].x1 : input[j].x1; //std::max(input[i].x1, input[j].x1);
float inner_y0 = input[i].y1 > input[j].y1 ? input[i].y1 : input[j].y1;
float inner_x1 = input[i].x2 < input[j].x2 ? input[i].x2 : input[j].x2; //bug fixed ,sorry
float inner_y1 = input[i].y2 < input[j].y2 ? input[i].y2 : input[j].y2;
float inner_h = inner_y1 - inner_y0 + 1;
float inner_w = inner_x1 - inner_x0 + 1;
if (inner_h <= 0 || inner_w <= 0)
continue;
float inner_area = inner_h * inner_w;
float h1 = input[j].y2 - input[j].y1 + 1;
float w1 = input[j].x2 - input[j].x1 + 1;
float area1 = h1 * w1;
float score;
score = inner_area / (area0 + area1 - inner_area);
if (score > nmsthreshold)
merged[j] = 1;
}
}
}
/* For CenterNetFacedetection */
//extern "C"
static std::vector<int> getIds(float *heatmap, int h, int w, float thresh)
{
std::vector<int> ids;
for (int i = 0; i < h; i++)
{
for (int j = 0; j < w; j++)
{
// std::cout<<"ids"<<heatmap[i*w+j]<<std::endl;
if (heatmap[i * w + j] > thresh)
{
// std::array<int, 2> id = { i,j };
ids.push_back(i);
ids.push_back(j);
// std::cout<<"print ids"<<i<<std::endl;
}
}
}
return ids;
}
/* customcenternetface */
extern "C" bool NvDsInferParseCustomCenterNetFace(std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
NvDsInferNetworkInfo const &networkInfo,
NvDsInferParseDetectionParams const &detectionParams,
std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
auto layerFinder = [&outputLayersInfo](const std::string &name)
-> const NvDsInferLayerInfo * {
for (auto &layer : outputLayersInfo)
{
if (layer.dataType == FLOAT &&
(layer.layerName && name == layer.layerName))
{
return &layer;
}
}
return nullptr;
};
objectList.clear();
const NvDsInferLayerInfo *heatmap = layerFinder("537");
const NvDsInferLayerInfo *scale = layerFinder("538");
const NvDsInferLayerInfo *offset = layerFinder("539");
const NvDsInferLayerInfo *landmarks = layerFinder("540");
// std::cout<<"width"<<&networkInfo.width<<std::endl;
if (!heatmap || !scale || !offset || !landmarks)
{
std::cerr << "ERROR: some layers missing or unsupported data types "
<< "in output tensors" << std::endl;
return false;
}
int fea_h = 120; // heatmap->inferDims.d[1]; the heatmap is 1/4 of the 480x640 network input: 480 / 4 = 120
int fea_w = 160; // heatmap->inferDims.d[2]; 640 / 4 = 160
int spacial_size = fea_w * fea_h;
// std::cout<<"features"<<fea_h<<"width"<<fea_w<<std::endl;
float *heatmap_ = (float *)(heatmap->buffer);
float *scale0 = (float *)(scale->buffer);
float *scale1 = scale0 + spacial_size;
float *offset0 = (float *)(offset->buffer);
float *offset1 = offset0 + spacial_size;
float *lm = (float *)landmarks->buffer;
float scoreThresh = 0.5;
std::vector<int> ids = getIds(heatmap_, fea_h, fea_w, scoreThresh);
//?? d_w, d_h
int width = networkInfo.width;
int height = networkInfo.height;
int d_h = (int)(std::ceil(height / 32.0) * 32); // floating-point division so std::ceil can actually round up
int d_w = (int)(std::ceil(width / 32.0) * 32);
// int d_scale_h = height/d_h ;
// int d_scale_w = width/d_w ;
// float scale_w = (float)width / (float)d_w;
// float scale_h = (float)height / (float)d_h;
std::vector<FaceInfo> faces_tmp;
std::vector<FaceInfo> faces;
for (int i = 0; i < ids.size() / 2; i++)
{
int id_h = ids[2 * i];
int id_w = ids[2 * i + 1];
int index = id_h * fea_w + id_w;
float s0 = std::exp(scale0[index]) * 4;
float s1 = std::exp(scale1[index]) * 4;
float o0 = offset0[index];
float o1 = offset1[index];
float x1 = std::max(0., (id_w + o1 + 0.5) * 4 - s1 / 2);
float y1 = std::max(0., (id_h + o0 + 0.5) * 4 - s0 / 2);
float x2 = 0, y2 = 0;
x1 = std::min(x1, (float)d_w);
y1 = std::min(y1, (float)d_h);
x2 = std::min(x1 + s1, (float)d_w);
y2 = std::min(y1 + s0, (float)d_h);
FaceInfo facebox;
facebox.x1 = x1;
facebox.y1 = y1;
facebox.x2 = x2;
facebox.y2 = y2;
facebox.score = heatmap_[index];
for (int j = 0; j < 5; j++)
{
facebox.landmarks[2 * j] = x1 + lm[(2 * j + 1) * spacial_size + index] * s1;
facebox.landmarks[2 * j + 1] = y1 + lm[(2 * j) * spacial_size + index] * s0;
}
faces_tmp.push_back(facebox);
}
const float threshold = 0.5;
nms(faces_tmp, faces, threshold);
for (int k = 0; k < faces.size(); k++)
{
NvDsInferObjectDetectionInfo object;
/* Clip object box co-ordinates to network resolution */
object.left = CLIP(faces[k].x1, 0, networkInfo.width - 1);
object.top = CLIP(faces[k].y1, 0, networkInfo.height - 1);
object.width = CLIP((faces[k].x2 - faces[k].x1), 0, networkInfo.width - 1);
object.height = CLIP((faces[k].y2 - faces[k].y1), 0, networkInfo.height - 1);
if (object.width && object.height)
{
object.detectionConfidence = 0.99; // NOTE: hard-coded instead of faces[k].score
object.classId = 0;
objectList.push_back(object);
}
}
return true;
}
/* Check that the custom function has been defined correctly */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomCenterNetFace);
Can you guide me in figuring out what I am doing wrong?
More observations: the face bboxes change slightly for the same image, and the embeddings generated for the faces are very far apart (L2 distance) for the same image.
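For reference, this is roughly how I compare two embeddings of the same face (a minimal sketch; I assume a 512-float output per face for glintr100, copied out of the SGIE tensor meta, and L2-normalize before measuring the distance):
#include <cmath>
#include <cstdio>
#include <vector>

/* L2-normalize an embedding in place. */
static void l2_normalize(std::vector<float> &v)
{
    float norm = 0.0f;
    for (float x : v) norm += x * x;
    norm = std::sqrt(norm);
    if (norm > 0.0f)
        for (float &x : v) x /= norm;
}

/* Euclidean (L2) distance between two embeddings of equal length. */
static float l2_distance(const std::vector<float> &a, const std::vector<float> &b)
{
    float sum = 0.0f;
    for (size_t i = 0; i < a.size(); i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

int main()
{
    /* embA / embB would be filled from the SGIE's output tensor meta
       (assumed 512 floats per face for glintr100). */
    std::vector<float> embA(512, 0.1f), embB(512, 0.1f);
    l2_normalize(embA);
    l2_normalize(embB);
    std::printf("L2 distance = %f\n", l2_distance(embA, embB));
    return 0;
}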