• DGX A100 dGPU
• DeepStream 8.0
Hello, I have integrated the Single-View 3D Tracker (SV3DT), together with its configuration file, into a DeepStream pipeline written in Python (the wiring is sketched right after the config below).
The tracker config file is this:
```yaml
%YAML:1.0
BaseConfig:
minDetectorConfidence: 0.1894 # If the confidence of a detector bbox is lower than this, then it won’t be considered for tracking
TargetManagement:
enableBboxUnClipping: 1 # In case the bbox is likely to be clipped by image border, unclip bbox
preserveStreamUpdateOrder: 0 # When assigning new target ids, preserve input streams’ order to keep target ids in a deterministic order over multiple runs
maxTargetsPerStream: 150 # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity
# [Creation & Termination Policy]
minIouDiff4NewTarget: 0.3686 # If the IOU between the newly detected object and any of the existing targets is higher than this threshold, this newly detected object will be discarded.
minTrackerConfidence: 0.1513 # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]
probationAge: 2 # If the target’s age exceeds this, the target will be considered to be valid.
maxShadowTrackingAge: 42 # Max length of shadow tracking. If the shadowTrackingAge exceeds this limit, the tracker will be terminated.
earlyTerminationAge: 1 # If the shadowTrackingAge reaches this threshold while in TENTATIVE period, the target will be terminated prematurely.
# dump tracklets in txt file
outputTerminatedTracks: 0 # save terminated tracklets
terminatedTrackFilename: track_dump_ # file name: "terminatedTrackFilename"0.txt, "terminatedTrackFilename"1_2.txt, …
TrajectoryManagement:
useUniqueID: 0 # Use 64-bit long Unique ID when assigning tracker ID. Default is [true]
enableReAssoc: 1 # Enable Re-Assoc
# [Re-Assoc Metric: Thresholds for valid candidates]
minMatchingScore4Overall: 0.6622 # min matching score for overall
minTrackletMatchingScore: 0.2940 # min tracklet similarity score for re-assoc
minMatchingScore4ReidSimilarity: 0.0771 # min reid similarity score for re-assoc
# [Re-Assoc Metric: Weights]
matchingScoreWeight4TrackletSimilarity: 0.7981 # weight for tracklet similarity score
matchingScoreWeight4ReidSimilarity: 0.3848 # weight for reid similarity score
# [Re-Assoc: Motion-based]
minTrajectoryLength4Projection: 34 # min trajectory length required to make projected trajectory
prepLength4TrajectoryProjection: 58 # the length of the trajectory during which the state estimator is updated to make projections
trajectoryProjectionLength: 33 # the length of the projected trajectory
maxAngle4TrackletMatching: 67 # max angle difference for tracklet matching [degree]
minSpeedSimilarity4TrackletMatching: 0.0574 # min speed similarity for tracklet matching
minBboxSizeSimilarity4TrackletMatching: 0.1013 # min bbox size similarity for tracklet matching
maxTrackletMatchingTimeSearchRange: 27 # the search space in time for max tracklet similarity
trajectoryProjectionProcessNoiseScale: 0.0100 # trajectory projector’s process noise scale w.r.t. state estimator
trajectoryProjectionMeasurementNoiseScale: 100 # trajectory projector’s measurement noise scale w.r.t. state estimator
trackletSpacialSearchRegionScale: 0.0100 # the search region scale for peer tracklet
# [Re-Assoc: Reid based. Reid model params are set in ReID section]
reidExtractionInterval: 8 # frame interval to extract reid features per target
DataAssociator:
dataAssociatorType: 0 # the type of data associator among { DEFAULT= 0 }
associationMatcherType: 1 # the type of matching algorithm among { GREEDY=0, CASCADED=1 }
checkClassMatch: 1 # If checked, only the same-class objects are associated with each other. Default: true
# [Association Metric: Thresholds for valid candidates]
minMatchingScore4Overall: 0.0222 # Min total score
minMatchingScore4SizeSimilarity: 0.3552 # Min bbox size similarity score
minMatchingScore4Iou: 0.0548 # Min IOU score
minMatchingScore4VisualSimilarity: 0.5043 # Min visual similarity score
# [Association Metric: Weights]
matchingScoreWeight4VisualSimilarity: 0.3951 # Weight for the visual similarity (in terms of correlation response ratio)
matchingScoreWeight4SizeSimilarity: 0.6003 # Weight for the Size-similarity score
matchingScoreWeight4Iou: 0.4033 # Weight for the IOU score
# [Association Metric: Tentative detections] only uses iou similarity for tentative detections
tentativeDetectorConfidence: 0.1024 # If a detection’s confidence is lower than this but higher than minDetectorConfidence, then it’s considered as a tentative detection
minMatchingScore4TentativeIou: 0.2852 # Min iou threshold to match targets and tentative detection
StateEstimator:
stateEstimatorType: 3 # the type of state estimator among { DUMMY=0, SIMPLE=1, REGULAR=2, 3D=3 }
# [Dynamics Modeling]
processNoiseVar4Loc: 6810.8668 # Process noise variance for bbox center
processNoiseVar4Vel: 1348.4874 # Process noise variance for velocity
measurementNoiseVar4Detector: 100.0000 # Measurement noise variance for detector’s detection
measurementNoiseVar4Tracker: 293.3238 # Measurement noise variance for tracker’s localization
ObjectModelProjection:
cameraModelFilepath: # camera calibration file for each stream
  - /app/config/camInfo/camInfo.yml
  - /app/config/camInfo/camInfo.yml
  - /app/config/camInfo/camInfo.yml
  - /app/config/camInfo/camInfo.yml
outputVisibility: 1 # output visibility by occlusion
outputFootLocation: 1 # output foot location estimated from 3D model
outputConvexHull: 1 # output convex hull for each object estimated from 3D cylinder model
VisualTracker:
visualTrackerType: 2 # the type of visual tracker among { DUMMY=0, NvDCF=1, NvDCF_VPI=2 }
vpiBackend4DcfTracker: 1 # the type of compute backend among {CUDA=1, PVA=2}
# [NvDCF: Feature Extraction]
useColorNames: 1 # Use ColorNames feature
useHog: 1 # Use Histogram-of-Oriented-Gradient (HOG) feature
featureImgSizeLevel: 3 # Size of a feature image. Valid range: {1, 2, 3, 4, 5}, from the smallest to the largest
featureFocusOffsetFactor_y: -0.1054 # The offset for the center of hanning window relative to the feature height. The center of hanning window would move by (featureFocusOffsetFactor_y*featureMatSize.height) in vertical direction
# [NvDCF: Correlation Filter]
filterLr: 0.0767 # learning rate for DCF filter in exponential moving average. Valid Range: [0.0, 1.0]
filterChannelWeightsLr: 0.0339 # learning rate for the channel weights among feature channels. Valid Range: [0.0, 1.0]
gaussianSigma: 0.5687 # Standard deviation for Gaussian for desired response when creating DCF filter [pixels]
ReID:
reidType: 2 # The type of reid among { DUMMY=0, NvDEEPSORT=1, Reid based reassoc=2, both NvDEEPSORT and reid based reassoc=3}
# [Reid Network Info]
batchSize: 100 # Batch size of reid network
workspaceSize: 1000 # Workspace size to be used by reid engine, in MB
reidFeatureSize: 256 # Size of reid feature
reidHistorySize: 100 # Max number of reid features kept for one object
inferDims: [3, 256, 128] # Reid network input dimension CHW or HWC based on inputOrder
networkMode: 1 # Reid network inference precision mode among {fp32=0, fp16=1, int8=2 }
# [Input Preprocessing]
inputOrder: 0 # Reid network input order among { NCHW=0, NHWC=1 }. Batch will be converted to the specified order before reid input.
colorFormat: 0 # Reid network input color format among {RGB=0, BGR=1 }. Batch will be converted to the specified color before reid input.
offsets: [123.6750, 116.2800, 103.5300] # Array of values to be subtracted from each input channel, with length equal to number of channels
netScaleFactor: 0.01735207 # Scaling factor for reid network input after subtracting offsets
keepAspc: 1 # Whether to keep aspect ratio when resizing input objects for reid
useVPICropScaler: 1 # Use VPI backend crop and scaler
# [Output Postprocessing]
addFeatureNormalization: 1 # If reid feature is not normalized in network, adding normalization on output so each reid feature has l2 norm equal to 1
minVisibility4GalleryUpdate: 0.6 # Add ReID embedding to the gallery only if the visibility is not lower than this
# [Paths and Names]
tltEncodedModel: "/models/reidentificationnet/resnet50_market1501.etlt" # NVIDIA TAO model path
tltModelKey: "nvidia_tao" # NVIDIA TAO model key
modelEngineFile: "/models/deepstream/8.0/A100/gstreamer/single-view-3d-tracker/resnet50_market1501.etlt_b100_gpu0_fp16.plan" # Engine file path
PoseEstimator:
poseEstimatorType: 1 # Type of pose estimator used
useVPICropScaler: 1 # Use VPI backend for cropping and scaling
batchSize: 1 # Batch size for pose estimation
workspaceSize: 1000 # Workspace size in MB for the pose estimator engine
inferDims: [3, 256, 192] # Input dimensions for the pose estimator network (C, H, W)
networkMode: 1 # Inference precision mode (fp32=0, fp16=1, int8=2)
inputOrder: 0 # Input order for the network (NCHW=0, NHWC=1)
colorFormat: 0 # Input color format (RGB=0, BGR=1)
offsets: [123.6750, 116.2800, 103.5300] # Channel-wise mean subtraction values
netScaleFactor: 0.00392156 # Scaling factor for input normalization
onnxFile: "/models/bodypose3dnet/bodypose3dnet_accuracy.onnx" # Path to the ONNX model file
modelEngineFile: "/models/deepstream/8.0/A100/gstreamer/single-view-3d-tracker/bodypose3dnet_accuracy.onnx_b1_gpu0_fp16.plan" # Path to the engine file
poseInferenceInterval: -1 # Pose inference frame interval. -1 means pose is inferred only on the first frame of each target, and the result is used to determine the target height.
```
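For completeness, this is roughly how the tracker is wired into the Python pipeline (a sketch; the library path is the stock one in the DS container, and the config filename and dimensions are placeholders for my actual values):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# SV3DT runs inside the standard nvtracker element; only the low-level
# config file differs from a plain 2D NvDCF setup.
tracker = Gst.ElementFactory.make("nvtracker", "tracker")
tracker.set_property(
    "ll-lib-file",
    "/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so")
tracker.set_property("ll-config-file",
                     "/app/config/config_tracker_SV3DT.yml")  # the YAML above
tracker.set_property("tracker-width", 960)   # placeholder
tracker.set_property("tracker-height", 544)  # placeholder
tracker.set_property("gpu-id", 0)
```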
The models compile correctly and the pipeline runs; I can retrieve the tracker's bbox parameters from NvDsObjectMeta, as well as its confidence, etc. (as with any tracker).
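A compact version of the probe I already have working (standard pyds iteration, nothing SV3DT-specific):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def tracker_src_pad_probe(pad, info, user_data):
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            r = obj_meta.rect_params  # tracked bbox, as with any tracker
            print(obj_meta.object_id, obj_meta.tracker_confidence,
                  r.left, r.top, r.width, r.height)
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```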
I’ve attached a base image (only a couple of persons, due to company policy) of what I see. I’m using the default NVIDIA camInfo.yml file for testing purposes, which is why the 3D bounding boxes appear tilted.
The problem is that I have absolutely no idea how to retrieve this tracker's 3D metadata, for example visibility, foot world position (x and y), foot image position (x and y), etc., using pyds.
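The only thing I've managed so far is a blind discovery loop that walks each object's user-meta list and prints the raw meta-type value of every entry, hoping to spot the SV3DT ones (a sketch; I don't even know which meta types I should be looking for, which is part of the question):

```python
import pyds

def dump_obj_user_meta(obj_meta):
    """Print the meta type of every user meta attached to one object."""
    l_user = obj_meta.obj_user_meta_list
    while l_user is not None:
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        # Extended/unbound enum values should still expose their raw integer.
        print("user meta type:", int(user_meta.base_meta.meta_type))
        try:
            l_user = l_user.next
        except StopIteration:
            break
```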
Here in the official repo:
There's only one example, and it runs through deepstream-app.c (I think); there's nothing else. I went into the DS 8.0 container to read that code (since there's no other way), and in C the 3D tracker metadata can be accessed directly, whereas from Python it can't.
I tried to generate a custom pyds binding using the NvDs3DTracking documentation here: NVIDIA DeepStream SDK API Reference: nvdsmeta_schema.h Source File | NVIDIA Docs, but even then I couldn't manage it. Writing a custom binding at this level is extremely difficult; I spent a couple of days on it and got nowhere. There is very little support for the Python path, while with C++ there seems to be much more out-of-the-box support for accessing this metadata.
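For reference, the furthest my attempts got is a fragile ctypes hack that dereferences user_meta_data directly, guessing the layout from the C sample (visibility as a single float, each foot location as a float pair). The NVDS_OBJ_* constants below are placeholders, the real enum values would have to be copied from nvdsmeta.h for DS 8.0, and I'm not sure this is the intended access path at all:

```python
import ctypes
import pyds

# PyCapsule -> raw pointer (pyds exposes user_meta_data as a capsule).
ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object,
                                                  ctypes.c_char_p]

# Placeholders -- copy the real values of NVDS_OBJ_VISIBILITY,
# NVDS_OBJ_IMAGE_FOOT_LOCATION and NVDS_OBJ_WORLD_FOOT_LOCATION
# from nvdsmeta.h of your DeepStream version.
NVDS_OBJ_VISIBILITY = -1
NVDS_OBJ_IMAGE_FOOT_LOCATION = -1
NVDS_OBJ_WORLD_FOOT_LOCATION = -1

def read_floats(user_meta, count):
    """Reinterpret user_meta_data as `count` C floats (layout is a guess)."""
    # If the capsule carries a name, pass it instead of None.
    ptr = ctypes.pythonapi.PyCapsule_GetPointer(user_meta.user_meta_data, None)
    return list((ctypes.c_float * count).from_address(ptr))

def read_sv3dt_meta(obj_meta):
    l_user = obj_meta.obj_user_meta_list
    while l_user is not None:
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        mtype = int(user_meta.base_meta.meta_type)
        if mtype == NVDS_OBJ_VISIBILITY:
            print("visibility:", read_floats(user_meta, 1)[0])
        elif mtype == NVDS_OBJ_IMAGE_FOOT_LOCATION:
            print("foot (image):", read_floats(user_meta, 2))
        elif mtype == NVDS_OBJ_WORLD_FOOT_LOCATION:
            print("foot (world):", read_floats(user_meta, 2))
        try:
            l_user = l_user.next
        except StopIteration:
            break
```

Even if this happened to read the right bytes, it hard-codes enum values and memory layout, so it's not something I'd want to rely on.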
There is absolutely nothing on the internet about this situation, nor have I found anything similar on the forums. Your kind assistance would be greatly appreciated, NVIDIA team.