I have been experimenting with the new Single-View 3D Tracking feature introduced in DeepStream 6.4. However, I’ve not been able to make it work. I’ve followed the instructions here as closely as I could, but the result is no detections at all (I’m using PeopleNet + the NvDCF tracker and have verified that there were plenty of detections without SV3DT). I’ve verified that my projection matrices are correct. The only thing I can think of that is different in my setup is that I’m using units of meters instead of centimeters. Therefore, I also tried replacing ‘height: 250’ and ‘radius: 30.0’ with 2.5 and 0.3, respectively, in the ‘modelInfo’ section of the camInfo.yml file. This also does not yield a different result.
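For reference, here is a sketch of the ‘modelInfo’ section with those values, assuming the layout of the sample camera model file and that everything in the file (including the projection matrix) is expressed in meters:

# Sketch of the modelInfo section in camInfo.yml, assuming world units of meters
# (the sample file uses 250.0 and 30.0, i.e., centimeters)
modelInfo:
  height: 2.5   # height of the 3D person model in meters
  radius: 0.3   # radius of the 3D person model in meters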
I find it difficult to pinpoint the issue because there is not much information available yet about how SV3DT should be used. Is there any kind of additional resource or tutorial available now or planned for the future? If so, it would be great to hear when it becomes available.
Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e., which plugin or which sample application, and the function description.)
Here is the NvDCF config file I used. It is identical to the sample file config_tracker_NvDCF_perf.yml, except for the ‘StateEstimator’ and ‘ObjectModelProjection’ sections. I had to set StateEstimator.stateEstimatorType to 3 instead of 1; otherwise a segfault occurred.
BaseConfig:
  minDetectorConfidence: 0.0430 # If the confidence of a detector bbox is lower than this, then it won't be considered for tracking

TargetManagement:
  enableBboxUnClipping: 1 # In case the bbox is likely to be clipped by image border, unclip bbox
  preserveStreamUpdateOrder: 0 # When assigning new target ids, preserve input streams' order to keep target ids in a deterministic order over multiple runs
  maxTargetsPerStream: 150 # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity

  # [Creation & Termination Policy]
  minIouDiff4NewTarget: 0.7418 # If the IOU between the newly detected object and any of the existing targets is higher than this threshold, this newly detected object will be discarded.
  minTrackerConfidence: 0.4009 # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]
  probationAge: 2 # If the target's age exceeds this, the target will be considered to be valid.
  maxShadowTrackingAge: 51 # Max length of shadow tracking. If the shadowTrackingAge exceeds this limit, the tracker will be terminated.
  earlyTerminationAge: 1 # If the shadowTrackingAge reaches this threshold while in TENTATIVE period, the target will be terminated prematurely.

TrajectoryManagement:
  useUniqueID: 0 # Use 64-bit long Unique ID when assigning tracker ID.

DataAssociator:
  dataAssociatorType: 0 # the type of data associator among { DEFAULT= 0 }
  associationMatcherType: 1 # the type of matching algorithm among { GREEDY=0, CASCADED=1 }
  checkClassMatch: 1 # If checked, only the same-class objects are associated with each other. Default: true

  # [Association Metric: Thresholds for valid candidates]
  minMatchingScore4Overall: 0.4290 # Min total score
  minMatchingScore4SizeSimilarity: 0.3627 # Min bbox size similarity score
  minMatchingScore4Iou: 0.2575 # Min IOU score
  minMatchingScore4VisualSimilarity: 0.5356 # Min visual similarity score

  # [Association Metric: Weights]
  matchingScoreWeight4VisualSimilarity: 0.3370 # Weight for the visual similarity (in terms of correlation response ratio)
  matchingScoreWeight4SizeSimilarity: 0.4354 # Weight for the Size-similarity score
  matchingScoreWeight4Iou: 0.3656 # Weight for the IOU score

  # [Association Metric: Tentative detections] only uses iou similarity for tentative detections
  tentativeDetectorConfidence: 0.2008 # If a detection's confidence is lower than this but higher than minDetectorConfidence, then it's considered as a tentative detection
  minMatchingScore4TentativeIou: 0.5296 # Min iou threshold to match targets and tentative detection

StateEstimator:
  stateEstimatorType: 3 # the type of state estimator among { DUMMY=0, SIMPLE=1, REGULAR=2 }

  # [Dynamics Modeling]
  processNoiseVar4Loc: 6810.866 # Process noise variance for location
  processNoiseVar4Vel: 1348.487 # Process noise variance for velocity
  measurementNoiseVar4Detector: 100.000 # Measurement noise variance for detector's detection
  measurementNoiseVar4Tracker: 293.323 # Measurement noise variance for tracker's localization

VisualTracker:
  visualTrackerType: 1 # the type of visual tracker among { DUMMY=0, NvDCF=1 }

  # [NvDCF: Feature Extraction]
  useColorNames: 1 # Use ColorNames feature
  useHog: 0 # Use Histogram-of-Oriented-Gradient (HOG) feature
  featureImgSizeLevel: 2 # Size of a feature image. Valid range: {1, 2, 3, 4, 5}, from the smallest to the largest
  featureFocusOffsetFactor_y: -0.2000 # The offset for the center of hanning window relative to the feature height. The center of hanning window would move by (featureFocusOffsetFactor_y*featureMatSize.height) in vertical direction

  # [NvDCF: Correlation Filter]
  filterLr: 0.0750 # learning rate for DCF filter in exponential moving average. Valid Range: [0.0, 1.0]
  filterChannelWeightsLr: 0.1000 # learning rate for the channel weights among feature channels. Valid Range: [0.0, 1.0]
  gaussianSigma: 0.7500 # Standard deviation for Gaussian for desired response when creating DCF filter [pixels]

ObjectModelProjection:
  cameraModelFilepath:
    - /tmp/tmp6vqpe0dt.yaml
    - /tmp/tmpkdkhavtu.yaml
Sorry for the lack of additional resources. We are preparing a blog post about SV3DT, which will include more samples and further information, so please stay tuned.
Regarding your camera file, have you checked the back-projection error from (1) a 3D point on the 3D ground plane to (2) a 2D point on the camera image plane, and then (3) to a 3D point on the same 3D ground plane? (1) and (3) are supposed to be very close if your 3x4 projection matrix is correct.
I’m not sure if I understand your question. I have verified the projection matrix by multiplying a 3D point by the projection matrix and checking that the resulting point corresponds to the same location in the image plane.
You can back-project the 2D point in the image plane to a ray in 3D. Since the original 3D point lies on the ground plane, you can compute where that ray intersects the ground plane. If this back-projected point matches your original 3D point, then your 3x4 matrix is verified. Basically, I was suggesting that you check whether you can get back to the original 3D point after doing projection followed by back-projection.
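In case it helps, below is a minimal NumPy sketch of that check. The 3x4 matrix here is a toy one built from hypothetical intrinsics and extrinsics (focal length 1000 px, principal point at (0, 0), camera 10 m above the origin looking straight down at the ground plane); substitute your own projectionMatrix_3x4 values.

import numpy as np

# Toy world-to-image projection matrix P = K @ [R | t] (hypothetical values).
K = np.array([[1000.0, 0.0, 0.0],
              [0.0, 1000.0, 0.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0]])
t = np.array([[0.0], [0.0], [10.0]])
P = K @ np.hstack([R, t])

def project(P, Xw):
    # (1) -> (2): project a 3D world point to a 2D point on the image plane.
    x = P @ np.append(Xw, 1.0)
    return x[:2] / x[2]

def backproject_to_ground(P, uv):
    # (2) -> (3): intersect the viewing ray of an image point with the ground plane Z=0.
    # For points with Z=0, the projection reduces to the 3x3 homography H = [p1 p2 p4]
    # (columns 0, 1, 3 of P), so inverting H maps the image point back to (X, Y, 0).
    H = P[:, [0, 1, 3]]
    Xg = np.linalg.inv(H) @ np.array([uv[0], uv[1], 1.0])
    Xg /= Xg[2]
    return np.array([Xg[0], Xg[1], 0.0])

X_ground = np.array([4.0, 2.0, 0.0])   # (1) a 3D point on the ground plane
uv = project(P, X_ground)              # (2) its 2D image point
X_back = backproject_to_ground(P, uv)  # (3) back-projected onto the ground plane
print(uv, X_back)                      # X_back should match X_ground if P is consistent

If (1) and (3) agree for several points spread across the ground plane, the matrix passes this check.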
Below are some more details we can provide for now. To make the transition easy for users who are familiar with OpenCV, we use a similar approach, as described below:
The 3x4 camera projection matrix, often simply called the camera matrix, converts a 3D world point to a 2D point on the camera image plane based on a pinhole camera model. More detailed and general information about the camera matrix can be found in various sources on computer vision geometry and camera calibration, including OpenCV’s documentation on camera calibration (OpenCV: Camera Calibration and 3D Reconstruction).
For projectionMatrix_3x4 in a camera model file (e.g., camInfo-01.yml), the principal point (i.e., (Cx, Cy)) in the camera matrix is assumed to be at (0, 0) in image coordinates, whereas the actual optical center is located at the image center (i.e., (img_width/2, img_height/2)). Thus, to move the origin to the top-left of the camera image (i.e., pixel coordinates), SV3DT internally adds (img_width/2, img_height/2) after the transformation with the camera matrix provided in projectionMatrix_3x4.
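As a small illustration of that convention (using the same toy matrix as above and a hypothetical 1920x1080 frame), the 2D point produced by projectionMatrix_3x4 is centered at the principal point, and adding (img_width/2, img_height/2) turns it into pixel coordinates:

import numpy as np

# Hypothetical 3x4 projection matrix (same toy values as earlier) and frame size.
P = np.array([[1000.0, 0.0, 0.0, 0.0],
              [0.0, -1000.0, 0.0, 0.0],
              [0.0, 0.0, -1.0, 10.0]])
img_width, img_height = 1920, 1080

Xw = np.array([4.0, 2.0, 0.0, 1.0])        # homogeneous 3D point on the ground plane
u, v, w = P @ Xw
x_centered = np.array([u / w, v / w])      # origin at the principal point (0, 0)
x_pixel = x_centered + np.array([img_width / 2.0, img_height / 2.0])  # the shift SV3DT adds internally
print(x_centered, x_pixel)                 # e.g. [400. -200.] -> [1360. 340.]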