TAO Toolkit Sparse4D Transfer Learning Issue Report
High-confidence false positives from unmatched queries
Evidence video: video.mp4
1) Executive Summary
During transfer learning with nvcr.io/nvidia/tao/tao-toolkit:6.25.11-pyt, Sparse4D outputs many high-confidence detections that are not assigned stable tracking IDs and are not removed by TopK + score-threshold filtering.
Code inspection indicates a likely training bug: queries not matched to GT are assigned a background label (num_cls) but are excluded from classification loss in TAO’s custom focal-loss implementation.
As a result, a large portion of unmatched queries receive no negative classification supervision, which can lead to score inflation and many false positives at inference.
2) Scope and Source Artifacts
- Target container image: nvcr.io/nvidia/tao/tao-toolkit:6.25.11-pyt
- TAO Sparse4D source extracted from container path:
  /usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/sparse4d/...
- Upstream comparison source:
  - Horizon Robotics Sparse4D repository: HorizonRobotics/Sparse4D
  - Upstream sparse4d_head.py (raw file)
- Reference DETR implementation
- Reference mmdetection focal-loss behavior used by upstream Sparse4D
3) TAO Code Evidence (Direct)
3.1 Unmatched queries are explicitly labeled as background (num_cls)
File: nvidia_tao_pytorch/cv/sparse4d/model/detection3d/target.py
TAO initializes classification targets for all predictions as num_cls, then overwrites only matched predictions with GT class labels:
output_cls_target = (
cls_target[0].new_ones([bs, num_pred], dtype=torch.long) * num_cls
)
...
output_cls_target[i, pred_idx] = cls_target[i][target_idx]
This means all unmatched queries remain with label num_cls (background/no-object semantics).
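A minimal self-contained sketch of this encoding (hypothetical sizes, matches, and GT labels) makes the result concrete:

```python
import torch

# Hypothetical sizes and matches, mirroring the target.py logic above.
num_cls, bs, num_pred = 10, 1, 6
cls_target = [torch.tensor([3, 7])]  # GT classes for one sample

# Every prediction starts as background (label == num_cls) ...
output_cls_target = (
    cls_target[0].new_ones([bs, num_pred], dtype=torch.long) * num_cls
)

# ... and only matched predictions are overwritten with GT class labels.
pred_idx, target_idx = torch.tensor([1, 4]), torch.tensor([0, 1])
output_cls_target[0, pred_idx] = cls_target[0][target_idx]

print(output_cls_target)  # tensor([[10,  3, 10, 10,  7, 10]])
```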
3.2 Regression loss is computed only for matched queries
File: nvidia_tao_pytorch/cv/sparse4d/model/criterion.py
Regression is masked by non-zero target boxes:
mask = torch.logical_not(torch.all(reg_target == 0, dim=-1))
...
reg_target = reg_target.flatten(end_dim=1)[mask]
reg = reg.flatten(end_dim=1)[mask]
So unmatched queries do not contribute regression loss (this is expected for DETR-like training).
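A small sketch (hypothetical shapes and values) of the masking above; only the matched query's target survives into the regression loss:

```python
import torch

# Hypothetical (bs=1, num_pred=2, state_dim=3) targets: the second query is
# unmatched, so its regression target stays all-zero.
reg_target = torch.tensor([[[1.0, 2.0, 0.5],
                            [0.0, 0.0, 0.0]]])
reg = torch.randn_like(reg_target)  # predicted states

# Non-zero targets mark matched queries; unmatched rows are masked out.
mask = torch.logical_not(torch.all(reg_target == 0, dim=-1))
reg_target = reg_target.flatten(end_dim=1)[mask.flatten()]
reg = reg.flatten(end_dim=1)[mask.flatten()]
```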
3.3 TAO custom focal loss excludes background-labeled queries from classification loss
File: nvidia_tao_pytorch/cv/sparse4d/model/criterion.py (custom FocalLoss)
TAO filters valid samples with:
num_classes = pred.size(1)
valid_mask = (target >= 0) & (target < num_classes)
...
pred = pred[valid_mask]
valid_target = target[valid_mask]
Because unmatched queries are encoded as target == num_classes, they fail the target < num_classes check and are dropped from the classification loss entirely.
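A minimal demonstration (hypothetical logits and labels) that this mask silently discards every background-labeled query:

```python
import torch

num_classes = 10
# Two matched queries (labels 3 and 7) and two unmatched ones (label == num_classes).
target = torch.tensor([3, num_classes, 7, num_classes])
pred = torch.randn(4, num_classes)  # classification logits

valid_mask = (target >= 0) & (target < num_classes)
pred = pred[valid_mask]
valid_target = target[valid_mask]

# Only the two matched queries reach the loss; background rows get no gradient.
print(valid_target)  # tensor([3, 7])
```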
3.4 Inference keeps top scores directly from sigmoid logits
File: nvidia_tao_pytorch/cv/sparse4d/model/detection3d/decoder.py
Inference applies sigmoid + TopK + threshold:
cls_scores = cls_scores[output_idx].sigmoid()
cls_scores, indices = cls_scores.flatten(start_dim=1).topk(self.num_output, dim=1, sorted=self.sort_results)
...
mask = cls_scores >= self.score_threshold
If unmatched queries are not negatively supervised during training, they can keep inflated scores and survive this filter.
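A toy illustration (hypothetical logits, threshold, and query count) of how an unsupervised query's inflated logit passes this filter:

```python
import torch

score_threshold, num_output = 0.3, 3
# Hypothetical logits for 5 queries x 2 classes. Query 4 is never matched in
# training; without negative supervision its logit can stay large (here 3.0).
cls_scores = torch.tensor([[[-3.0,  2.0],
                            [-4.0, -4.0],
                            [ 0.5, -1.0],
                            [-2.0, -2.0],
                            [ 3.0, -5.0]]])

cls_scores = cls_scores.sigmoid()
cls_scores, indices = cls_scores.flatten(start_dim=1).topk(num_output, dim=1)
mask = cls_scores >= score_threshold  # the inflated query survives at ~0.95
```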
4) Upstream Sparse4D / DETR Comparison
4.1 Upstream Sparse4D uses the same target encoding (num_cls for unmatched) but a different loss backend
In upstream Sparse4D, unmatched queries receive the same num_cls target encoding (target.py), but loss_cls is built via mmdetection's registry in the upstream head:
self.loss_cls = build(loss_cls, LOSSES)
4.2 mmdetection focal loss maps num_classes label to all-zero one-hot (negative supervision retained)
Reference mmdetection code:
target = F.one_hot(target, num_classes=num_classes + 1)
target = target[:, :num_classes]
For target == num_classes, one-hot becomes all zeros over foreground classes, which still contributes BCE/focal negative loss on all class logits.
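This can be verified directly (hypothetical label values):

```python
import torch
import torch.nn.functional as F

num_classes = 10
target = torch.tensor([3, num_classes])  # one matched query, one background

onehot = F.one_hot(target, num_classes=num_classes + 1)
onehot = onehot[:, :num_classes]

# The background row is all zeros over foreground classes, so every one of its
# class logits is still pushed down by the BCE/focal negative term.
print(onehot[1].sum())  # tensor(0)
```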
4.3 DETR also supervises unmatched queries as no-object
In DETR:
target_classes = torch.full(src_logits.shape[:2], self.num_classes, ...)
target_classes[idx] = target_classes_o
loss_ce = F.cross_entropy(..., target_classes, self.empty_weight)
Unmatched queries are explicitly supervised as no-object class, preventing uncontrolled high confidence.
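A minimal sketch of this DETR-style no-object supervision (hypothetical shapes and match; the eos_coef value of 0.1 is an assumption here):

```python
import torch
import torch.nn.functional as F

num_classes, bs, num_queries = 10, 1, 4
src_logits = torch.randn(bs, num_queries, num_classes + 1)  # +1 no-object class

# Every query defaults to the no-object class; matched ones are overwritten.
target_classes = torch.full(src_logits.shape[:2], num_classes, dtype=torch.long)
target_classes[0, 1] = 3  # hypothetical match: query 1 -> class 3

empty_weight = torch.ones(num_classes + 1)
empty_weight[-1] = 0.1  # eos_coef down-weights, but keeps, no-object loss
loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, empty_weight)
```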
5) Root Cause Hypothesis
The TAO Sparse4D implementation in 6.25.11-pyt appears to combine:
- Background target encoding: unmatched queries labeled as num_cls, and
- Custom focal-loss filtering: keeps only target < num_classes.
This combination removes classification loss for unmatched queries, unlike upstream Sparse4D + mmdetection behavior and DETR-style no-object supervision.
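If this hypothesis holds, a possible fix (a sketch, not the TAO implementation) is to map the background label to an all-zero one-hot row before computing sigmoid focal loss, mirroring the mmdetection behavior; the function below is hypothetical:

```python
import torch
import torch.nn.functional as F

def focal_loss_with_background(pred, target, gamma=2.0, alpha=0.25):
    """Sketch: target == num_classes becomes an all-zero one-hot row
    (mmdetection-style) instead of being dropped, so unmatched queries
    still receive negative classification supervision."""
    num_classes = pred.size(1)
    onehot = F.one_hot(target, num_classes=num_classes + 1)[:, :num_classes].float()
    prob = pred.sigmoid()
    pt = prob * onehot + (1 - prob) * (1 - onehot)
    weight = (alpha * onehot + (1 - alpha) * (1 - onehot)) * (1 - pt) ** gamma
    loss = F.binary_cross_entropy_with_logits(pred, onehot, reduction="none")
    return (loss * weight).mean()
```

With such a variant, a background query with large positive logits incurs a large loss, which is exactly the missing pressure against score inflation.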
6) Practical Impact
- High confidence is not sufficiently penalized on unmatched queries.
- TopK + threshold post-filtering cannot suppress enough false detections.
- Tracking receives many spurious detections, causing unstable IDs and large false-positive volumes.
7) Reproducibility Note (Source Extraction)
Sparse4D TAO source was extracted from the container for inspection using standard Docker copy flow (docker create + docker cp) from:
/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/sparse4d/
8) Key File Locations to Inspect in TAO Image
- nvidia_tao_pytorch/cv/sparse4d/model/detection3d/target.py
- nvidia_tao_pytorch/cv/sparse4d/model/criterion.py
- nvidia_tao_pytorch/cv/sparse4d/model/detection3d/decoder.py
- nvidia_tao_pytorch/cv/sparse4d/model/sparse4d_pl_model.py
Because we cannot completely rule out that this issue is caused by configuration settings, we attach the experiment.yaml used for the transfer learning shown in the video above. Please review its contents.
experiment_yaml.txt (8.5 KB)