TAO Toolkit Sparse4D Transfer Learning Issue Report

High-confidence false positives from unmatched queries

Evidence video: video.mp4

1) Executive Summary

During transfer learning with nvcr.io/nvidia/tao/tao-toolkit:6.25.11-pyt, Sparse4D outputs many high-confidence detections that are not assigned stable tracking IDs and are not removed by TopK + score-threshold filtering.

Code inspection indicates a likely training bug: queries not matched to GT are assigned a background label (num_cls) but are excluded from classification loss in TAO’s custom focal-loss implementation.

As a result, a large portion of unmatched queries receive no negative classification supervision, which can lead to score inflation and many false positives at inference.


2) Scope and Source Artifacts

  • Target container image: nvcr.io/nvidia/tao/tao-toolkit:6.25.11-pyt

  • TAO Sparse4D source extracted from container path:
    /usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/sparse4d/...

  • Upstream comparison source: Horizon Robotics Sparse4D repository (HorizonRobotics/Sparse4D), including the upstream sparse4d_head.py

  • Reference DETR implementation: facebookresearch/detr/models/detr.py

  • Reference mmdetection focal-loss behavior used by upstream Sparse4D: mmdet focal_loss.py (v2.28.2)


3) TAO Code Evidence (Direct)

3.1 Unmatched queries are explicitly labeled as background (num_cls)

File: nvidia_tao_pytorch/cv/sparse4d/model/detection3d/target.py

TAO initializes classification targets for all predictions as num_cls, then overwrites only matched predictions with GT class labels:


output_cls_target = (
    cls_target[0].new_ones([bs, num_pred], dtype=torch.long) * num_cls
)
...
output_cls_target[i, pred_idx] = cls_target[i][target_idx]

This means all unmatched queries remain with label num_cls (background/no-object semantics).
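The encoding above can be reproduced in a minimal sketch (the shapes and matcher indices below are hypothetical, chosen only for illustration):

```python
import torch

bs, num_pred, num_cls = 1, 5, 3  # hypothetical sizes for illustration

# Every prediction starts as background (label == num_cls) ...
output_cls_target = torch.ones([bs, num_pred], dtype=torch.long) * num_cls

# ... then only matched predictions are overwritten with GT labels.
# Pretend the matcher assigned predictions 0 and 3 to GT classes 2 and 1.
pred_idx = torch.tensor([0, 3])
gt_labels = torch.tensor([2, 1])
output_cls_target[0, pred_idx] = gt_labels

print(output_cls_target)  # tensor([[2, 3, 3, 1, 3]]) -- unmatched stay at num_cls
```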

3.2 Regression loss is computed only for matched queries

File: nvidia_tao_pytorch/cv/sparse4d/model/criterion.py

Regression is masked by non-zero target boxes:


mask = torch.logical_not(torch.all(reg_target == 0, dim=-1))
...
reg_target = reg_target.flatten(end_dim=1)[mask]
reg = reg.flatten(end_dim=1)[mask]

So unmatched queries do not contribute regression loss (this is expected for DETR-like training).
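The masking can be illustrated with a toy tensor (values are hypothetical):

```python
import torch

# Hypothetical flattened regression targets: unmatched queries carry
# all-zero boxes, matched queries carry non-zero GT boxes.
reg_target = torch.tensor([[0., 0., 0.],
                           [1., 2., 3.],   # matched query
                           [0., 0., 0.]])

mask = torch.logical_not(torch.all(reg_target == 0, dim=-1))
print(mask)  # only the matched (non-zero) query contributes regression loss
```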

3.3 TAO custom focal loss excludes background-labeled queries from classification loss

File: nvidia_tao_pytorch/cv/sparse4d/model/criterion.py (custom FocalLoss)

TAO filters valid samples with:


num_classes = pred.size(1)
valid_mask = (target >= 0) & (target < num_classes)
...
pred = pred[valid_mask]
valid_target = target[valid_mask]

Because unmatched queries are encoded as target == num_classes, they fail target < num_classes and are dropped from classification loss entirely.
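A toy example (hypothetical labels) makes the dropped-sample effect concrete:

```python
import torch

num_classes = 3
# Hypothetical flattened targets: two matched queries (labels 0 and 2),
# three unmatched queries encoded as background (label == num_classes == 3).
target = torch.tensor([0, 3, 3, 2, 3])

valid_mask = (target >= 0) & (target < num_classes)
print(valid_mask)        # background entries fail `target < num_classes`
print(valid_mask.sum())  # only 2 of 5 queries contribute classification loss
```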

3.4 Inference keeps top scores directly from sigmoid logits

File: nvidia_tao_pytorch/cv/sparse4d/model/detection3d/decoder.py

Inference applies sigmoid + TopK + threshold:


cls_scores = cls_scores[output_idx].sigmoid()
cls_scores, indices = cls_scores.flatten(start_dim=1).topk(
    self.num_output, dim=1, sorted=self.sort_results
)
...
mask = cls_scores >= self.score_threshold

If unmatched queries are not negatively supervised during training, they can keep inflated scores and survive this filter.
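A small sketch (hypothetical logits and decoder settings) shows how an unsupervised query can pass this filter:

```python
import torch

num_output, score_threshold = 3, 0.3  # hypothetical decoder settings

# Hypothetical logits for 5 queries x 2 classes; query 3 was never
# negatively supervised, so its logits stayed inflated.
cls_logits = torch.tensor([[-4.0, -3.0],
                           [ 2.0, -5.0],   # genuine detection
                           [-6.0, -6.0],
                           [ 1.5,  1.0],   # unmatched, inflated query
                           [-5.0, -4.0]])

cls_scores = cls_logits.sigmoid().flatten()
top_scores, indices = cls_scores.topk(num_output)
kept = top_scores >= score_threshold
print(top_scores[kept])  # the inflated query survives TopK + threshold
```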


4) Upstream Sparse4D / DETR Comparison

4.1 Upstream Sparse4D uses same target encoding (num_cls for unmatched) but different loss backend

In upstream Sparse4D, unmatched target encoding is also num_cls (target.py), and loss_cls is built via mmdetection:

  • The upstream head constructs the classification loss via self.loss_cls = build(loss_cls, LOSSES).

4.2 mmdetection focal loss maps num_classes label to all-zero one-hot (negative supervision retained)

Reference mmdetection code:


target = F.one_hot(target, num_classes=num_classes + 1)
target = target[:, :num_classes]

For target == num_classes, one-hot becomes all zeros over foreground classes, which still contributes BCE/focal negative loss on all class logits.
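The mapping can be checked directly (labels below are hypothetical):

```python
import torch
import torch.nn.functional as F

num_classes = 3
target = torch.tensor([1, 3, 3])  # one foreground label, two background (== num_classes)

one_hot = F.one_hot(target, num_classes=num_classes + 1)[:, :num_classes]
print(one_hot)
# Background rows become all zeros over the foreground classes, so BCE/focal
# loss still pushes every logit of an unmatched query toward zero.
```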

4.3 DETR also supervises unmatched queries as no-object

In DETR:


target_classes = torch.full(src_logits.shape[:2], self.num_classes, ...)
target_classes[idx] = target_classes_o
loss_ce = F.cross_entropy(..., target_classes, self.empty_weight)

Unmatched queries are explicitly supervised as no-object class, preventing uncontrolled high confidence.
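A runnable sketch of this scheme (shapes and match indices are hypothetical; the 0.1 no-object weight is illustrative, mirroring DETR's down-weighted but never dropped no-object class):

```python
import torch
import torch.nn.functional as F

num_classes = 3                                   # class index 3 == "no-object"
src_logits = torch.randn(1, 4, num_classes + 1)   # (batch, queries, classes + 1)

# All queries default to the no-object class; matched ones get their GT label.
target_classes = torch.full(src_logits.shape[:2], num_classes, dtype=torch.long)
target_classes[0, 1] = 2  # pretend query 1 matched a GT box of class 2

empty_weight = torch.ones(num_classes + 1)
empty_weight[-1] = 0.1    # down-weight, but never drop, the no-object class
loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes,
                          weight=empty_weight)
print(loss_ce)  # unmatched queries still receive (down-weighted) supervision
```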


5) Root Cause Hypothesis

The TAO Sparse4D implementation in 6.25.11-pyt appears to combine:

  1. Background target encoding: unmatched queries labeled as num_cls, and

  2. Custom focal-loss filtering: keeps only target < num_classes.

This combination removes classification loss for unmatched queries, unlike upstream Sparse4D + mmdetection behavior and DETR-style no-object supervision.


6) Practical Impact

  • High confidence is not sufficiently penalized on unmatched queries.

  • TopK + threshold post-filtering cannot suppress enough false detections.

  • Tracking receives many spurious detections, causing unstable IDs and large false-positive volumes.


7) Reproducibility Note (Source Extraction)

Sparse4D TAO source was extracted from the container for inspection using standard Docker copy flow (docker create + docker cp) from:

/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/sparse4d/


8) Key File Locations to Inspect in TAO Image

  • nvidia_tao_pytorch/cv/sparse4d/model/detection3d/target.py

  • nvidia_tao_pytorch/cv/sparse4d/model/criterion.py

  • nvidia_tao_pytorch/cv/sparse4d/model/detection3d/decoder.py

  • nvidia_tao_pytorch/cv/sparse4d/model/sparse4d_pl_model.py


Because we cannot completely rule out that this issue is caused by configuration settings, we are attaching the experiment.yaml used for the transfer learning shown in the video at the top of this report. Please review its contents.
experiment_yaml.txt (8.5 KB)

Thanks for the detailed report. From the spec you shared, there is no obvious configuration mistake that would by itself explain the reported high-confidence false positives. Have you ever run the official notebook with the dataset mentioned in it? The official Sparse4D notebook/baseline workflow is useful as an A/B reference: if the same symptom also appears there, it would strengthen the case that this is an implementation-level issue rather than something specific to the custom transfer-learning setup; if the notebook baseline does not reproduce it, that would help narrow down the triggering conditions.

Additionally, based on your findings, please try the change below to nvidia_tao_pytorch/cv/sparse4d/model/criterion.py.

-        valid_mask = (target >= 0) & (target < num_classes)  # ignore negative/out-of-range
+        target = target.long()
+        valid_mask = target >= 0  # keep foreground and background; ignore only negative labels
         if not valid_mask.any():
-            # No valid samples, return zero loss
-            return pred.new_tensor(0.0)
+            return pred.sum() * 0

         pred = pred[valid_mask]
         valid_target = target[valid_mask]
         if weight is not None:
             weight = weight[valid_mask]

-        # Now safe to do one_hot
-        one_hot_target = F.one_hot(valid_target, num_classes=num_classes).float()
+        one_hot_target = F.one_hot(
+            valid_target.clamp(max=num_classes),
+            num_classes=num_classes + 1
+        )[:, :num_classes].to(dtype=pred.dtype)
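As a standalone sanity check (hypothetical tensors, not the actual criterion code), the patched mask supervises all non-negative labels instead of only the matched ones:

```python
import torch
import torch.nn.functional as F

num_classes = 3
pred = torch.randn(5, num_classes)           # hypothetical flattened logits
target = torch.tensor([0, 3, 3, 2, 3])       # 3 == background label (num_cls)

old_mask = (target >= 0) & (target < num_classes)  # drops background queries
new_mask = target >= 0                             # keeps background queries

# Patched one-hot: background maps to an all-zero row, as in mmdetection.
one_hot = F.one_hot(target.clamp(max=num_classes),
                    num_classes=num_classes + 1)[:, :num_classes].to(pred.dtype)

print(old_mask.sum().item(), new_mask.sum().item())  # 2 vs 5 supervised queries
```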


Thank you for your response.
I have previously performed transfer learning using the official notebook and the official dataset, with the default training settings.

I will conduct the A/B test you suggested and compare the results.

Thank you.