Query on Multi-Camera Calibration Accuracy vs Unified ID Reassociation Stability in MV3DT


• Hardware Platform (Jetson / GPU): NVIDIA 2000-series Ada GPU
• DeepStream Version: 8.0
• JetPack Version (valid for Jetson only)
• TensorRT Version: 10.15.1.29-1+cuda13.1
• NVIDIA GPU Driver Version (valid for GPU only): 580.126.09
• Issue Type (questions, new requirements, bugs):


Hello NVIDIA Team,

We are currently evaluating Multi-View 3D Tracking (MV3DT) using DeepStream 8.0 and would appreciate guidance on an issue we are facing.

MV3DT Reference: https://docs.nvidia.com/metropolis/deepstream/8.0/text/DS_MV3DT.html
Calibration Tool Used (VSS Warehouse Compose 3.1.0): https://catalog.ngc.nvidia.com/orgs/nvidia/teams/vss-warehouse/resources/vss-warehouse-compose?version=3.1.0

Environment

  • Use case: Passenger tracking inside a lounge

  • Camera setup: 4 fixed overhead cameras with overlapping fields of view

  • Calibration completed using VSS Warehouse Compose output files for MV3DT

Calibration Observation

We performed calibration following the recommended workflow.

I have attached one layout from a four-camera setup where calibration completed successfully.

From the result generated by VSS for the four cameras:

  • One camera appears to be positioned correctly

  • The other three cameras seem slightly offset from their real physical locations

  • The attached layout map shows the camera locations as estimated by calibration; I have highlighted the actual camera positions in pink.
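One way to put a number on "slightly offset" is to recover each camera's centre from its projection matrix and compare it with the tape-measured position. The sketch below assumes the MV3DT camera file provides a 3x4 matrix P = K[R | t] in the same world units as the map (e.g. meters) and that each camera's real position has been measured in that same world frame; `make_P` and all sample numbers are hypothetical test values, not values from this setup.

```python
import numpy as np


def make_P(K, R, C):
    """Build a synthetic 3x4 projection matrix P = K [R | -R C] for testing."""
    R = np.asarray(R, dtype=float)
    C = np.asarray(C, dtype=float)
    return np.asarray(K, dtype=float) @ np.hstack([R, (-R @ C).reshape(3, 1)])


def camera_center(P):
    """Camera centre C in world coordinates: P @ [C, 1] = 0, so C = -inv(M) @ p4."""
    P = np.asarray(P, dtype=float).reshape(3, 4)
    M, p4 = P[:, :3], P[:, 3]
    return -np.linalg.solve(M, p4)


def position_offset(P, measured_xyz):
    """Distance between the calibrated and the physically measured camera position."""
    return float(np.linalg.norm(camera_center(P) - np.asarray(measured_xyz, dtype=float)))


# Hypothetical overhead camera 3 m above the floor, looking straight down.
K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]
R_down = np.diag([1.0, -1.0, -1.0])
P = make_P(K, R_down, [2.0, 3.0, 3.0])
print(camera_center(P))                      # approximately [2, 3, 3]
print(position_offset(P, [2.5, 3.0, 3.0]))   # approximately 0.5
```

Because a 3x4 projection matrix is only defined up to scale, the centre computed this way does not depend on that scale, so it can be applied directly to the exported matrices.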

We would like to understand whether this level of offset is acceptable for stable MV3DT identity association, or if it can significantly impact unified ID consistency.
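Since cross-camera association relies on each target's global location, what matters is how far apart two cameras place the same person on the floor. The sketch below back-projects a foot point onto the ground plane z = 0 using columns 0, 1 and 3 of P (a plane-to-image homography). It assumes the floor is z = 0 in the calibration's world frame and uses a hypothetical downward-looking camera whose calibrated position is 0.3 m off in x; all helper names and numbers are illustrative.

```python
import numpy as np


def make_P(K, R, C):
    """Synthetic 3x4 projection matrix P = K [R | -R C] (test helper)."""
    R = np.asarray(R, dtype=float)
    C = np.asarray(C, dtype=float)
    return np.asarray(K, dtype=float) @ np.hstack([R, (-R @ C).reshape(3, 1)])


def project(P, xyz):
    """Project a 3D world point to pixel coordinates (u, v)."""
    x = np.asarray(P, dtype=float) @ np.append(np.asarray(xyz, dtype=float), 1.0)
    return x[:2] / x[2]


def image_to_ground(P, uv):
    """Back-project pixel (u, v) onto the plane z = 0 via the homography P[:, [0, 1, 3]]."""
    H = np.asarray(P, dtype=float)[:, [0, 1, 3]]
    X = np.linalg.solve(H, np.array([uv[0], uv[1], 1.0]))
    return X[:2] / X[2]


K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]
R_down = np.diag([1.0, -1.0, -1.0])               # camera looking straight down
P_true = make_P(K, R_down, [4.0, 2.0, 3.0])       # where the camera really is
P_calib = make_P(K, R_down, [4.3, 2.0, 3.0])      # calibration is 0.3 m off in x

foot_uv = project(P_true, [3.0, 2.5, 0.0])        # person's feet as seen in the image
print(image_to_ground(P_true, foot_uv))           # approximately [3.0, 2.5]
print(image_to_ground(P_calib, foot_uv))          # approximately [3.3, 2.5]
```

A horizontal position error with the orientation unchanged shifts every ground estimate from that camera by the same amount, so a correctly calibrated neighbour sees the same person about 0.3 m away. Height or orientation errors instead produce ground errors that grow with distance from the camera.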

Runtime Tracking Issue Observed

Using the Projection_matrix_3x4 generated from the above calibration in MV3DT, we observe the following:

  • Some passengers initially receive correct unified IDs across cameras

  • After some time, the same passenger may be assigned a new ID

  • IDs sometimes lose synchronization between cameras

  • In certain cases, IDs merge correctly again later

  • For many passengers walking through the lounge, unified identity consistency is not maintained for the full journey

  • For example, for the same Person A:
    Cam 00: person 1 → person 2 → person 3
    Cam 01: person 1 → person 5 → person 3 (in sync again)

  • Ideally, even if the tracker loses the track ID, shouldn't the person's initial unified ID be restored after a while?
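The expectation in the last point can be made concrete. The sketch below is a hypothetical illustration of "retain the initial ID" behaviour, not MV3DT's actual reassociation logic (which is not described in this thread): recently lost unified IDs are kept with their last ground position, and a new track that appears close enough in space and time reclaims the old ID. The class names, thresholds and coordinates are all assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LostId:
    uid: int        # unified ID whose track was lost
    x: float        # last ground position (world units, e.g. meters)
    y: float
    t_lost: float   # time (s) at which the track was lost


@dataclass
class IdKeeper:
    """Hypothetical ID-retention policy; not MV3DT's implementation."""
    max_gap_s: float = 3.0    # assumed: how long a lost ID stays reclaimable
    max_dist_m: float = 1.0   # assumed: how far the person may move meanwhile
    next_uid: int = 1
    lost: list = field(default_factory=list)

    def mark_lost(self, uid, x, y, t):
        """Remember a unified ID whose track was just lost."""
        self.lost.append(LostId(uid, x, y, t))

    def assign(self, x, y, t):
        """Return a unified ID for a new track first seen at (x, y) at time t."""
        # Forget lost IDs that are too old to be reclaimed.
        self.lost = [e for e in self.lost if t - e.t_lost <= self.max_gap_s]
        # Reclaim the nearest lost ID within max_dist_m, if any.
        best, best_d = None, self.max_dist_m
        for e in self.lost:
            d = math.hypot(x - e.x, y - e.y)
            if d <= best_d:
                best, best_d = e, d
        if best is not None:
            self.lost.remove(best)
            return best.uid
        # Otherwise issue a fresh ID.
        uid = self.next_uid
        self.next_uid += 1
        return uid


keeper = IdKeeper()
a = keeper.assign(0.0, 0.0, t=0.0)      # new person gets ID 1
keeper.mark_lost(a, 0.0, 0.0, t=1.0)    # track lost at t = 1 s
print(keeper.assign(0.4, 0.2, t=2.0))   # reappears nearby: ID 1 again
print(keeper.assign(5.0, 5.0, t=2.5))   # someone else: ID 2
```

The `max_dist_m` gate also shows why calibration matters for this behaviour: if cameras disagree about where a person stands by more than the gate, the old ID cannot be matched and a new one is issued.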

Main Question

Would this behavior more likely indicate:

  1. Calibration accuracy issue (camera placement / projection misalignment)

  2. MV3DT reassociation / ID correction behavior

  3. Tracker tuning parameter issue

  4. Camera overlap / transition timing issue

Additional Clarification

For MV3DT, how precise should calibration be for reliable unified IDs?

For example:

  • Is a slight camera offset generally acceptable?

  • Or should generated camera positions closely match the real installed locations?
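A practical way to judge "precise enough" for a specific site is to measure calibration error directly instead of comparing camera icons on the map. The sketch below assumes a few floor marks have been tape-measured in the map's world frame and their pixel locations clicked in one camera; it reports the per-point reprojection error in pixels for that camera's 3x4 matrix. Helper names and sample values are hypothetical, and this thread does not state what error level MV3DT tolerates.

```python
import numpy as np


def make_P(K, R, C):
    """Synthetic 3x4 projection matrix P = K [R | -R C] (test helper)."""
    R = np.asarray(R, dtype=float)
    C = np.asarray(C, dtype=float)
    return np.asarray(K, dtype=float) @ np.hstack([R, (-R @ C).reshape(3, 1)])


def reprojection_errors(P, world_pts, image_pts):
    """Pixel distance between projected world points and where they were observed."""
    P = np.asarray(P, dtype=float).reshape(3, 4)
    W = np.asarray(world_pts, dtype=float)
    Xh = np.hstack([W, np.ones((len(W), 1))])   # homogeneous world points, N x 4
    proj = Xh @ P.T                              # N x 3
    uv = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(uv - np.asarray(image_pts, dtype=float), axis=1)


K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]
P = make_P(K, np.diag([1.0, -1.0, -1.0]), [4.0, 2.0, 3.0])
floor_marks = [[3.0, 2.5, 0.0], [4.5, 1.5, 0.0], [5.0, 2.0, 0.0]]   # measured (m)
clicked = [[630.0, 377.0], [1127.0, 707.0], [1290.0, 540.0]]        # observed (px)
err = reprojection_errors(P, floor_marks, clicked)
print(err.round(1), err.mean().round(1))
```

Running the same check for all four cameras with shared floor marks separates a calibration problem (large or uneven errors) from a tracker-tuning problem (small errors, IDs still switching).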

Goal

We are trying to determine whether our next focus should be:

  • Recalibrating in VSS Warehouse Compose

  • Fine-tuning MV3DT tracker / reassociation settings

  • Improving camera overlap coverage

Any recommendations or best practices would be greatly appreciated.

Please upgrade to DeepStream 9.0, as the tracker is improved in every release. Camera calibration is critical for MV3DT because target association between cameras depends on the global location of each target. The current calibration methodology performs best when input videos are “linear,” meaning they exhibit no lens distortion. While the tool can handle minor distortion, optimal results are achieved when lens distortion is zero. Do your cameras have lens distortion (e.g., fisheye lenses)? Could you share some videos with us so we can check the issue? Please refer to DeepStream AutoMagicCalib (AMC) for camera calibration for MV3DT: AutoMagicCalib — DeepStream documentation
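To see why lens distortion matters for a linear (pinhole) 3x4 model, the sketch below applies the common two-coefficient radial (Brown-Conrady) model to a pixel and measures how far it moves. The focal length, principal point and k1 = -0.1 are hypothetical values for a 1920x1080 camera, not measurements of the cameras in this thread. With them, the image centre does not move at all while the corner moves by roughly 134 px, a non-linear shift that a 3x4 projection matrix cannot represent, so foot points near the image edges land in the wrong ground position.

```python
import math


def radial_shift_px(u, v, fx, fy, cx, cy, k1, k2=0.0):
    """Pixel displacement of an ideal point (u, v) under radial distortion k1, k2."""
    # Normalised image coordinates relative to the principal point.
    x = (u - cx) / fx
    y = (v - cy) / fy
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    ud = cx + fx * x * scale
    vd = cy + fy * y * scale
    return math.hypot(ud - u, vd - v)


# Hypothetical 1920x1080 camera with mild barrel distortion.
params = dict(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0, k1=-0.1)
print(round(radial_shift_px(960, 540, **params), 1))    # image centre
print(round(radial_shift_px(1920, 1080, **params), 1))  # image corner
```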

Hi Team,

Thanks for the update.

I tried both the DeepStream 9.0 AutoMagicCalib tool and the VSS Auto Magic Calib tool.

The calibration result was more or less the same as in the screenshot I attached previously.

The cameras appear slightly offset from their real physical locations.

labuser02@labuser02:~$ docker exec auto-magic-calib-ms-1 cat /opt/nvidia/deepstream/deepstream/version
Version: 9.0.0
DATE: Mon Mar 2 19:07:41 UTC 2026
labuser02@labuser02:~$ docker exec vss-auto-calibration cat /opt/nvidia/deepstream/deepstream/version
Version: 9.0.0
DATE: Thu Jan 22 22:35:20 UTC 2026

Could you please guide us on this? Our basic requirement is to get the same unified IDs consistently across all cameras.

Even when a person moves from one point to another, the ID should not switch; ideally, it should remain the same for that person unless they move out of all camera views.
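To tell whether recalibration or tuning actually moves the system toward this requirement, it helps to count ID switches per person on a short annotated clip before and after each change. The sketch below assumes someone has manually labelled, for each real person and each camera, the unified ID shown in every frame (None when the person is not visible); the function names are hypothetical. The example sequence mirrors the Cam 01 pattern described in the original post (person 1 → person 5 → person 3).

```python
from collections import Counter


def count_id_switches(ids):
    """Number of times the unified ID changes for one real person (None = not visible)."""
    switches, last = 0, None
    for uid in ids:
        if uid is None:
            continue                      # gaps alone do not count as switches
        if last is not None and uid != last:
            switches += 1
        last = uid
    return switches


def id_consistency(ids):
    """Share of visible frames that carry the person's most frequent unified ID."""
    seen = [uid for uid in ids if uid is not None]
    if not seen:
        return 1.0                        # never visible: nothing inconsistent
    return Counter(seen).most_common(1)[0][1] / len(seen)


cam01_person_a = [1, 1, 1, None, 5, 5, 3, 3, 3, 3]
print(count_id_switches(cam01_person_a))   # 2
print(id_consistency(cam01_person_a))      # 4 of 9 visible frames
```

Zero switches and a consistency of 1.0 for every person and camera corresponds to the requirement stated above.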

DeepStream AutoMagicCalib (AMC) supports both a geometry-based approach (AMC) that uses object trajectories and geometric relationships, and a model-based approach (VGGT) that leverages learned models for higher accuracy and robustness. Which method are you using? Cross-camera tracking is based on the global location of each object, so camera lens distortion will impact accuracy. Here are the guidelines for achieving optimal calibration results: AutoMagicCalib — DeepStream documentation

Hi Kesong,

I tried both the AMC and VGGT approaches.

With AMC, I am getting the results which I have shared earlier.

The main issue I encountered with VGGT is that calibration fails with an out-of-memory error as soon as I add more than two cameras. Please see the relevant error below:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 7.62 GiB of which 15.19 MiB is free. Including non-PyTorch memory, this process has 7.49 GiB memory in use.
Please find the full error log below:

File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2935, in layer_norm
    return torch.layer_norm(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 7.62 GiB of which 15.19 MiB is free. Including non-PyTorch memory, this process has 7.49 GiB memory in use. Of the allocated memory 7.04 GiB is allocated by PyTorch, and 330.51 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management
Running command: bash launch_VggtCalib.sh -a ../server/projects/project_20260420_031321_3785/output -l ../server/projects/project_20260420_031321_3785/manual_adjustment/layout.png -m local && bash launch_ConvertToMv3dt.sh -i ../server/projects/project_20260420_031321_3785/output/vggt_results -o ../server/projects/project_20260420_031321_3785/output/vggt_results/mv3dt_output -m local
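The figures in this log can be checked quickly to see whether fragmentation (which the suggested PYTORCH_ALLOC_CONF=expandable_segments:True setting addresses) or raw capacity is the limit. The sketch below only re-derives quantities from the reported numbers, treating 1 GiB as 1024 MiB and ignoring the rounding in the message; the function name is hypothetical. It shows that only about 4% of PyTorch's reserved memory is unused, and even reclaiming all of it plus the free memory would leave about 346 MiB of headroom on a 7.62 GiB card, which suggests capacity rather than fragmentation is the main constraint.

```python
MIB_PER_GIB = 1024.0


def summarize_oom(total_gib, free_mib, in_use_gib, allocated_gib, reserved_unallocated_mib):
    """Re-derive memory figures from a PyTorch CUDA OOM message (MiB or ratios)."""
    total = total_gib * MIB_PER_GIB
    in_use = in_use_gib * MIB_PER_GIB
    # Memory held by PyTorch's caching allocator = allocated + reserved-but-unallocated.
    reserved = allocated_gib * MIB_PER_GIB + reserved_unallocated_mib
    return {
        "fragmented_share_of_reserved": reserved_unallocated_mib / reserved,
        "non_pytorch_mib": in_use - reserved,        # CUDA context, other libraries, etc.
        "best_case_headroom_mib": free_mib + reserved_unallocated_mib,
        "card_utilisation": in_use / total,
    }


# Figures copied from the error message above.
report = summarize_oom(total_gib=7.62, free_mib=15.19, in_use_gib=7.49,
                       allocated_gib=7.04, reserved_unallocated_mib=330.51)
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```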

For VGGT, I am using the vggt_1B_commercial.pt model.

Due to this limitation, I stopped using the VGGT model for multi-camera calibration scenarios.

However, when VGGT is run with only two cameras, it places the cameras on the map noticeably more precisely than AMC. The result is still not exact, but it is much better than AMC-only calibration.

Kindly advise.

We suggest trying the model-based approach (VGGT) on a GPU with more memory. As mentioned in previous comments, camera calibration is critical for target association across cameras. Can you share your test videos? You can share them via private message if you don't want to share them publicly. We are not sure whether the targets in your test videos are close to each other and show similar motion patterns.

Please make sure you use MV3DT in DS 9.0, as it brought a substantial improvement in accuracy. Which detector are you using? For AMC and MV3DT to work best, only standing persons should be detected and tracked. Could you share the detector outputs (i.e., bboxes) overlaid on the videos? You can share the videos via private message if you don't want to share them publicly.

Hello Kesong,

Could you kindly share your official email ID?

I will share the videos with you directly.

Thanks.