After successfully training a Mask2Former model using the tutorial notebook, I am not able to visualize the inference results of the PyTorch model. If I convert the model to TensorRT, I can produce overlay images for visualization. Is there a particular parameter I need to pass in the spec file to generate them?
You can refer to the "6.1. Visualize the result" section of tao_tutorials/notebooks/tao_launcher_starter_kit/mask2former/mask2former_inst.ipynb at main · NVIDIA/tao_tutorials · GitHub. It visualizes the inference results of the PyTorch model.
This notebook is indeed what I’m using. However, I still cannot see the results. This is essentially what I see:
Here is the output of the inference block:
2025-06-24 07:50:22,334 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-06-24 07:50:22,486 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2025-06-24 07:50:22,707 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ehtiak2/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-06-24 07:50:22,707 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2025-06-24 12:50:31,981 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
sys:1: UserWarning:
'spec_inst.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/hydra/hydra_runner.py:107: UserWarning:
'spec_inst.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
_run_hydra(
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
Inference results will be saved at: /results_inst/inference
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/loggers/api_logging.py:236: UserWarning: Log file already exists at /results_inst/inference/status.json
rank_zero_warn(
/usr/local/lib/python3.10/dist-packages/torch/functional.py:512: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3553.)
GPU available: True (cuda), used: True # type: ignore[attr-defined]
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
The following callbacks returned in `LightningModule.configure_callbacks` will override existing callbacks passed to Trainer: ModelCheckpoint
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'predict_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=27` in the `DataLoader` to improve performance.
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/data.py:104: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
[2025-06-24 12:50:42,339 - TAO Toolkit - root - INFO] Sending telemetry data.
[2025-06-24 12:50:42,339 - TAO Toolkit - root - INFO] ================> Start Reporting Telemetry <================
[2025-06-24 12:50:42,339 - TAO Toolkit - root - INFO] Sending {'version': '5.5.0', 'action': 'inference', 'network': 'mask2former', 'gpu': ['NVIDIA-RTX-3500-Ada-Generation-Laptop-GPU'], 'success': True, 'time_lapsed': 10} to https://api.tao.ngc.nvidia.com.
[2025-06-24 12:50:42,891 - TAO Toolkit - root - WARNING] Telemetry data couldn't be sent, but the command ran successfully.
[2025-06-24 12:50:42,891 - TAO Toolkit - root - WARNING] [Error]: HTTPSConnectionPool(host='api.tao.ngc.nvidia.com', port=443): Max retries exceeded with url: /api/v1/metrics (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
[2025-06-24 12:50:42,892 - TAO Toolkit - root - INFO] Execution status: PASS
2025-06-24 07:50:44,347 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
Additionally, all I see in the "inference" folder is a status.json file and a lightning_logs folder. Here is the content of the status.json:
{"date": "3/10/2025", "time": "17:16:54", "status": "STARTED", "verbosity": "INFO", "message": "Starting Mask2Former inference."}
{"date": "3/10/2025", "time": "17:16:54", "status": "RUNNING", "verbosity": "INFO", "message": "Loading checkpoint: /results_inst/train/mask2former_model.pth"}
{"date": "3/10/2025", "time": "17:16:55", "status": "RUNNING", "verbosity": "INFO", "message": "Inference finished successfully."}
{"date": "3/11/2025", "time": "15:42:39", "status": "STARTED", "verbosity": "INFO", "message": "Starting Mask2Former inference."}
{"date": "3/11/2025", "time": "15:42:39", "status": "RUNNING", "verbosity": "INFO", "message": "Loading checkpoint: /results_inst/train/mask2former_model.pth"}
{"date": "3/11/2025", "time": "15:42:41", "status": "RUNNING", "verbosity": "INFO", "message": "Inference finished successfully."}
{"date": "6/20/2025", "time": "13:2:10", "status": "STARTED", "verbosity": "INFO", "message": "Starting Mask2Former inference."}
{"date": "6/20/2025", "time": "13:2:10", "status": "RUNNING", "verbosity": "INFO", "message": "Loading checkpoint: /results_inst/train/mask2former_model.pth"}
{"date": "6/20/2025", "time": "13:2:12", "status": "RUNNING", "verbosity": "INFO", "message": "Inference finished successfully."}
{"date": "6/24/2025", "time": "12:50:38", "status": "STARTED", "verbosity": "INFO", "message": "Starting Mask2Former inference."}
{"date": "6/24/2025", "time": "12:50:38", "status": "RUNNING", "verbosity": "INFO", "message": "Loading checkpoint: /results_inst/train/mask2former_model.pth"}
{"date": "6/24/2025", "time": "12:50:41", "status": "RUNNING", "verbosity": "INFO", "message": "Inference finished successfully."}
Can you modify the parameters below to visualize one image?
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.
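To show a single image, set both parameters to 1. A minimal sketch of the visualization loop, assuming the overlay images have already been written to the inference output directory (the `result_dir` path below is a placeholder; substitute your own):

```python
import glob
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; the notebook renders inline instead
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

COLS = 1    # number of columns in the visualizer grid.
IMAGES = 1  # number of images to visualize.

# Placeholder path: point this at the directory where the overlays land,
# e.g. $LOCAL_EXPERIMENT_DIR/inference in the tutorial notebook.
result_dir = "results_inst/inference"

img_paths = sorted(glob.glob(result_dir + "/*.jpg")
                   + glob.glob(result_dir + "/*.png"))[:IMAGES]
rows = math.ceil(IMAGES / COLS)

if img_paths:
    fig, axes = plt.subplots(rows, COLS, squeeze=False,
                             figsize=(6 * COLS, 6 * rows))
    for ax, path in zip(axes.flat, img_paths):
        ax.imshow(mpimg.imread(path))
        ax.axis("off")
    plt.show()
```

Note that this only changes how many images are displayed; it cannot help if the inference step never wrote any overlay images in the first place.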
Additionally, you can run inside the docker container to check whether any inference results are generated.
The inference directory contains only log files, so changing the parameters above won’t make a difference.
This behavior is very strange. I suggest you try running inside the docker container as well.
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt /bin/bash
Then, run commands like
# mask2former inference xxx
Also, please set the following in the spec yaml:
type: 'coco'
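As a sketch of where that setting lives, assuming the tutorial spec's layout (check your own spec_inst.yaml for the exact nesting, which may differ):

```yaml
dataset:
  test:
    type: 'coco'
```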
**There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.**
The root cause is in tao_pytorch_backend/nvidia_tao_pytorch/cv/mask2former/dataloader/datasets.py at main · NVIDIA/tao_pytorch_backend · GitHub.
If your test images are in .png format, please change the image-list line to
self.img_list = sorted(glob.glob(img_dir + '/*.jpg') + glob.glob(img_dir + '/*.png'))
in /usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/mask2former/dataloader/datasets.py
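The effect is easy to reproduce with the standard library alone: a jpg-only glob silently returns an empty list for a directory containing only .png files, which matches the empty inference output seen above. The temporary directory below exists only for the demo:

```python
import glob
import os
import tempfile

# Create a throwaway directory containing only .png "images".
tmp_dir = tempfile.mkdtemp()
for name in ("0001.png", "0002.png"):
    open(os.path.join(tmp_dir, name), "wb").close()

# Original pattern: only .jpg files are matched, so nothing is found.
jpg_only = sorted(glob.glob(tmp_dir + "/*.jpg"))

# Patched pattern: accept both extensions.
jpg_or_png = sorted(glob.glob(tmp_dir + "/*.jpg")
                    + glob.glob(tmp_dir + "/*.png"))

print(len(jpg_only), len(jpg_or_png))  # -> 0 2
```

With the original pattern the dataloader yields zero samples (hence the "Total length of `DataLoader` across ranks is zero" warning in the log), so inference "succeeds" without writing any overlays.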
