TAO DINO training: TensorBoard image visualization not working

rishikesan · July 8, 2024, 6:39am

Please provide the following information when requesting support.

• Hardware (T4)
• Network Type : dino
• Training spec file (if you have one, please share it here)

train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-05
    lr: 2e-4
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
  precision: fp16
  checkpoint_interval: 1
  activation_checkpoint: True
  pretrained_model_path: /workspace/tao-experiments/dino/dino_model_epoch=003.pth
dataset:
  train_data_sources:
    - image_dir: /data/images/train/
      json_file: /data/train/annotations.json
  val_data_sources:
    - image_dir: /data/images/valid/
      json_file: /data/valid/annotations.json
  num_classes: 6
  batch_size: 8
  workers: 2
  augmentation:
    fixed_padding: True
model:
  backbone: fan_large
  train_backbone: False
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048

• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
After running the training, I start TensorBoard like this:

tensorboard --logdir_spec=exp01:<result directory> --host 0.0.0.0 --port 8080

After running this, I get the scalar graphs in TensorBoard, such as loss and validation,
but I cannot see any of the input images, with their bounding boxes, as they are passed to the model.
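A minimal standard-library sketch for checking which runs and event files TensorBoard will find under the result directory. It relies only on TensorBoard's convention that event files carry "tfevents" in their name; the helper names `find_event_runs` and `logdir_spec` are hypothetical. It only locates the files: TensorBoard can display just the summary types the training code wrote into them, so a dashboard with scalars and no Images tab most likely means no image summaries were logged at all.

```python
import os
from typing import Dict, List


def find_event_runs(result_dir: str) -> Dict[str, List[str]]:
    """Map each directory under result_dir (as a relative path) to its event files.

    A file is treated as an event file when "tfevents" appears in its name,
    e.g. events.out.tfevents.<timestamp>.<host>.
    """
    runs: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(result_dir):
        events = sorted(name for name in filenames if "tfevents" in name)
        if events:
            runs[os.path.relpath(dirpath, result_dir)] = events
    return runs


def logdir_spec(experiments: Dict[str, str]) -> str:
    """Build a --logdir_spec value such as "exp01:/results/dino,exp02:/results/run2"."""
    return ",".join(f"{name}:{path}" for name, path in experiments.items())
```

For example, `find_event_runs("<result directory>")` lists one entry per run directory, and `logdir_spec({"exp01": "<result directory>"})` reproduces the `exp01:<result directory>` value used in the command above.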

I saw there is a `visualizer` setting that can be added to the spec file, so I added it:

train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-05
    lr: 2e-4
    lr_steps: [11]
    momentum: 0.9
  num_epochs: 12
  precision: fp16
  checkpoint_interval: 1
  activation_checkpoint: True
  pretrained_model_path: /workspace/tao-experiments/dino/dino_model_epoch=003.pth
  visualizer{
    enabled: true
  }
dataset:
  train_data_sources:
    - image_dir: /data/images/train/
      json_file: /data/train/annotations.json
  val_data_sources:
    - image_dir: /data/images/valid/
      json_file: /data/valid/annotations.json
  num_classes: 6
  batch_size: 8
  workers: 2
  augmentation:
    fixed_padding: True
model:
  backbone: fan_large
  train_backbone: False
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048

However, the `visualizer` option is not listed in the DINO training spec file documentation, and when I run with this configuration I get the following error:

Error

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 105, in run
    cfg = self.compose_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 594, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 141, in load_configuration
    return self._load_configuration_impl(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 235, in _load_configuration_impl
    self._process_config_searchpath(config_name, parsed_overrides, caching_repo)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 158, in _process_config_searchpath
    loaded = repo.load_config(config_path=config_name)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_repository.py", line 349, in load_config
    ret = self.delegate.load_config(config_path=config_path)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_repository.py", line 92, in load_config
    ret = source.load_config(config_path=config_path)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/core_plugins/file_config_source.py", line 31, in load_config
    cfg = OmegaConf.load(f)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/omegaconf.py", line 192, in load
    obj = yaml.load(file_, Loader=get_yaml_loader())
  File "/usr/local/lib/python3.10/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/local/lib/python3.10/dist-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/local/lib/python3.10/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/local/lib/python3.10/dist-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 115, in check_token
    while self.need_more_tokens():
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 152, in need_more_tokens
    self.stale_possible_simple_keys()
  File "/usr/local/lib/python3.10/dist-packages/yaml/scanner.py", line 291, in stale_possible_simple_keys
    raise ScannerError("while scanning a simple key", key.mark,
yaml.scanner.ScannerError: while scanning a simple key
  in "/specs/train.yaml", line 15, column 3
could not find expected ':'
  in "/specs/train.yaml", line 16, column 12
Execution status: FAIL
2024-07-08 06:23:50,828 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
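The traceback is a YAML parse failure rather than a DINO error: the spec is read as YAML (`OmegaConf.load` in the stack), and `visualizer{ ... }` appears to be protobuf text-format block syntax, as used by the older TensorFlow-based TAO spec files, which YAML cannot parse. A quick pre-flight check can catch such lines before launching the container. This is a line-based heuristic using only the standard library, not a YAML parser; the name `find_protobuf_blocks` is hypothetical.

```python
import re
from typing import List, Tuple

# Heuristic only: flags protobuf-style block openers such as "visualizer{" or
# "visualizer {" and lines that contain nothing but "}".
_OPEN_BLOCK = re.compile(r"^\s*[A-Za-z_][\w.]*\s*\{\s*$")
_CLOSE_BLOCK = re.compile(r"^\s*\}\s*$")


def find_protobuf_blocks(spec_text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for each brace-style line in a spec."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        if _OPEN_BLOCK.match(line) or _CLOSE_BLOCK.match(line):
            hits.append((lineno, line.rstrip()))
    return hits
```

Run against the spec as posted (assuming the file starts at the `train:` line), it would flag line 15 (`visualizer{`), the line named in the traceback, and line 17 (`}`). Rewriting the block as an indented YAML mapping (`visualizer:` with `enabled: true` indented beneath it) would get past the parser, but as the reply below notes, the setting does not apply to DINO, so it would not enable image logging.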

Can you please advise on how to get image visualization in TensorBoard?

Morganh · July 8, 2024, 9:47am

This setting is not for DINO. Refer to https://docs.nvidia.com/tao/tao-toolkit/text/tensorboard_visualization.html.
Image visualization in TensorBoard would be a feature request for DINO.

rishikesan · July 8, 2024, 10:23am

Then is there any other way to visualize the images in TensorBoard during DINO training?

Morganh · July 14, 2024, 2:50pm

Currently the DINO code does not support it. You can try following the PyTorch tutorial "Visualizing Models, Data, and Training with TensorBoard" (PyTorch Tutorials 2.3.0+cu121 documentation).
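Along the lines of that tutorial, a minimal sketch of logging one training image with its boxes from your own PyTorch code via `torch.utils.tensorboard.SummaryWriter.add_image_with_boxes`, which expects absolute-pixel (xmin, ymin, xmax, ymax) boxes. It assumes, without confirmation from the TAO source, that the boxes are in the DETR-style normalized (cx, cy, w, h) format and that the image is a float CHW tensor already un-normalized to [0, 1]; the helpers `cxcywh_to_xyxy` and `log_boxes` are hypothetical, and where to call them inside the TAO DINO training loop is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(boxes: Sequence[Sequence[float]], img_w: int, img_h: int) -> List[Box]:
    """Convert normalized (cx, cy, w, h) boxes to absolute (xmin, ymin, xmax, ymax) pixels."""
    xyxy: List[Box] = []
    for cx, cy, w, h in boxes:
        xyxy.append((
            (cx - w / 2) * img_w,
            (cy - h / 2) * img_h,
            (cx + w / 2) * img_w,
            (cy + h / 2) * img_h,
        ))
    return xyxy


def log_boxes(writer, tag: str, image_chw, boxes_cxcywh,
              labels: Optional[List[str]], step: int) -> None:
    """Draw boxes on one CHW image and write it to TensorBoard.

    `writer` is a torch.utils.tensorboard.SummaryWriter. torch is imported here
    so that cxcywh_to_xyxy stays usable without it.
    """
    import torch

    if hasattr(boxes_cxcywh, "tolist"):  # accept an (N, 4) tensor as well as lists
        boxes_cxcywh = boxes_cxcywh.tolist()
    _, img_h, img_w = image_chw.shape
    xyxy = torch.tensor(cxcywh_to_xyxy(boxes_cxcywh, img_w, img_h),
                        dtype=torch.float32).reshape(-1, 4)  # (N, 4), also when N == 0
    writer.add_image_with_boxes(tag, image_chw, xyxy, global_step=step, labels=labels)
```

For example, with `writer = SummaryWriter("<result directory>/images")`, calling `log_boxes(writer, "train/gt", image, target_boxes, class_names, step=epoch)` once per epoch keeps the event files small and makes the images appear under the same `--logdir_spec` as the scalars; the `images` subdirectory and the per-epoch cadence are arbitrary choices. If the dataloader normalizes images with a per-channel mean and std, undo that before logging, otherwise the drawn image will look washed out or clipped.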

yingliu · August 9, 2024, 6:25am

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

system · August 23, 2024, 6:25am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
