TAO Toolkit observations

DS 7.0
dGPU or GeForce RTX 3060 in a laptop
TAO toolkit.

I’m looking for some hints. I was trying to follow the tutorial TAO Toolkit Quick Start Guide - NVIDIA Docs, but I must have made some mistakes.

I first tried to run getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/detectnet_v2/detectnet_v2.ipynb, to no avail. In the end I dropped that and followed the notebook’s steps by hand.

These are my steps:

  1. Created a folder tao-experiments on my host system, created a data subdirectory underneath, and copied the separately downloaded model there (Step 2 D).
  2. I set up all requirements, created a conda environment, activated it, and set up my environment variables:
export NUM_GPUS=1
export USER_EXPERIMENT_DIR=/workspace/tao-experiments
export LOCAL_EXPERIMENT_DIR=/home/ubuntu/tao-experiments
export DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data
export LOCAL_PROJECT_DIR=/home/ubuntu/tao-experiments
export LOCAL_DATA_DIR=/home/ubuntu/tao-experiments/data
export VIRTUALENVWRAPPER_PYTHON=/home/ubuntu/anaconda3/envs/launcher/bin/python
export LOCAL_SPECS_DIR=/home/ubuntu/tao-experiments/detectnet_v2/specs
export SPECS_DIR=/workspace/tao-experiments/detectnet_v2/specs

My ~/.tao_mounts.json at that time looked like this:

{
    "Mounts": [
        {
            "source": "/home/ubuntu/tao-experiments",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/ubuntu/tao-experiments/detectnet_v2/specs",
            "destination": "/workspace/tao-experiments/detectnet_v2/specs"
        }
    ],
    "DockerOptions":{
           "user": "1000:1000"
       }
}
  3. Step 2C failed immediately with a permission error while attempting to create a directory; which directory was not reported. I found a post here that suggested removing the "user": "1000:1000" entry, and it worked then.
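One possible explanation for the permission error (an assumption; nothing in this thread confirms it) is that the hardcoded 1000:1000 did not match the owner of the mounted host directory. The launcher warning quoted further down suggests reading the values from `id -u` and `id -g`. The sketch below does the equivalent in Python and compares the result with a directory's owner. The function names `docker_user_option` and `owner_mismatch` are made up for illustration and are not part of the TAO launcher.

```python
import os
import tempfile


def docker_user_option() -> dict:
    """Build a DockerOptions fragment from the current user's UID and GID."""
    return {"user": f"{os.getuid()}:{os.getgid()}"}


def owner_mismatch(host_dir: str, user_option: str) -> bool:
    """Return True if host_dir is owned by a UID other than the one in 'UID:GID'."""
    configured_uid = int(user_option.split(":")[0])
    return os.stat(host_dir).st_uid != configured_uid


# Demo on a throwaway directory created (and therefore owned) by the current user.
with tempfile.TemporaryDirectory() as demo_dir:
    current_option = docker_user_option()["user"]
    demo_mismatch = owner_mismatch(demo_dir, current_option)
```

Running `owner_mismatch` against the real host directory (here /home/ubuntu/tao-experiments) with the value from ~/.tao_mounts.json would show whether the two disagree.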

So step 2C passed now:

tao model detectnet_v2 dataset_convert \
                  -d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
                  -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval \
                  -r $USER_EXPERIMENT_DIR/

and these commands showed useful results.

ls -rlt $LOCAL_DATA_DIR/tfrecords/kitti_trainval/
cat $LOCAL_SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt

I then trained the model with the kitti dataset for about 7 h successfully:

tao model detectnet_v2 train -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS

and the suggested command listed the resulting weights file:

(launcher) ubuntu@simulator:~/tao-experiments$ ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights
total 43M
-rw-r--r-- 1 root root 43M May 26 02:39 resnet18_detector.hdf5
  4. Step 5: Evaluate the trained model.

This was OK:

tao model detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.hdf5
  5. Step 6: Prune the trained model. This worked:
mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned
ls $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

// Result
resnet18_nopool_bn_detectnet_v2_pruned.hdf5


tao model detectnet_v2 prune \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.hdf5 \
                  -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.hdf5 \
                  -eq union \
                  -pth 0.0000052

Finally

ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned/

// Result

total 34092
-rw-r--r-- 1 root root 34903192 May 26 07:34 resnet18_nopool_bn_detectnet_v2_pruned.hdf5

  6. Then I came to step 7 (Retrain the pruned model) and didn’t expect any more problems, but there was one:
tao model detectnet_v2 train -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
                        -n resnet18_detector_pruned \
                        --gpus $NUM_GPUS

All of a sudden, this step failed with: Pretrained model file not found: /workspace/tao-experiments/detectnet_v2/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.hdf5

The strange thing is that this model file exists, just not where the TAO app expects it. And if I’m not mistaken, it was created while running step 6 of the notebook, “Prune the trained model”:

mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

experiment_dir_pruned is clearly created directly below $LOCAL_EXPERIMENT_DIR, so why is it resolved under $LOCAL_EXPERIMENT_DIR/detectnet_v2/experiment_dir_pruned?

.
├── data
├── detectnet_v2
├── experiment_dir_pruned
├── experiment_dir_retrain
├── experiment_dir_unpruned
└── status.json

In the end I altered ~/.tao_mounts.json and added a special mapping for this directory:

{
            "source": "/home/ubuntu/tao-experiments/experiment_dir_pruned",
            "destination": "/workspace/tao-experiments/detectnet_v2/experiment_dir_pruned"
}

Training passed then. But I don’t think this is correct. So what did I do wrong?

  7. I was able to perform step 8 “Evaluate the retrained model” without problems:
tao model detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5
  8. Step 9 “Visualize inferences” also worked:
mkdir -p $LOCAL_DATA_DIR/test_samples
cp $LOCAL_DATA_DIR/testing/image_2/00000* $LOCAL_DATA_DIR/test_samples
tao model detectnet_v2 inference -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
                            -r $USER_EXPERIMENT_DIR/tlt_infer_testing \
                            -i $DATA_DOWNLOAD_DIR/test_samples

But this script complained as well:

AssertionError: Pretrained model not found at /workspace/tao-experiments/detectnet_v2/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5

Luckily, tao model detectnet_v2 comes to the rescue: a look inside the container reveals that the file really is not there:

root@e3d1eb437087:/workspace# ls /workspace/tao-experiments/detectnet_v2/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5
ls: cannot access '/workspace/tao-experiments/detectnet_v2/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5': No such file or directory

But it is here:

root@e3d1eb437087:/workspace# ls /workspace/tao-experiments/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5 
/workspace/tao-experiments/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5

So the same thing again: earlier steps placed their output in one directory, while the following steps expect it one level further down the directory tree.

I applied the same workaround as before by adding this to ~/.tao_mounts.json:

{
            "source": "/home/ubuntu/tao-experiments/experiment_dir_retrain",
            "destination": "/workspace/tao-experiments/detectnet_v2/experiment_dir_retrain"
}

The question is: what am I overlooking? Why do I have to alter the mapping?
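When a step reports a file as missing, it can help to see where a file of that name actually lives below the mounted host directory and how that differs from the expected location. The following is a minimal sketch of such a search; `locate_artifact` is a hypothetical helper, and the demo recreates the layout from this post in a temporary directory rather than touching real data.

```python
import tempfile
from pathlib import Path


def locate_artifact(host_root: str, file_name: str) -> list:
    """Return the paths (relative to host_root) of all files named file_name."""
    root = Path(host_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(file_name)
        if p.is_file()
    )


# Relative path the inference step asked for (taken from the error message).
EXPECTED = "detectnet_v2/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5"

# Recreate the layout described above: the file sits one level higher.
with tempfile.TemporaryDirectory() as demo_root:
    weights = Path(demo_root) / "experiment_dir_retrain" / "weights"
    weights.mkdir(parents=True)
    (weights / "resnet18_detector_pruned.hdf5").touch()
    found = locate_artifact(demo_root, "resnet18_detector_pruned.hdf5")
    expected_present = EXPECTED in found
```

In the demo, `found` contains only the path without the detectnet_v2 component, which is the same one-level difference seen in the two `ls` calls above.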

The final step, 10 (model export), again didn’t need any help. I created experiment_final_dir as a subdirectory under /home/ubuntu/tao-experiments, and it produced the model, labels and config without problems.

Moving to TAO Forum.

Usually, this kind of issue is due to mismatched paths between the local host and the Docker container.
To make things easier, you can just set

"Mounts": [
        {
            "source": "/home/ubuntu/tao-experiments",
            "destination": "/home/ubuntu/tao-experiments"
        }
    ],

Then all the files under your local path will be mapped to the same path, /home/ubuntu/tao-experiments, inside the Docker container.
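To make the effect of a mount entry concrete, here is a small sketch that translates a container path back to the host path it would refer to. It assumes that the mount with the longest matching destination takes precedence, which is how nested bind mounts normally behave. `container_to_host` is a hypothetical helper, not part of the TAO launcher.

```python
import json
from pathlib import PurePosixPath


def container_to_host(container_path: str, mounts: list):
    """Map a container path to its host path, or None if no mount covers it."""
    path = PurePosixPath(container_path)
    best = None
    for mount in mounts:
        dest = PurePosixPath(mount["destination"])
        if path == dest or dest in path.parents:
            # Prefer the most specific (deepest) destination.
            if best is None or len(dest.parts) > len(best[0].parts):
                best = (dest, PurePosixPath(mount["source"]))
    if best is None:
        return None
    dest, source = best
    return str(source / path.relative_to(dest))


# Mapping used in the original post (host dir mounted under /workspace).
workspace_mounts = json.loads("""
{"Mounts": [{"source": "/home/ubuntu/tao-experiments",
             "destination": "/workspace/tao-experiments"}]}
""")["Mounts"]

# Identity mapping suggested in the reply above.
identity_mounts = [{"source": "/home/ubuntu/tao-experiments",
                    "destination": "/home/ubuntu/tao-experiments"}]

missing = ("/workspace/tao-experiments/detectnet_v2/experiment_dir_pruned/"
           "resnet18_nopool_bn_detectnet_v2_pruned.hdf5")
host_view = container_to_host(missing, workspace_mounts)
same_path = container_to_host("/home/ubuntu/tao-experiments/data", identity_mounts)
```

With the /workspace mapping, the path from the error message resolves to a location under /home/ubuntu/tao-experiments/detectnet_v2/experiment_dir_pruned on the host. With the identity mapping, host and container paths are simply the same string, so there is nothing to translate.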

That was my idea too and I had it in the beginning. Let me check again.

I tried again with your suggestion and it failed again, now with this mapping:

ubuntu@simulator:~$ cat ~/.tao_mounts.json 
{
    "Mounts": [
        {
            "source": "/home/ubuntu/tao-experiments",
            "destination": "/workspace/tao-experiments"
        }
    ]
}

Environment:

export NUM_GPUS=1
export USER_EXPERIMENT_DIR=/workspace/tao-experiments
export LOCAL_EXPERIMENT_DIR=/home/ubuntu/tao-experiments

export DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data
export LOCAL_PROJECT_DIR=/home/ubuntu/tao-experiments
export LOCAL_DATA_DIR=/home/ubuntu/tao-experiments/data
export VIRTUALENVWRAPPER_PYTHON=/home/ubuntu/anaconda3/envs/launcher/bin/python
export LOCAL_SPECS_DIR=/home/ubuntu/tao-experiments/detectnet_v2/specs
export SPECS_DIR=/workspace/tao-experiments/detectnet_v2/specs

Conversion OK.

I created the download subfolder:

mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/

and downloaded the pre-trained model:

ubuntu@simulator:~$ ngc registry model download-version nvidia/tao/pretrained_detectnet_v2:resnet18 \
>     --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet18
Getting files to download...
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • 89.0/89.0 MiB • Remaining: 0:00:00 • 6.4 MB/s • Elapsed: 0:00:15 • Total: 1 - Completed: 1 - Failed: 0

------------------------------------------------------------------------------------------------------------------
   Download status: COMPLETED
   Downloaded local path model: /home/ubuntu/tao-experiments/pretrained_resnet18/pretrained_detectnet_v2_vresnet18
   Total files downloaded: 1
   Total transferred: 89.02 MB
   Started at: 2024-05-27 09:15:03
   Completed at: 2024-05-27 09:15:19
   Duration taken: 15s
------------------------------------------------------------------------------------------------------------------
ubuntu@simulator:~$ 

ls -rlt $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/pretrained_detectnet_v2_vresnet18
total 91164
-rw-r--r-- 1 ubuntu ubuntu 93345248 May 27 09:15 resnet18.hdf5

Then I started the training and got an error.

tao model detectnet_v2 train -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
>                         -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
>                         -n resnet18_detector \
>                         --gpus $NUM_GPUS
2024-05-27 09:20:48,291 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-05-27 09:20:48,335 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-05-27 09:20:48,341 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 293: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-05-27 09:20:48,341 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2024-05-27 07:20:49.116686: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-05-27 07:20:49,148 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2024-05-27 07:20:50,132 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-27 07:20:50,157 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-27 07:20:50,160 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-27 07:20:51,283 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-27 07:20:52,757 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-27 07:20:52,780 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-27 07:20:52,782 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-27 07:20:53,768 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.logging.logging 197: Log file already exists at /workspace/tao-experiments/experiment_dir_unpruned/status.json
2024-05-27 07:20:53,769 [TAO Toolkit] [INFO] root 2102: Starting DetectNet_v2 Training job
2024-05-27 07:20:53,769 [TAO Toolkit] [INFO] __main__ 817: Loading experiment spec at /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt.
2024-05-27 07:20:53,769 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.spec_handler.spec_loader 113: Merging specification from /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt
2024-05-27 07:20:53,778 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.mlops.wandb 69: Initializing wandb.
2024-05-27 07:20:53,778 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.mlops.wandb 97: Wandb logging failed with error WandB client wasn't logged in. Please make sure to set the WANDB_API_KEY env variable or run `wandb login` in over the CLI and copy the ~/.netrc file to the container.
2024-05-27 07:20:53,778 [TAO Toolkit] [INFO] __main__ 857: Integrating with clearml.
2024-05-27 07:20:53,876 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.mlops.clearml 55: ClearML task init failed with error ClearML configuration could not be found (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at https://app.clear.ml
2024-05-27 07:20:53,876 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.mlops.clearml 58: Training will still continue.
2024-05-27 07:20:53,876 [TAO Toolkit] [INFO] root 2102: Pretrained model file not found: /workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1067, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1046, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
    return_args = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1024, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 877, in run_experiment
    input_model_file_name = get_pretrained_model_path(pretrained_model_file)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/model/utilities.py", line 92, in get_pretrained_model_path
    assert os.path.isfile(model_file), "Pretrained model file not found: %s" % model_file
AssertionError: Pretrained model file not found: /workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5
Execution status: FAIL
2024-05-27 09:20:57,487 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
(launcher) ubuntu@simulator:~/getting_started_v5.3.0$ 

As you can see, for some reason the training tries to locate the pre-trained model under detectnet_v2. I’m sure that, to overcome this the first time, I copied the files there and thereby introduced the wrong path for all subsequent commands.

The question is: given everything that was prepared before, why does it not look at /workspace/tao-experiments/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5?

Can you check the training spec file? There is one line that sets the pretrained model. Please set the correct path there.
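A quick way to do this check without reading the whole spec is to pull out the `pretrained_model_file` entry and compare it with the container path of the download. The sketch below uses a plain regular expression rather than a protobuf parser, which is an assumption that holds for simple one-line string fields like this one. `pretrained_model_paths` is a hypothetical helper name.

```python
import re

# Matches lines such as:  pretrained_model_file: "/some/path.hdf5"
PRETRAINED_RE = re.compile(
    r'^[ \t]*pretrained_model_file[ \t]*:[ \t]*"([^"]*)"', re.MULTILINE
)


def pretrained_model_paths(spec_text: str) -> list:
    """Return every pretrained_model_file value found in a spec file's text."""
    return PRETRAINED_RE.findall(spec_text)


# Excerpt of the training spec as it ships with the tutorial.
SPEC_EXCERPT = '''
model_config {
  pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
}
'''

# Container path of the file downloaded with the ngc command above.
DOWNLOADED = ("/workspace/tao-experiments/pretrained_resnet18/"
              "pretrained_detectnet_v2_vresnet18/resnet18.hdf5")

spec_paths = pretrained_model_paths(SPEC_EXCERPT)
spec_matches_download = spec_paths == [DOWNLOADED]
```

Here `spec_matches_download` ends up false: the spec contains an extra detectnet_v2 path component that the download location lacks.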

Hmm. Well, yes that’s wrong…

But it’s not documented that one has to change that.

(launcher) ubuntu@simulator:~/getting_started_v5.3.0$ cat $LOCAL_SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt
random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "car"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 1
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "cyclist"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 1
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "pedestrian"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00749999983236
        dbscan_eps: 0.230000004172
        dbscan_min_samples: 1
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "car"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "cyclist"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "pedestrian"
    value: 0.5
  }
  evaluation_box_config {
    key: "car"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "cyclist"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "pedestrian"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "car"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "cyclist"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "pedestrian"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: false
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-07
      max_learning_rate: 5e-05
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  visualizer{
    enabled: true
    num_images: 3
    scalar_logging_frequency: 50
    infrequent_logging_frequency: 5
    target_class_config {
      key: "car"
      value: {
        coverage_threshold: 0.005
      }
    }
    target_class_config {
      key: "pedestrian"
      value: {
        coverage_threshold: 0.005
      }
    }
    target_class_config {
      key: "cyclist"
      value: {
        coverage_threshold: 0.005
      }
    }
    clearml_config{
      project: "TAO Toolkit ClearML Demo"
      task: "detectnet_v2_resnet18_clearml"
      tags: "detectnet_v2"
      tags: "training"
      tags: "resnet18"
      tags: "unpruned"
    }
    wandb_config{
      project: "TAO Toolkit Wandb Demo"
      name: "detectnet_v2_resnet18_wandb"
      tags: "detectnet_v2"
      tags: "training"
      tags: "resnet18"
      tags: "unpruned"
    }
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "car"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "cyclist"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "pedestrian"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}
(launcher) ubuntu@simulator:~/getting_started_v5.3.0$

Training is running now. I’m wondering where this wrong path comes from.

model_config {
  pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.hdf5"
  num_layers: 18
  use_batch_norm: true
  load_graph: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}

…and the same problem is back again after training and pruning…

Yes, you can change it here to match the exact path you downloaded to.
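If the path is edited often, the change can be scripted. The following is a rough sketch that rewrites the single `pretrained_model_file` line in the text of a spec file and refuses to act if it does not find exactly one such line. `set_pretrained_model` is a hypothetical helper, and the regex-based approach assumes the field is written on one line as in the specs shown in this thread.

```python
import re

FIELD_RE = re.compile(
    r'(^[ \t]*pretrained_model_file[ \t]*:[ \t]*)"[^"]*"', re.MULTILINE
)


def set_pretrained_model(spec_text: str, new_path: str) -> str:
    """Return spec_text with its pretrained_model_file value replaced."""
    new_text, count = FIELD_RE.subn(
        # A function replacement avoids backslash handling in new_path.
        lambda match: f'{match.group(1)}"{new_path}"',
        spec_text,
    )
    if count != 1:
        raise ValueError(f"expected exactly one pretrained_model_file line, found {count}")
    return new_text


RETRAIN_EXCERPT = '''model_config {
  pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.hdf5"
  num_layers: 18
  load_graph: true
}
'''

# Location where the pruned model was actually written in this thread.
ACTUAL = ("/workspace/tao-experiments/experiment_dir_pruned/"
          "resnet18_nopool_bn_detectnet_v2_pruned.hdf5")

fixed_excerpt = set_pretrained_model(RETRAIN_EXCERPT, ACTUAL)
```

Only the quoted value changes; indentation and the other fields of `model_config` stay as they were.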

But why would I have to do that? Shouldn’t that be correct from the start? I mean, how is this Jupyter thingy supposed to succeed?

I am afraid it is because the notebook sets
USER_EXPERIMENT_DIR=/workspace/tao-experiments/detectnet_v2
in tao_tutorials/notebooks/tao_launcher_starter_kit/detectnet_v2/detectnet_v2.ipynb at main · NVIDIA/tao_tutorials · GitHub.

Yours is different.
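One way to avoid this kind of drift is to derive all variables from a single project directory and a network name instead of typing each one by hand. The sketch below does that. Only the USER_EXPERIMENT_DIR value is confirmed by the reply above; having LOCAL_EXPERIMENT_DIR mirror it on the host side is an assumption, and `tao_env` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def tao_env(host_project_dir: str,
            container_project_dir: str = "/workspace/tao-experiments",
            network: str = "detectnet_v2") -> dict:
    """Derive a consistent set of host and container paths from one root each."""
    host = PurePosixPath(host_project_dir)
    container = PurePosixPath(container_project_dir)
    return {
        # Host-side paths
        "LOCAL_PROJECT_DIR": str(host),
        "LOCAL_DATA_DIR": str(host / "data"),
        "LOCAL_EXPERIMENT_DIR": str(host / network),  # assumption, see lead-in
        "LOCAL_SPECS_DIR": str(host / network / "specs"),
        # Container-side paths
        "USER_EXPERIMENT_DIR": str(container / network),
        "DATA_DOWNLOAD_DIR": str(container / "data"),
        "SPECS_DIR": str(container / network / "specs"),
    }


env = tao_env("/home/ubuntu/tao-experiments")

# Path the retrain spec expects, rebuilt from the derived variable.
pruned_model = (env["USER_EXPERIMENT_DIR"]
                + "/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.hdf5")
```

With these values, `pruned_model` is exactly the path that appears in the retrain spec, so the prune output and the spec agree without extra mount entries.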

Holy shit… Thanks for pointing me to that fault…

Ok, now all is fine.

And the next question arises: now that I’m obviously able to follow a tutorial (sigh!), where can I learn what I actually did? What is this about “prune” and “convert” and “train”, and how can I use that to build my own models?

TIA

You can refer to the documentation at DetectNet_v2 - NVIDIA Docs and also to the notebooks.

Would you also have some “swimlane” suggestions for how I could retrain LPD/LPR for EU plates?

For LPD, you can fine-tune the existing TAO LPD model with your EU plates dataset, using the TAO detectnet_v2 network.
The same approach applies to LPR, using TAO LPRNet. The documentation is at LPRNet - NVIDIA Docs.

Thank you very much for your assistance

One additional question, since I’m stuck again.

My result dir looks like so:

.
├── calibration.bin
├── labels.txt
├── nvinfer_config.txt
├── resnet18_detector.onnx
├── resnet18_detector.trt.int8
└── status.json

I guess I can forget about status.json. But I’m wondering what my DeepStream config would have to look like in order to make use of this model. Parts of the configuration are there in nvinfer_config.txt, and the calibration file too. But what about the model engine in particular?

You can take a look at DetectNet_v2 - NVIDIA Docs.

I did, this is my config:

[property]
gpu-id=0
net-scale-factor=0.00392156862745098
offsets=0;0;0
infer-dims=3;544;960
tlt-model-key=tlt_encode
network-type=0
labelfile-path=models/primary-detector/resnet18-detector/labels.txt
tlt-encoded-model=models/primary-detector/resnet18-detector/resnet18_detector.onnx
model-engine-file=models/primary-detector/resnet18-detector/resnet18_detector.onnx.b1_gpu0_int8.engine
int8-calib-file=models/primary-detector/resnet18-detector/calibration.bin
batch-size=1
num-detected-classes=3
model-color-format=0
maintain-aspect-ratio=0
output-tensor-meta=0
cluster-mode=2
gie-unique-id=1
uff-input-order=0
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
uff-input-blob-name=input_1


[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.4
group-threshold=1

But it crashes while creating the engine:

WARNING: ../nvdsinfer/nvdsinfer_model_builder.cpp:1494 Deserialize engine failed because file path: /home/ubuntu/vx-ai-golang/models/primary-detector/resnet18-detector/resnet18_detector.onnx.b1_gpu0_int8.engine open error
0:00:09.559047929 11412 0x7f4a88009060 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2083> [UID = 1]: deserialize engine from file :/home/ubuntu/vx-ai-golang/models/primary-detector/resnet18-detector/resnet18_detector.onnx.b1_gpu0_int8.engine failed
0:00:09.688355061 11412 0x7f4a88009060 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2188> [UID = 1]: deserialize backend context from engine from file :/home/ubuntu/vx-ai-golang/models/primary-detector/resnet18-detector/resnet18_detector.onnx.b1_gpu0_int8.engine failed, try rebuild
0:00:09.689997183 11412 0x7f4a88009060 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2109> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
ERROR: [TRT]: UffParser: Could not read buffer.
parseModel: Failed to parse UFF model
ERROR: tlt/tlt_decode.cpp:359 Failed to build network, error in model parsing.

Please check whether you can open the ONNX file. You can open it with Netron.
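Besides opening the file in Netron, the log suggests one more thing worth checking. The config passes an .onnx file through tlt-encoded-model, and the errors come from tlt_decode.cpp and the UFF parser, which points to the encoded-model code path. Gst-nvinfer also has an onnx-file property for plain ONNX models. The sketch below flags that combination in a config file's text. That switching keys resolves this particular crash is an assumption that nothing in this thread confirms, and `check_model_keys` is a hypothetical helper.

```python
import configparser


def check_model_keys(config_text: str) -> list:
    """Return findings about the model keys in the [property] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    prop = parser["property"]
    findings = []
    encoded = prop.get("tlt-encoded-model", fallback="")
    if encoded.lower().endswith(".onnx"):
        findings.append(
            "tlt-encoded-model points at an .onnx file; "
            "a plain ONNX model is normally referenced with onnx-file"
        )
    return findings


# Relevant lines of the config posted above.
THREAD_CONFIG = """
[property]
tlt-model-key=tlt_encode
tlt-encoded-model=models/primary-detector/resnet18-detector/resnet18_detector.onnx
int8-calib-file=models/primary-detector/resnet18-detector/calibration.bin
offsets=0;0;0
"""

# Candidate variant to try (an assumption, not a confirmed fix).
CANDIDATE_CONFIG = """
[property]
onnx-file=models/primary-detector/resnet18-detector/resnet18_detector.onnx
int8-calib-file=models/primary-detector/resnet18-detector/calibration.bin
offsets=0;0;0
"""

thread_findings = check_model_keys(THREAD_CONFIG)
candidate_findings = check_model_keys(CANDIDATE_CONFIG)
```

The posted config yields one finding, while the candidate variant yields none. The earlier "Deserialize engine failed ... open error" lines are warnings that the named engine file could not be opened, after which nvinfer tries to rebuild the engine from the model file.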