Where are the models?

I am using the detectnet_v2 notebook (TAO 5…0.0) as-is, but with a custom dataset.

Once I am done with the training, I want to evaluate using:

!tao model detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti-1Class.txt\
                           -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet50_detector.hdf5

I got an error saying that the model could not be found.

I checked whether the models are here:

print('Model for each epoch:')
print('---------------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights

gives:

total: 0
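
When the weights directory is empty, a wider search shows whether any checkpoints were written elsewhere under the experiment directory. The following is only a minimal sketch: the helper `find_checkpoints` and the suffix list are hypothetical (the `.hdf5` suffix comes from the evaluate command above, `.tlt` is an assumption), and `LOCAL_EXPERIMENT_DIR` is assumed to be exported as an environment variable by the notebook.

```python
import os
from pathlib import Path

# Assumption: checkpoints end in one of these suffixes; adjust as needed.
CHECKPOINT_SUFFIXES = (".hdf5", ".tlt")


def find_checkpoints(root, suffixes=CHECKPOINT_SUFFIXES):
    """Return checkpoint-like files anywhere under root, oldest first."""
    root = Path(root)
    if not root.is_dir():
        # A missing directory is itself a useful hint: training may have
        # written its results to a different location.
        return []
    found = [p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes]
    return sorted(found, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Falls back to the current directory if the variable is not set.
    exp_dir = os.environ.get("LOCAL_EXPERIMENT_DIR", ".")
    for path in find_checkpoints(exp_dir):
        size_mb = path.stat().st_size / 1e6
        print(f"{size_mb:8.1f} MB  {path}")
```

If this prints nothing, no checkpoint was saved under that directory at all, which points at the training run or the mounted paths rather than at the file name.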

Any suggestions for debugging this issue?
What if I want to evaluate a specific epoch? Is there a standard naming format for the saved models, for example model_name.epoch_<epoch_num>.hdf5?

Thanks

Training spec:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "rumex"
    value: "rumex"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 2048
    output_image_height: 1376
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
    enable_auto_resize: true
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "rumex"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 1
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet50/pretrained_detectnet_v2_vresnet50/resnet50.hdf5"
  num_layers: 50
  freeze_blocks: 0
  freeze_blocks: 1
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "rumex"
    value: 0.25
  }
  evaluation_box_config {
    key: "rumex"
    value {
      minimum_height: 10
      maximum_height: 1000
      minimum_width: 10
      maximum_width: 1000
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "rumex"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: false
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 2
  num_epochs: 2000
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-07
      max_learning_rate: 5e-05
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  visualizer{
    enabled: true
    num_images: 3
    scalar_logging_frequency: 10
    infrequent_logging_frequency: 5
    target_class_config {
      key: "rumex"
      value: {
        coverage_threshold: 0.005
      }
    }
    clearml_config{
      project: "TAO DetectNet 1 Class"
      task: "detectnet_v2_resnet50_clearml"
      tags: "detectnet_v2"
      tags: "training"
      tags: "resnet50"
      tags: "unpruned"
    }
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "rumex"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.8 #0.40000000596
      cov_radius_y: 0.8 #0.40000000596
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

Please check whether the model file exists at that path inside the Docker container:
! tao model detectnet_v2 run ls $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet50_detector.hdf5
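
To compare the container path with the local one, it can help to translate between them using the launcher's mounts file. This is a rough sketch, assuming the mounts are defined in `~/.tao_mounts.json` with a `Mounts` list of `source`/`destination` pairs, as the notebook sets up; the helper `container_to_host` is hypothetical, and the fallback path is only a guess based on the paths in the spec above.

```python
import json
import os
from pathlib import PurePosixPath


def container_to_host(container_path, mounts):
    """Map a path inside the container to the matching host path.

    `mounts` is a list of {"source": host_dir, "destination": container_dir}.
    Returns None when no mount covers the path.
    """
    target = PurePosixPath(container_path)
    # Try the most specific (deepest) destination first.
    ordered = sorted(
        mounts,
        key=lambda m: len(PurePosixPath(m["destination"]).parts),
        reverse=True,
    )
    for mount in ordered:
        try:
            rel = target.relative_to(PurePosixPath(mount["destination"]))
        except ValueError:
            continue  # this mount does not contain the path
        return str(PurePosixPath(mount["source"]) / rel)
    return None


if __name__ == "__main__":
    mounts_file = os.path.expanduser("~/.tao_mounts.json")  # assumed location
    if os.path.exists(mounts_file):
        with open(mounts_file) as f:
            mounts = json.load(f).get("Mounts", [])
        user_dir = os.environ.get(
            "USER_EXPERIMENT_DIR", "/workspace/tao-experiments/detectnet_v2"
        )
        weights = f"{user_dir}/experiment_dir_unpruned/weights"
        print(container_to_host(weights, mounts))
```

If the printed host directory differs from the one listed with `ls -lh`, the two commands are looking at different folders.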
