Detectnetv2 wont train if pretrained_model_file is specified. Peoplenet transfer learning

Detectnetv2 will not train if the peoplenet model specified as the pretrained config in the trainspec file.

If the pretrained_model_file: line is commented out in the trainspec file model training will successfully begin. But then there is no transfer learning.

This is the pretrained model:

wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/trainable_v2.5/files/resnet34_peoplenet.tlt

Any help would be appreciated.

Command being run within the docker container (haved tried multiple versions):

detectnet_v2 train -e /home/xxx/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt -r /home/xxx/xxx/new_model/backup -k xxx --gpus 2 --use_amp

Trainspec file:

random_seed: 42
dataset_config {
    data_sources {
        tfrecords_path: "/home/xxx/xxx/new_model/tf/train/*"
        image_directory_path: "/home/xxx/xxx/new_model/kitti/"
    }
    image_extension: "jpg"
    target_class_mapping {
        key: "crane"
        value: "crane"
    }
    target_class_mapping {
        key: "forklift"
        value: "forklift"
    }
    target_class_mapping {
        key: "hat"
        value: "hat"
    }
    target_class_mapping {
        key: "hi_vis"
        value: "hi_vis"
    }
    target_class_mapping {
        key: "person"
        value: "person"
    }
    validation_data_source {
        tfrecords_path: "/home/xxx/xxx/new_model/tf/test/*"
        image_directory_path: "/home/xxx/xxx/new_model/kitti/"
    }
}

augmentation_config {
  preprocessing {
    output_image_width: 1280
    output_image_height: 720
    #crop_right: 400
    #crop_bottom: 304
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
    target_class_config {
      key: "crane"
      value {
        clustering_config {
          coverage_threshold: 0.004999999888241291
          minimum_bounding_box_height: 20
          dbscan_eps: 0.15000000596046448
          dbscan_min_samples: 0.05000000074505806
        }
      }
    }
    target_class_config {
      key: "forklift"
      value {
        clustering_config {
          coverage_threshold: 0.004999999888241291
          minimum_bounding_box_height: 20
          dbscan_eps: 0.15000000596046448
          dbscan_min_samples: 0.05000000074505806
        }
      }
    }
    target_class_config {
      key: "hat"
      value {
        clustering_config {
          coverage_threshold: 0.004999999888241291
          minimum_bounding_box_height: 20
          dbscan_eps: 0.15000000596046448
          dbscan_min_samples: 0.05000000074505806
        }
      }
    }
    target_class_config {
      key: "hi_vis"
      value {
        clustering_config {
          coverage_threshold: 0.004999999888241291
          minimum_bounding_box_height: 20
          dbscan_eps: 0.15000000596046448
          dbscan_min_samples: 0.05000000074505806
        }
      }
    }
    target_class_config {
      key: "person"
      value {
        clustering_config {
          coverage_threshold: 0.004999999888241291
          minimum_bounding_box_height: 20
          dbscan_eps: 0.20000000298023224
          dbscan_min_samples: 0.05000000074505806
        }
      }
    }
  }

model_config {
  pretrained_model_file: "/home/xxx/dev/AI_tools/object_detector/tao/pretrained_models/resnet34_peoplenet.tlt"
  num_layers: 34
  use_batch_norm: true
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
    validation_period_during_training: 2
    first_validation_epoch: 1
    minimum_detection_ground_truth_overlap {
      key: "crane"
      value: 0.25
    }
    minimum_detection_ground_truth_overlap {
      key: "forklift"
      value: 0.25
    }
    minimum_detection_ground_truth_overlap {
      key: "hat"
      value: 0.25
    }
    minimum_detection_ground_truth_overlap {
      key: "hi_vis"
      value: 0.25
    }
    minimum_detection_ground_truth_overlap {
      key: "person"
      value: 0.25
    }
    evaluation_box_config {
      key: "crane"
      value {
        minimum_height: 4
        maximum_height: 9999
        minimum_width: 4
        maximum_width: 9999
      }
    }
    evaluation_box_config {
      key: "forklift"
      value {
        minimum_height: 4
        maximum_height: 9999
        minimum_width: 4
        maximum_width: 9999
      }
    }
    evaluation_box_config {
      key: "hat"
      value {
        minimum_height: 4
        maximum_height: 9999
        minimum_width: 4
        maximum_width: 9999
      }
    }
    evaluation_box_config {
      key: "hi_vis"
      value {
        minimum_height: 4
        maximum_height: 9999
        minimum_width: 4
        maximum_width: 9999
      }
    }
    evaluation_box_config {
      key: "person"
      value {
        minimum_height: 4
        maximum_height: 9999
        minimum_width: 4
        maximum_width: 9999
      }
    }
    average_precision_mode: INTEGRATE
  }
 
  cost_function_config {
    target_classes {
      name: "person"
      class_weight: 1.0
      coverage_foreground_weight: 0.05000000074505806
      objectives {
        name: "cov"
        initial_weight: 1.0
        weight_target: 1.0
      }
      objectives {
        name: "bbox"
        initial_weight: 10.0
        weight_target: 10.0
      }
    }
    target_classes {
      name: "hat"
      class_weight: 1.0
      coverage_foreground_weight: 0.05000000074505806
      objectives {
        name: "cov"
        initial_weight: 1.0
        weight_target: 1.0
      }
      objectives {
        name: "bbox"
        initial_weight: 10.0
        weight_target: 1.0
      }
    }
    target_classes {
      name: "hi_vis"
      class_weight: 1.0
      coverage_foreground_weight: 0.05000000074505806
      objectives {
        name: "cov"
        initial_weight: 1.0
        weight_target: 1.0
      }
      objectives {
        name: "bbox"
        initial_weight: 10.0
        weight_target: 10.0
      }
    }
    target_classes {
      name: "forklift"
      class_weight: 1.0
      coverage_foreground_weight: 0.05000000074505806
      objectives {
        name: "cov"
        initial_weight: 1.0
        weight_target: 1.0
      }
      objectives {
        name: "bbox"
        initial_weight: 10.0
        weight_target: 10.0
      }
    }
    target_classes {
      name: "crane"
      class_weight: 1.0
      coverage_foreground_weight: 0.05000000074505806
      objectives {
        name: "cov"
        initial_weight: 1.0
        weight_target: 1.0
      }
      objectives {
        name: "bbox"
        initial_weight: 10.0
        weight_target: 10.0
      }
    }
    enable_autoweighting: true
    max_objective_weight: 0.9998999834060669
    min_objective_weight: 9.999999747378752e-05
  }

training_config {
  batch_size_per_gpu: 8
  num_epochs: 100
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-05
      soft_start: 0.0
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
    target_class_config {
      key: "crane"
      value {
        cov_center_x: 0.5
        cov_center_y: 0.5
        cov_radius_x: 0.4000000059604645
        cov_radius_y: 0.4000000059604645
        bbox_min_radius: 1.0
      }
    }
    target_class_config {
      key: "forklift"
      value {
        cov_center_x: 0.5
        cov_center_y: 0.5
        cov_radius_x: 0.4000000059604645
        cov_radius_y: 0.4000000059604645
        bbox_min_radius: 1.0
      }
    }
    target_class_config {
      key: "hat"
      value {
        cov_center_x: 0.5
        cov_center_y: 0.5
        cov_radius_x: 0.4000000059604645
        cov_radius_y: 0.4000000059604645
        bbox_min_radius: 1.0
      }
    }
    target_class_config {
      key: "hi_vis"
      value {
        cov_center_x: 0.5
        cov_center_y: 0.5
        cov_radius_x: 0.4000000059604645
        cov_radius_y: 0.4000000059604645
        bbox_min_radius: 1.0
      }
    }
    target_class_config {
      key: "person"
      value {
        cov_center_x: 0.5
        cov_center_y: 0.5
        cov_radius_x: 0.4000000059604645
        cov_radius_y: 0.4000000059604645
        bbox_min_radius: 1.0
      }
    }
    deadzone_radius: 0.4000000059604645
  }

Docker output:

root@28a41a7ce399:/workspace# detectnet_v2 train -e /home/xxx/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt -r /home/inviol/inviol/new_model/backup -k inviol --gpus 2 --use_amp
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

2021-12-07 04:40:15,365 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

Using TensorFlow backend.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

2021-12-07 04:40:15,373 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2021-12-07 04:40:15,444 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2021-12-07 04:40:15,445 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

2021-12-07 04:40:15,445 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-12-07 04:40:15,451 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-12-07 04:40:15,451 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2021-12-07 04:40:15,462 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2021-12-07 04:40:15,464 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

2021-12-07 04:40:15,464 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-12-07 04:40:15,469 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-12-07 04:40:15,469 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-12-07 04:40:15,851 [INFO] __main__: Loading experiment spec at /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt.
2021-12-07 04:40:15,852 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt
2021-12-07 04:40:15,859 [INFO] __main__: Loading experiment spec at /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt.
2021-12-07 04:40:15,860 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt
2021-12-07 04:40:18,327 [INFO] __main__: Cannot iterate over exactly 47705 samples with a batch size of 32; each epoch will therefore take one extra step.
2021-12-07 04:40:18,327 [INFO] __main__: Cannot iterate over exactly 745 steps per epoch with 32 processors; each processor will therefore take one extra step per epoch.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2021-12-07 04:40:18,329 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2021-12-07 04:40:18,329 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2021-12-07 04:40:18,331 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2021-12-07 04:40:18,404 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2021-12-07 04:40:18,405 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2021-12-07 04:40:18,410 [INFO] __main__: Cannot iterate over exactly 47705 samples with a batch size of 32; each epoch will therefore take one extra step.
2021-12-07 04:40:18,410 [INFO] __main__: Cannot iterate over exactly 745 steps per epoch with 32 processors; each processor will therefore take one extra step per epoch.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2021-12-07 04:40:18,412 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2021-12-07 04:40:18,413 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2021-12-07 04:40:18,414 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2021-12-07 04:40:18,418 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2021-12-07 04:40:18,488 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2021-12-07 04:40:18,488 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2021-12-07 04:40:18,501 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2021-12-07 04:40:19,513 [INFO] __main__: Training was interrupted.
Time taken to run __main__:main: 0:00:04.067285.
2021-12-07 04:40:19,614 [INFO] __main__: Training was interrupted.
Time taken to run __main__:main: 0:00:04.150614.
root@28a41a7ce399:/workspace# 

Other info:
• Training Hardware: 2 x RTX A6000 GPU
• Network Type: DetectnetV2
• TLT Version: tao/tao-toolkit-tf: v3.21.11-tf1.15.5-py3 (also tried v3.21.08-py3)

Please change
output_image_width: 1280
output_image_height: 720

to

output_image_width: 960
output_image_height: 544

and retry.

Thanks @Morganh, sadly this doesn’t make a difference. Training still won’t start.

Screenshot from 2021-12-08 09-32-35

output:

root@0344b1f94ec6:/workspace# detectnet_v2 train -e /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt -r /home/inviol/inviol/new_model/backup -k inviol --gpus 2 --use_amp
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

2021-12-07 19:30:59,672 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

Using TensorFlow backend.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

2021-12-07 19:30:59,698 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2021-12-07 19:30:59,769 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2021-12-07 19:30:59,770 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2021-12-07 19:30:59,770 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

2021-12-07 19:30:59,770 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2021-12-07 19:30:59,772 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

2021-12-07 19:30:59,772 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:69: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-12-07 19:30:59,776 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-12-07 19:30:59,776 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-12-07 19:30:59,777 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-12-07 19:30:59,777 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-12-07 19:31:00,186 [INFO] iva.common.logging.logging: Log file already exists at /home/inviol/inviol/new_model/backup/status.json
2021-12-07 19:31:00,186 [INFO] __main__: Loading experiment spec at /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt.
2021-12-07 19:31:00,187 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt
2021-12-07 19:31:00,189 [INFO] iva.common.logging.logging: Log file already exists at /home/inviol/inviol/new_model/backup/status.json
2021-12-07 19:31:00,189 [INFO] __main__: Loading experiment spec at /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt.
2021-12-07 19:31:00,190 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/inviol/dev/AI_tools/object_detector/tao/trainspec_detectnetv2_v0.txt
2021-12-07 19:31:02,709 [INFO] __main__: Cannot iterate over exactly 47705 samples with a batch size of 32; each epoch will therefore take one extra step.
2021-12-07 19:31:02,709 [INFO] __main__: Cannot iterate over exactly 745 steps per epoch with 32 processors; each processor will therefore take one extra step per epoch.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2021-12-07 19:31:02,711 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2021-12-07 19:31:02,711 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2021-12-07 19:31:02,713 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2021-12-07 19:31:02,717 [INFO] __main__: Cannot iterate over exactly 47705 samples with a batch size of 32; each epoch will therefore take one extra step.
2021-12-07 19:31:02,717 [INFO] __main__: Cannot iterate over exactly 745 steps per epoch with 32 processors; each processor will therefore take one extra step per epoch.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2021-12-07 19:31:02,719 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2021-12-07 19:31:02,719 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2021-12-07 19:31:02,721 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2021-12-07 19:31:02,785 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2021-12-07 19:31:02,785 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2021-12-07 19:31:02,792 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2021-12-07 19:31:02,793 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2021-12-07 19:31:02,798 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2021-12-07 19:31:02,805 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

Time taken to run __main__:main: 0:00:04.141299.
2021-12-07 19:31:03,912 [INFO] __main__: Training was interrupted.
2021-12-07 19:31:03,915 [INFO] __main__: Training was interrupted.
Time taken to run __main__:main: 0:00:04.143051.
root@0344b1f94ec6:/workspace#

Can you try again without “–gpus 2 --use_amp” ?

@Morganh tried it, but no change.

For detectnet_v2 and faster_rcnn, please use tao/tao-toolkit-tf: v3.21.11-tf1.15.4-py3

@Morganh I have been testing this version above as per your suggestions. I am already using this version and it is still behaving the same way.

For testing purposes I am using the exact same pre trained model and training spec from a white paper published by Nvidia used here: TAO-Toolkit-Whitepaper-use-cases/workspace/peoplenet_helmet at main · NVIDIA-AI-IOT/TAO-Toolkit-Whitepaper-use-cases · GitHub

Can you use a new result folder and retry? Thanks.

Sure, no change. Folder was empty before command was run. This is screenshot post running the train command.

@Morganh

@Morganh I found the issue. You have to use the key -k tlt_encode to train with existing models.

1 Like

Correct. That is it. The pretrained model provided in ngc has its own key. Thanks for your finding.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.