Detectnet_v2.ipynb issue with custom data

System:

• Hardware: GeForce RTX 3080
• Network Type: Detectnet_v2
• TLT Version:

format_version: 3.0
toolkit_version: 5.3.0

• Cuda version: 12.2
• Driver version: 535.171.04

Problem:

I am trying to train a model on a custom dataset using the detectnet_v2 notebook, but training fails with an error.

I used the command below to generate the TFRecords:

# Creating a new directory for the output tfrecords dump.
print("Converting Tfrecords for kitti trainval dataset")
!mkdir -p $LOCAL_DATA_DIR/tfrecords && rm -rf $LOCAL_DATA_DIR/tfrecords/*
!tao model detectnet_v2 dataset_convert \
                  -d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
                  -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval \
                  -r $USER_EXPERIMENT_DIR/

and it returned this output:


Converting Tfrecords for kitti trainval dataset
2024-05-01 10:16:01,872 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-05-01 10:16:01,931 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-05-01 10:16:01,967 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2024-05-01 15:16:06.601871: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-05-01 15:16:06,710 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2024-05-01 15:16:12,432 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 15:16:12,523 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 15:16:12,553 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 15:16:18,220 [TAO Toolkit] [WARNING] matplotlib 500: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-2ptmi6eq because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-05-01 15:16:18,732 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 15:16:20,631 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 15:16:20,762 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 15:16:20,766 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 15:16:21,141 [TAO Toolkit] [INFO] root 2102: Starting Object Detection Dataset Convert.
2024-05-01 15:16:21,142 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.build_converter 87: Instantiating a kitti converter
2024-05-01 15:16:21,142 [TAO Toolkit] [INFO] root 2102: Instantiating a kitti converter
2024-05-01 15:16:21,142 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 71: Creating output directory /workspace/tao-experiments/data/tfrecords/kitti_trainval
2024-05-01 15:16:21,142 [TAO Toolkit] [INFO] root 2102: Generating partitions
2024-05-01 15:16:21,145 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.kitti_converter_lib 176: Num images in
Train: 1413	Val: 230
2024-05-01 15:16:21,145 [TAO Toolkit] [INFO] root 2102: Num images in
Train: 1413	Val: 230
2024-05-01 15:16:21,145 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.kitti_converter_lib 197: Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2024-05-01 15:16:21,145 [TAO Toolkit] [INFO] root 2102: Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2024-05-01 15:16:21,146 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 0
2024-05-01 15:16:21,146 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 0
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/dataio/dataset_converter_lib.py:181: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2024-05-01 15:16:21,146 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/dataio/dataset_converter_lib.py:181: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2024-05-01 15:16:21,171 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 1
2024-05-01 15:16:21,171 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 1
2024-05-01 15:16:21,178 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 2
2024-05-01 15:16:21,178 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 2
2024-05-01 15:16:21,184 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 3
2024-05-01 15:16:21,184 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 3
2024-05-01 15:16:21,191 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 4
2024-05-01 15:16:21,191 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 4
2024-05-01 15:16:21,198 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 5
2024-05-01 15:16:21,198 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 5
2024-05-01 15:16:21,204 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 6
2024-05-01 15:16:21,204 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 6
2024-05-01 15:16:21,211 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 7
2024-05-01 15:16:21,211 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 7
2024-05-01 15:16:21,218 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 8
2024-05-01 15:16:21,218 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 8
2024-05-01 15:16:21,225 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 9
2024-05-01 15:16:21,225 [TAO Toolkit] [INFO] root 2102: Writing partition 0, shard 9
2024-05-01 15:16:21,231 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250: 
Wrote the following numbers of objects:
b'dice': 397

2024-05-01 15:16:21,231 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 0
2024-05-01 15:16:21,231 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 0
2024-05-01 15:16:21,271 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 1
2024-05-01 15:16:21,271 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 1
2024-05-01 15:16:21,309 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 2
2024-05-01 15:16:21,310 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 2
2024-05-01 15:16:21,349 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 3
2024-05-01 15:16:21,349 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 3
2024-05-01 15:16:21,387 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 4
2024-05-01 15:16:21,387 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 4
2024-05-01 15:16:21,426 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 5
2024-05-01 15:16:21,426 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 5
2024-05-01 15:16:21,465 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 6
2024-05-01 15:16:21,465 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 6
2024-05-01 15:16:21,504 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 7
2024-05-01 15:16:21,504 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 7
2024-05-01 15:16:21,542 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 8
2024-05-01 15:16:21,542 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 8
2024-05-01 15:16:21,583 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 9
2024-05-01 15:16:21,583 [TAO Toolkit] [INFO] root 2102: Writing partition 1, shard 9
2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250: 
Wrote the following numbers of objects:
b'battery': 928
b'toycar': 755
b'dice': 497
b'highlighter': 90
b'spoon': 46
b'candle': 101

2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 89: Cumulative object statistics
2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] root 2102: Cumulative object statistics
2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250: 
Wrote the following numbers of objects:
b'dice': 894
b'battery': 928
b'toycar': 755
b'highlighter': 90
b'spoon': 46
b'candle': 101

2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 105: Class map. 
Label in GT: Label in tfrecords file 
b'dice': b'dice'
b'battery': b'battery'
b'toycar': b'toycar'
b'highlighter': b'highlighter'
b'spoon': b'spoon'
b'candle': b'candle'
2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] root 2102: Class map. 
Label in GT: Label in tfrecords file 
b'dice': b'dice'
b'battery': b'battery'
b'toycar': b'toycar'
b'highlighter': b'highlighter'
b'spoon': b'spoon'
b'candle': b'candle'
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] root 2102: For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 114: Tfrecords generation complete.
2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] root 2102: TFRecords generation complete.
2024-05-01 15:16:21,624 [TAO Toolkit] [INFO] root 2102: Dataset convert finished successfully.
Execution status: PASS

What's next?
  Try Docker Debug for seamless, persistent debugging tools in any container or image → docker debug 68d2e214113bb229b75a1ee6c60145657e8156b520a4a080987d21dc941e192f
  Learn more at https://docs.docker.com/go/debug-cli/
2024-05-01 10:16:24,140 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
I double-checked the TFRecord files, and none of them has a size of 0:
!ls -rlt $LOCAL_DATA_DIR/tfrecords/kitti_trainval/
total 1080
-rw-r--r-- 1 saeed saeed 15367 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00000-of-00010
-rw-r--r-- 1 saeed saeed 14659 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00001-of-00010
-rw-r--r-- 1 saeed saeed 15190 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00002-of-00010
-rw-r--r-- 1 saeed saeed 14718 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00003-of-00010
-rw-r--r-- 1 saeed saeed 14423 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00004-of-00010
-rw-r--r-- 1 saeed saeed 15131 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00005-of-00010
-rw-r--r-- 1 saeed saeed 14600 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00006-of-00010
-rw-r--r-- 1 saeed saeed 14718 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00007-of-00010
-rw-r--r-- 1 saeed saeed 14718 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00008-of-00010
-rw-r--r-- 1 saeed saeed 15249 May  1 10:16 kitti_trainval-fold-000-of-002-shard-00009-of-00010
-rw-r--r-- 1 saeed saeed 93969 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00000-of-00010
-rw-r--r-- 1 saeed saeed 90736 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00001-of-00010
-rw-r--r-- 1 saeed saeed 92240 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00002-of-00010
-rw-r--r-- 1 saeed saeed 91012 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00003-of-00010
-rw-r--r-- 1 saeed saeed 90958 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00004-of-00010
-rw-r--r-- 1 saeed saeed 91571 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00005-of-00010
-rw-r--r-- 1 saeed saeed 92008 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00006-of-00010
-rw-r--r-- 1 saeed saeed 91140 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00007-of-00010
-rw-r--r-- 1 saeed saeed 91163 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00008-of-00010
-rw-r--r-- 1 saeed saeed 93063 May  1 10:16 kitti_trainval-fold-001-of-002-shard-00009-of-00010
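
The same size check can be repeated from a notebook cell. This is a minimal sketch, not part of the TAO notebook: `find_empty_shards` is a hypothetical helper name, and it only looks at file sizes, not at the record contents.

```python
import os
import tempfile


def find_empty_shards(tfrecords_dir):
    """Return the sorted names of zero-byte files in tfrecords_dir."""
    empty = []
    for name in sorted(os.listdir(tfrecords_dir)):
        path = os.path.join(tfrecords_dir, name)
        # Only regular files are checked; sub-directories are skipped.
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Demo on a throwaway directory with one empty and one non-empty file.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "shard-00000"), "wb") as f:
            f.write(b"data")
        open(os.path.join(demo_dir, "shard-00001"), "wb").close()
        print(find_empty_shards(demo_dir))
```

Assuming LOCAL_DATA_DIR is available as an environment variable, the call would be `find_empty_shards(os.path.join(os.environ["LOCAL_DATA_DIR"], "tfrecords", "kitti_trainval"))`; an empty list matches the listing above.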

Here is detectnet_v2_tfrecords_kitti_trainval.txt:

kitti_config {
  root_directory_path: "/workspace/tao-experiments/data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/data/training"

target_class_mapping {
    key: "battery"
    value: "battery"
}
target_class_mapping {
    key: "dice"
    value: "dice"
}
target_class_mapping {
    key: "toycar"
    value: "toycar"
}
target_class_mapping {
    key: "spoon"
    value: "spoon"
}
target_class_mapping {
    key: "highlighter"
    value: "highlighter"
}
target_class_mapping {
    key: "candle"
    value: "candle"
}
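
As a sanity check on `val_split: 14`, the split reported in the conversion log (Train: 1413, Val: 230) can be reproduced with simple arithmetic. The sketch below assumes the converter takes 14% of all images for validation and truncates the result; that rounding behaviour is an assumption, not something confirmed by the log, and `split_counts` is a hypothetical helper.

```python
def split_counts(num_images, val_split_percent):
    """Split num_images into (train, val), truncating the validation share."""
    # Integer arithmetic avoids floating-point rounding surprises.
    num_val = num_images * val_split_percent // 100
    return num_images - num_val, num_val


total = 1413 + 230  # image counts reported in the conversion log
print(split_counts(total, 14))  # (1413, 230)
```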

Then I started training with this command:

!tao model detectnet_v2 train -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS

with this detectnet_v2_train_resnet18_kitti.txt config file:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  image_extension: "png"

  target_class_mapping {
    key: "battery"
    value: "battery"
  }
  target_class_mapping {
    key: "dice"
    value: "dice"
  }
  target_class_mapping {
    key: "toycar"
    value: "toycar"
  }
  target_class_mapping {
    key: "spoon"
    value: "spoon"
  }
  target_class_mapping {
    key: "highlighter"
    value: "highlighter"
  }
  target_class_mapping {
    key: "candle"
    value: "candle"
  }
  validation_fold: 0
}

augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 640
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "toycar"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 1
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "battery"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 1
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "candle"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00749999983236
        dbscan_eps: 0.230000004172
        dbscan_min_samples: 1
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "toycar"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "battery"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "candle"
    value: 0.5
  }
  evaluation_box_config {
    key: "toycar"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "battery"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "candle"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "toycar"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "battery"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "candle"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: false
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-07
      max_learning_rate: 5e-05
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  visualizer{
    enabled: true
    num_images: 3
    scalar_logging_frequency: 50
    infrequent_logging_frequency: 5
    target_class_config {
      key: "toycar"
      value: {
        coverage_threshold: 0.005
      }
    }
    target_class_config {
      key: "candle"
      value: {
        coverage_threshold: 0.005
      }
    }
    target_class_config {
      key: "battery"
      value: {
        coverage_threshold: 0.005
      }
    }
    clearml_config{
      project: "TAO Toolkit ClearML Demo"
      task: "detectnet_v2_resnet18_clearml"
      tags: "detectnet_v2"
      tags: "training"
      tags: "resnet18"
      tags: "unpruned"
    }
    wandb_config{
      project: "TAO Toolkit Wandb Demo"
      name: "detectnet_v2_resnet18_wandb"
      tags: "detectnet_v2"
      tags: "training"
      tags: "resnet18"
      tags: "unpruned"
    }
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "toycar"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "battery"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "candle"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}
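
Note that dataset_config above maps six classes, while postprocessing_config, evaluation_config, cost_function_config and bbox_rasterizer_config only have entries for toycar, battery and candle. A minimal sketch for listing such gaps in a spec file follows. It is a plain-text scan with regular expressions, not the TAO spec loader; `top_level_sections` and `missing_classes` are hypothetical helpers, and the scan assumes no braces appear inside quoted strings.

```python
import re

# Regex that captures the class name of each per-class entry, per section.
CLASS_PATTERNS = {
    "postprocessing_config": r'target_class_config\s*\{\s*key:\s*"([^"]+)"',
    "evaluation_config": r'evaluation_box_config\s*\{\s*key:\s*"([^"]+)"',
    "cost_function_config": r'target_classes\s*\{\s*name:\s*"([^"]+)"',
    "bbox_rasterizer_config": r'target_class_config\s*\{\s*key:\s*"([^"]+)"',
}
MAPPING_PATTERN = r'target_class_mapping\s*\{[^}]*?value:\s*"([^"]+)"'


def top_level_sections(spec_text):
    """Return {section_name: body_text} for each top-level `name { ... }`."""
    sections = {}
    opener = re.compile(r"(\w+)\s*\{")
    pos = 0
    while True:
        match = opener.search(spec_text, pos)
        if match is None:
            break
        depth, i = 1, match.end()
        # Walk forward until the brace opened by this section is closed.
        while i < len(spec_text) and depth > 0:
            if spec_text[i] == "{":
                depth += 1
            elif spec_text[i] == "}":
                depth -= 1
            i += 1
        sections[match.group(1)] = spec_text[match.end():i - 1]
        pos = i
    return sections


def missing_classes(spec_text):
    """Map each checked section to the mapped classes it has no entry for."""
    sections = top_level_sections(spec_text)
    mapped = set(re.findall(MAPPING_PATTERN, sections.get("dataset_config", "")))
    report = {}
    for name, pattern in CLASS_PATTERNS.items():
        configured = set(re.findall(pattern, sections.get(name, "")))
        report[name] = sorted(mapped - configured)
    return report
```

With the spec text read from the file (for example `missing_classes(open(spec_path).read())`, where `spec_path` is a placeholder), each section maps to the list of classes it does not cover; an empty list means every mapped class has an entry in that section.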

Training returned this error:

2024-05-01 11:06:26,146 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-05-01 11:06:26,184 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-05-01 11:06:26,197 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2024-05-01 16:06:28.797473: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-05-01 16:06:28,829 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2024-05-01 16:06:30,487 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 16:06:30,515 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 16:06:30,518 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 16:06:32,024 [TAO Toolkit] [WARNING] matplotlib 500: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-z4s02405 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-05-01 16:06:32,210 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 16:06:33,662 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 16:06:33,688 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 16:06:33,691 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-05-01 16:06:34,605 [TAO Toolkit] [INFO] root 2102: Starting DetectNet_v2 Training job
2024-05-01 16:06:34,605 [TAO Toolkit] [INFO] __main__ 817: Loading experiment spec at /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt.
2024-05-01 16:06:34,605 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.spec_handler.spec_loader 113: Merging specification from /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt
2024-05-01 16:06:34,610 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.mlops.wandb 69: Initializing wandb.
2024-05-01 16:06:34,610 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.mlops.wandb 97: Wandb logging failed with error WandB client wasn't logged in. Please make sure to set the WANDB_API_KEY env variable or run `wandb login` in over the CLI and copy the ~/.netrc file to the container.
2024-05-01 16:06:34,610 [TAO Toolkit] [INFO] __main__ 857: Integrating with clearml.
2024-05-01 16:06:34,715 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.mlops.clearml 55: ClearML task init failed with error ClearML configuration could not be found (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at https://app.clear.ml
2024-05-01 16:06:34,715 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.mlops.clearml 58: Training will still continue.
2024-05-01 16:06:34,715 [TAO Toolkit] [INFO] root 2102: Training gridbox model.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2024-05-01 16:06:34,715 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2024-05-01 16:06:34,834 [TAO Toolkit] [INFO] root 522: Sampling mode of the dataloader was set to user_defined.
2024-05-01 16:06:34,834 [TAO Toolkit] [INFO] __main__ 99: Cannot iterate over exactly 1413 samples with a batch size of 4; each epoch will therefore take one extra step.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:122: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2024-05-01 16:06:34,835 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:122: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:125: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2024-05-01 16:06:34,835 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:125: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:128: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2024-05-01 16:06:34,837 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:128: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2024-05-01 16:06:34,853 [TAO Toolkit] [INFO] root 2102: Building DetectNet V2 model
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2024-05-01 16:06:34,853 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2024-05-01 16:06:34,854 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2024-05-01 16:06:34,868 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:199: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

2024-05-01 16:06:35,639 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:199: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2024-05-01 16:06:35,786 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2024-05-01 16:06:35,787 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2024-05-01 16:06:35,787 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2024-05-01 16:06:36,083 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2024-05-01 16:06:36,521 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 133: Loading weights from pretrained model file. /workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5
2024-05-01 16:06:36,521 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer input_1 weights set from pre-trained model.
2024-05-01 16:06:36,627 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer conv1 weights set from pre-trained model.
2024-05-01 16:06:36,732 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer bn_conv1 weights set from pre-trained model.
2024-05-01 16:06:36,732 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer activation_1 weights set from pre-trained model.
2024-05-01 16:06:36,837 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1a_conv_1 weights set from pre-trained model.
2024-05-01 16:06:36,949 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1a_bn_1 weights set from pre-trained model.
2024-05-01 16:06:37,058 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1a_conv_2 weights set from pre-trained model.
2024-05-01 16:06:37,165 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1a_conv_shortcut weights set from pre-trained model.
2024-05-01 16:06:37,276 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1a_bn_2 weights set from pre-trained model.
2024-05-01 16:06:37,386 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1a_bn_shortcut weights set from pre-trained model.
2024-05-01 16:06:37,386 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer add_1 weights set from pre-trained model.
2024-05-01 16:06:37,493 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1b_conv_1 weights set from pre-trained model.
2024-05-01 16:06:37,609 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1b_bn_1 weights set from pre-trained model.
2024-05-01 16:06:37,722 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1b_conv_2 weights set from pre-trained model.
2024-05-01 16:06:37,841 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_1b_bn_2 weights set from pre-trained model.
2024-05-01 16:06:37,841 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer add_2 weights set from pre-trained model.
2024-05-01 16:06:37,950 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2a_conv_1 weights set from pre-trained model.
2024-05-01 16:06:38,066 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2a_bn_1 weights set from pre-trained model.
2024-05-01 16:06:38,178 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2a_conv_2 weights set from pre-trained model.
2024-05-01 16:06:38,287 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2a_conv_shortcut weights set from pre-trained model.
2024-05-01 16:06:38,404 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2a_bn_2 weights set from pre-trained model.
2024-05-01 16:06:38,517 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2a_bn_shortcut weights set from pre-trained model.
2024-05-01 16:06:38,517 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer add_3 weights set from pre-trained model.
2024-05-01 16:06:38,632 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2b_conv_1 weights set from pre-trained model.
2024-05-01 16:06:38,748 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2b_bn_1 weights set from pre-trained model.
2024-05-01 16:06:38,863 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2b_conv_2 weights set from pre-trained model.
2024-05-01 16:06:38,976 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_2b_bn_2 weights set from pre-trained model.
2024-05-01 16:06:38,976 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer add_4 weights set from pre-trained model.
2024-05-01 16:06:39,086 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3a_conv_1 weights set from pre-trained model.
2024-05-01 16:06:39,201 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3a_bn_1 weights set from pre-trained model.
2024-05-01 16:06:39,313 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3a_conv_2 weights set from pre-trained model.
2024-05-01 16:06:39,425 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3a_conv_shortcut weights set from pre-trained model.
2024-05-01 16:06:39,539 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3a_bn_2 weights set from pre-trained model.
2024-05-01 16:06:39,658 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3a_bn_shortcut weights set from pre-trained model.
2024-05-01 16:06:39,658 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer add_5 weights set from pre-trained model.
2024-05-01 16:06:39,777 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3b_conv_1 weights set from pre-trained model.
2024-05-01 16:06:39,899 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3b_bn_1 weights set from pre-trained model.
2024-05-01 16:06:40,014 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3b_conv_2 weights set from pre-trained model.
2024-05-01 16:06:40,148 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_3b_bn_2 weights set from pre-trained model.
2024-05-01 16:06:40,148 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer add_6 weights set from pre-trained model.
2024-05-01 16:06:40,270 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4a_conv_1 weights set from pre-trained model.
2024-05-01 16:06:40,419 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4a_bn_1 weights set from pre-trained model.
2024-05-01 16:06:40,569 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4a_conv_2 weights set from pre-trained model.
2024-05-01 16:06:40,682 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4a_conv_shortcut weights set from pre-trained model.
2024-05-01 16:06:40,801 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4a_bn_2 weights set from pre-trained model.
2024-05-01 16:06:40,920 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4a_bn_shortcut weights set from pre-trained model.
2024-05-01 16:06:40,920 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer add_7 weights set from pre-trained model.
2024-05-01 16:06:41,038 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4b_conv_1 weights set from pre-trained model.
2024-05-01 16:06:41,159 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4b_bn_1 weights set from pre-trained model.
2024-05-01 16:06:41,277 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4b_conv_2 weights set from pre-trained model.
2024-05-01 16:06:41,399 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer block_4b_bn_2 weights set from pre-trained model.
2024-05-01 16:06:41,400 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.model.detectnet_model 142: Layer add_8 weights set from pre-trained model.
2024-05-01 16:06:41,470 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.objectives.bbox_objective 78: Default L1 loss function will be used.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 3, 320, 320)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 160, 160) 9472        input_1[0][0]                    
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 160, 160) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 160, 160) 0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 80, 80)   36928       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 80, 80)   256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (None, 64, 80, 80)   0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 80, 80)   36928       block_1a_relu_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 80, 80)   4160        activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 80, 80)   256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 80, 80)   256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 80, 80)   0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1a_relu (Activation)      (None, 64, 80, 80)   0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 80, 80)   36928       block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 80, 80)   256         block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (None, 64, 80, 80)   0           block_1b_bn_1[0][0]              
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 80, 80)   36928       block_1b_relu_1[0][0]            
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 80, 80)   256         block_1b_conv_2[0][0]            
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 80, 80)   0           block_1b_bn_2[0][0]              
                                                                 block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_relu (Activation)      (None, 64, 80, 80)   0           add_2[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 40, 40)  73856       block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 40, 40)  512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (None, 128, 40, 40)  0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 40, 40)  147584      block_2a_relu_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 40, 40)  8320        block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 40, 40)  512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 40, 40)  512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 40, 40)  0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2a_relu (Activation)      (None, 128, 40, 40)  0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 40, 40)  147584      block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 40, 40)  512         block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (None, 128, 40, 40)  0           block_2b_bn_1[0][0]              
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 40, 40)  147584      block_2b_relu_1[0][0]            
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 40, 40)  512         block_2b_conv_2[0][0]            
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 40, 40)  0           block_2b_bn_2[0][0]              
                                                                 block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_relu (Activation)      (None, 128, 40, 40)  0           add_4[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 20, 20)  295168      block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 20, 20)  1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (None, 256, 20, 20)  0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 20, 20)  590080      block_3a_relu_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 20, 20)  33024       block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 20, 20)  1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 20, 20)  1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 20, 20)  0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3a_relu (Activation)      (None, 256, 20, 20)  0           add_5[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 20, 20)  590080      block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 20, 20)  1024        block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (None, 256, 20, 20)  0           block_3b_bn_1[0][0]              
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 20, 20)  590080      block_3b_relu_1[0][0]            
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 20, 20)  1024        block_3b_conv_2[0][0]            
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 20, 20)  0           block_3b_bn_2[0][0]              
                                                                 block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_relu (Activation)      (None, 256, 20, 20)  0           add_6[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 20, 20)  1180160     block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 20, 20)  2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (None, 512, 20, 20)  0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 20, 20)  2359808     block_4a_relu_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 20, 20)  131584      block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 20, 20)  2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 20, 20)  2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 20, 20)  0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4a_relu (Activation)      (None, 512, 20, 20)  0           add_7[0][0]                      
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 20, 20)  2359808     block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 20, 20)  2048        block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (None, 512, 20, 20)  0           block_4b_bn_1[0][0]              
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 20, 20)  2359808     block_4b_relu_1[0][0]            
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 20, 20)  2048        block_4b_conv_2[0][0]            
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 20, 20)  0           block_4b_bn_2[0][0]              
                                                                 block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_relu (Activation)      (None, 512, 20, 20)  0           add_8[0][0]                      
__________________________________________________________________________________________________
output_bbox (Conv2D)            (None, 12, 20, 20)   6156        block_4b_relu[0][0]              
__________________________________________________________________________________________________
output_cov (Conv2D)             (None, 3, 20, 20)    1539        block_4b_relu[0][0]              
==================================================================================================
Total params: 11,203,023
Trainable params: 11,193,295
Non-trainable params: 9,728
__________________________________________________________________________________________________
2024-05-01 16:06:41,491 [TAO Toolkit] [INFO] root 2102: DetectNet V2 model built.
2024-05-01 16:06:41,491 [TAO Toolkit] [INFO] root 2102: Building rasterizer.
2024-05-01 16:06:41,492 [TAO Toolkit] [INFO] root 2102: Rasterizers built.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/training_proto_utilities.py:102: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

2024-05-01 16:06:41,492 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/training_proto_utilities.py:102: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py:718: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

2024-05-01 16:06:41,504 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py:718: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

2024-05-01 16:06:41,504 [TAO Toolkit] [INFO] root 2102: Building training graph.
2024-05-01 16:06:41,505 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 175: Serial augmentation enabled = False
2024-05-01 16:06:41,505 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 177: Pseudo sharding enabled = False
2024-05-01 16:06:41,505 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 269: Max Image Dimensions (all sources): (0, 0)
2024-05-01 16:06:41,506 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 380: number of cpus: 20, io threads: 40, compute threads: 20, buffered batches: 4
2024-05-01 16:06:41,506 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 387: total dataset size 1413, number of sources: 1, batch size per gpu: 4, steps: 354
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

2024-05-01 16:06:41,530 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

2024-05-01 16:06:43,274 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataloader.default_dataloader 546: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2024-05-01 16:06:45,483 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 409: shuffle: True - shard 0 of 1
2024-05-01 16:06:45,487 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 479: sampling 1 datasets with weights:
2024-05-01 16:06:45,487 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 481: source: 0 weight: 1.000000
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

2024-05-01 16:06:46,159 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

2024-05-01 16:06:46,807 [TAO Toolkit] [INFO] __main__ 536: Found 1413 samples in training set
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/visualizer/tensorboard_visualizer.py:92: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.

2024-05-01 16:06:46,809 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/visualizer/tensorboard_visualizer.py:92: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.

2024-05-01 16:06:46,811 [TAO Toolkit] [INFO] root 2102: Rasterizing tensors.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/rasterizers/bbox_rasterizer.py:348: The name tf.bincount is deprecated. Please use tf.math.bincount instead.

2024-05-01 16:06:46,885 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/rasterizers/bbox_rasterizer.py:348: The name tf.bincount is deprecated. Please use tf.math.bincount instead.

2024-05-01 16:06:46,972 [TAO Toolkit] [INFO] root 2102: Tensors rasterized.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/training_proto_utilities.py:49: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

2024-05-01 16:06:46,972 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/training_proto_utilities.py:49: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_functions.py:29: The name tf.log is deprecated. Please use tf.math.log instead.

2024-05-01 16:06:47,097 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_functions.py:29: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:250: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

2024-05-01 16:06:47,207 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:250: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/visualizer/tensorboard_visualizer.py:99: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.

2024-05-01 16:06:48,606 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/visualizer/tensorboard_visualizer.py:99: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.

2024-05-01 16:06:49,075 [TAO Toolkit] [INFO] root 2102: Training graph built.
2024-05-01 16:06:49,075 [TAO Toolkit] [INFO] root 2102: Building validation graph.
2024-05-01 16:06:49,076 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 175: Serial augmentation enabled = False
2024-05-01 16:06:49,076 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 177: Pseudo sharding enabled = False
2024-05-01 16:06:49,076 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 269: Max Image Dimensions (all sources): (0, 0)
2024-05-01 16:06:49,076 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 380: number of cpus: 20, io threads: 40, compute threads: 20, buffered batches: 4
2024-05-01 16:06:49,076 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 387: total dataset size 230, number of sources: 1, batch size per gpu: 4, steps: 58
2024-05-01 16:06:49,097 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataloader.default_dataloader 546: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2024-05-01 16:06:49,275 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 409: shuffle: False - shard 0 of 1
2024-05-01 16:06:49,279 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 479: sampling 1 datasets with weights:
2024-05-01 16:06:49,279 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 481: source: 0 weight: 1.000000
2024-05-01 16:06:49,456 [TAO Toolkit] [INFO] __main__ 591: Found 230 samples in validation set
2024-05-01 16:06:49,456 [TAO Toolkit] [INFO] root 2102: Rasterizing tensors.
2024-05-01 16:06:49,612 [TAO Toolkit] [INFO] root 2102: Tensors rasterized.
2024-05-01 16:06:49,862 [TAO Toolkit] [INFO] root 2102: Validation graph built.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/tfhooks/validation_hook.py:58: The name tf.summary.FileWriterCache is deprecated. Please use tf.compat.v1.summary.FileWriterCache instead.

2024-05-01 16:06:49,863 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/tfhooks/validation_hook.py:58: The name tf.summary.FileWriterCache is deprecated. Please use tf.compat.v1.summary.FileWriterCache instead.

2024-05-01 16:06:50,864 [TAO Toolkit] [INFO] root 2102: Running training loop.
2024-05-01 16:06:50,864 [TAO Toolkit] [INFO] __main__ 135: Checkpoint interval: 10
2024-05-01 16:06:50,864 [TAO Toolkit] [INFO] __main__ 175: Scalars logged at every 7 steps
2024-05-01 16:06:50,864 [TAO Toolkit] [INFO] __main__ 180: Images logged at every 1770 steps
INFO:tensorflow:Create CheckpointSaverHook.
2024-05-01 16:06:50,866 [TAO Toolkit] [INFO] tensorflow 541: Create CheckpointSaverHook.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/utilities.py:154: The name tf.train.SingularMonitoredSession is deprecated. Please use tf.compat.v1.train.SingularMonitoredSession instead.

2024-05-01 16:06:50,867 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/utilities.py:154: The name tf.train.SingularMonitoredSession is deprecated. Please use tf.compat.v1.train.SingularMonitoredSession instead.

INFO:tensorflow:Graph was finalized.
2024-05-01 16:06:52,270 [TAO Toolkit] [INFO] tensorflow 240: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2024-05-01 16:06:53,534 [TAO Toolkit] [INFO] tensorflow 500: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2024-05-01 16:06:54,039 [TAO Toolkit] [INFO] tensorflow 502: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2024-05-01 16:06:59,036 [TAO Toolkit] [INFO] tensorflow 81: Saving checkpoints for step-0.
2024-05-01 16:07:12,464 [TAO Toolkit] [INFO] root 2102: Saving trained model.
2024-05-01 16:07:12,556 [TAO Toolkit] [INFO] root 2102: Model saved.
2024-05-01 16:07:12,669 [TAO Toolkit] [INFO] root 2102: 2 root error(s) found.
  (0) Invalid argument: bboxes class ID out of range [0, 3[, got-1
	 [[node BboxRasterizer_2/RasterizeBbox (defined at /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[resnet18_nopool_bn_detectnet_v2/block_1b_bn_2/AssignMovingAvg_1/_4107]]
  (1) Invalid argument: bboxes class ID out of range [0, 3[, got-1
	 [[node BboxRasterizer_2/RasterizeBbox (defined at /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'BboxRasterizer_2/RasterizeBbox':
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1046, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
    return_args = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1024, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 887, in run_experiment
    train_gridbox(results_dir, experiment_spec, output_model_file_name, input_model_file_name,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 721, in train_gridbox
    build_training_graph(experiment_spec,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 544, in build_training_graph
    rasterize_tensors(gridbox_model, loss_mask_label_filter, bbox_rasterizer,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 435, in rasterize_tensors
    gridbox_model.generate_ground_truth_tensors(bbox_rasterizer=bbox_rasterizer,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/model/detectnet_model.py", line 675, in generate_ground_truth_tensors
    self.objective_set.generate_ground_truth_tensors(bbox_rasterizer, batch_labels)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/objectives/objective_set.py", line 286, in generate_ground_truth_tensors
    bbox_rasterizer.rasterize_labels(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/rasterizers/bbox_rasterizer.py", line 462, in rasterize_labels
    self._rasterizer(num_images=num_images,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/processors/processors.py", line 247, in __call__
    return self.call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/processors/bbox_rasterizer.py", line 233, in call
    output_image = op.rasterize_bbox(
  File "<string>", line 150, in rasterize_bbox
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1349, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1441, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: bboxes class ID out of range [0, 3[, got-1
	 [[{{node BboxRasterizer_2/RasterizeBbox}}]]
	 [[resnet18_nopool_bn_detectnet_v2/block_1b_bn_2/AssignMovingAvg_1/_4107]]
  (1) Invalid argument: bboxes class ID out of range [0, 3[, got-1
	 [[{{node BboxRasterizer_2/RasterizeBbox}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1067, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1046, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
    return_args = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1024, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 887, in run_experiment
    train_gridbox(results_dir, experiment_spec, output_model_file_name, input_model_file_name,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 760, in train_gridbox
    run_training_loop(experiment_spec, results_dir, gridbox_model, hooks, steps_per_epoch,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 234, in run_training_loop
    session.run([gridbox_model.get_train_op()])
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/monitored_session.py", line 750, in run
    return self._sess.run(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1413, in run
    outputs = _WrappedSession.run(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 955, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1179, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1358, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: bboxes class ID out of range [0, 3[, got-1
	 [[node BboxRasterizer_2/RasterizeBbox (defined at /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[resnet18_nopool_bn_detectnet_v2/block_1b_bn_2/AssignMovingAvg_1/_4107]]
  (1) Invalid argument: bboxes class ID out of range [0, 3[, got-1
	 [[node BboxRasterizer_2/RasterizeBbox (defined at /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'BboxRasterizer_2/RasterizeBbox':
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1046, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
    return_args = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1024, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 887, in run_experiment
    train_gridbox(results_dir, experiment_spec, output_model_file_name, input_model_file_name,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 721, in train_gridbox
    build_training_graph(experiment_spec,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 544, in build_training_graph
    rasterize_tensors(gridbox_model, loss_mask_label_filter, bbox_rasterizer,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 435, in rasterize_tensors
    gridbox_model.generate_ground_truth_tensors(bbox_rasterizer=bbox_rasterizer,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/model/detectnet_model.py", line 675, in generate_ground_truth_tensors
    self.objective_set.generate_ground_truth_tensors(bbox_rasterizer, batch_labels)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/objectives/objective_set.py", line 286, in generate_ground_truth_tensors
    bbox_rasterizer.rasterize_labels(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/rasterizers/bbox_rasterizer.py", line 462, in rasterize_labels
    self._rasterizer(num_images=num_images,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/processors/processors.py", line 247, in __call__
    return self.call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/processors/bbox_rasterizer.py", line 233, in call
    output_image = op.rasterize_bbox(
  File "<string>", line 150, in rasterize_bbox
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Execution status: FAIL

What's next?
  Try Docker Debug for seamless, persistent debugging tools in any container or image → docker debug c844af1bd2b0fed2bc34da64fbe8d508d56438dbd7a17aa33f62f4a9dde8ce59
  Learn more at https://docs.docker.com/go/debug-cli/
2024-05-01 11:07:19,003 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Some more info:

  • I checked all the images and every one is 640x640.
  • This Detectnet_v2 notebook and the corresponding config files trained without issues on the default KITTI dataset referenced in the notebook.
  • To check whether the problem is in the dataset itself, I also trained an
    SSD model on it using SSD.ipynb, and that model trained fine with a great mAP.

I appreciate your help!

In your training spec file, there are 6 classes in "dataset_config", but only 3 classes in "postprocessing_config", "evaluation_config", "cost_function_config", etc. Could you modify the spec so that they match and retry?
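
A quick way to spot this kind of mismatch before training is to compare the class names across the sections of the spec. The range [0, 3[ in the error is consistent with only three classes being configured downstream. The script below is only a rough sketch and not part of TAO: it scans the spec text with regular expressions instead of a real protobuf parser, the helper names (extract_block, missing_classes) are made up for illustration, and it assumes the usual DetectNet_v2 spec layout (target_class_mapping in dataset_config, target_classes in cost_function_config, target_class_config in postprocessing_config and bbox_rasterizer_config, minimum_detection_ground_truth_overlap in evaluation_config).

```python
import re

# Pattern that captures the class name in each section (assumed spec layout).
CHECKS = {
    "cost_function_config": r'target_classes\s*\{\s*name\s*:\s*"([^"]+)"',
    "postprocessing_config": r'target_class_config\s*\{\s*key\s*:\s*"([^"]+)"',
    "bbox_rasterizer_config": r'target_class_config\s*\{\s*key\s*:\s*"([^"]+)"',
    "evaluation_config":
        r'minimum_detection_ground_truth_overlap\s*\{\s*key\s*:\s*"([^"]+)"',
}
# In dataset_config the *value* of each mapping is the target class name.
MAPPING = r'target_class_mapping\s*\{\s*key\s*:\s*"[^"]*"\s*value\s*:\s*"([^"]+)"'


def extract_block(text, name):
    """Return the text inside the first `name { ... }` block, or '' if absent."""
    match = re.search(r"\b" + re.escape(name) + r"\s*\{", text)
    if not match:
        return ""
    depth, i = 1, match.end()
    # Walk forward until the opening brace is balanced.
    while i < len(text) and depth:
        depth += {"{": 1, "}": -1}.get(text[i], 0)
        i += 1
    return text[match.end():i - 1]


def missing_classes(spec_text):
    """Map each section name to the dataset target classes it does not list."""
    expected = set(re.findall(MAPPING, extract_block(spec_text, "dataset_config")))
    report = {}
    for section, pattern in CHECKS.items():
        found = set(re.findall(pattern, extract_block(spec_text, section)))
        if expected - found:
            report[section] = sorted(expected - found)
    return report
```

To use it, read your training spec into a string (for example with open(path).read(), where path is wherever your spec file lives) and print missing_classes(...). An empty dict means every target class from dataset_config also shows up in the other sections.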

That solved the problem. Thanks.