Error when training YOLOV3 with TAO

d.cremona · May 6, 2022, 3:01pm

Hi all, I’m trying to train a YOLOv3 model on my custom data using the TAO Toolkit.

I successfully created a KITTI-format version of my data and then converted it to TFRecords using the tao command.
But when I run the training command, I get the error log below.

In particular, I don’t know where to look to debug the "AssertionError: No files match pattern /root/datasets/" at the end of the log.

Does anyone have suggestions?

Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/train.py:40: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2022-05-06 14:43:58,071 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/train.py:40: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/train.py:43: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-05-06 14:43:58,073 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/train.py:43: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2022-05-06 14:43:58,490 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2022-05-06 14:43:58,506 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2022-05-06 14:43:58,508 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2022-05-06 14:43:58,531 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

2022-05-06 14:43:59,227 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/train.py", line 110, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 528, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 516, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/train.py", line 106, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/train.py", line 58, in run_experiment
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/models/utils.py", line 30, in build_training_pipeline
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/data_loader/data_loader.py", line 67, in __init__
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/data_loader/yolo_v3_data_loader.py", line 790, in build_dataloader
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/data_loader/yolo_v3_data_loader.py", line 750, in build_data_source_lists
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataloader/build_dataloader.py", line 59, in _pattern_to_files
AssertionError: No files match pattern /root/datasets/.

• Hardware (T4/V100/Xavier/Nano/etc): RTX2080
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): Yolo_v3
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)

Configuration of the TAO Toolkit Instance

dockers: 		
	nvidia/tao/tao-toolkit-tf: 			
		v3.21.11-tf1.15.5-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. augment
				2. bpnet
				3. classification
				4. dssd
				5. emotionnet
				6. efficientdet
				7. fpenet
				8. gazenet
				9. gesturenet
				10. heartratenet
				11. lprnet
				12. mask_rcnn
				13. multitask_classification
				14. retinanet
				15. ssd
				16. unet
				17. yolo_v3
				18. yolo_v4
				19. yolo_v4_tiny
				20. converter
		v3.21.11-tf1.15.4-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. detectnet_v2
				2. faster_rcnn
	nvidia/tao/tao-toolkit-pyt: 			
		v3.21.11-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. speech_to_text
				2. speech_to_text_citrinet
				3. text_classification
				4. question_answering
				5. token_classification
				6. intent_slot_classification
				7. punctuation_and_capitalization
				8. action_recognition
		v3.22.02-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. spectro_gen
				2. vocoder
	nvidia/tao/tao-toolkit-lm: 			
		v3.21.08-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. n_gram
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022

• Training spec file(If have, please share here)

random_seed: 42
yolov3_config {
  big_anchor_shape: "[(514.17, 78.47), (220.24, 217.41), (642.07, 120.47)]"
  mid_anchor_shape: "[(294.79, 47.24), (169.48, 112.35), (156.00, 184.47)]"
  small_anchor_shape: "[(30.97, 30.12), (112.12, 67.47), (108.68, 127.88)]"
  matching_neutral_box_iou: 0.7
  arch: "resnet"
  nlayers: 18
  arch_conv_blocks: 2
  loss_loc_weight: 0.8
  loss_neg_obj_weights: 100.0
  loss_class_weights: 1.0
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 8
  num_epochs: 80
  enable_qat: false
  checkpoint_interval: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-4
      soft_start: 0.1
      annealing: 0.5
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
  pretrain_model_path: "/workspace/tao-experiments/tao_yolo_v3_01/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5"
}
eval_config {
  average_precision_mode: SAMPLE
  batch_size: 8
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.5
  top_k: 200
  force_on_cpu: True
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 1248
  output_height: 384
  output_channel: 3
  randomize_input_shape_period: 0
}
dataset_config {
  data_sources: {
      tfrecords_path: "/workspace/tao-experiments/data/tfrecords/insulators_test_dataset/train/*"
      image_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/train/images/"
  }
  include_difficult_in_training: true
  image_extension: "JPG"
  target_class_mapping {
      key: "insulator"
      value: "insulator"
  }
  validation_data_sources: {
    label_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/test/labels/"
    image_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/test/images/"
  }
}

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

!tao yolo_v3 train -e $TAO_SPECS_DIR/yolo_v3_train_resnet18_tfrecord.txt \
                   -r $TAO_EXPERIMENT_DIR/experiment_dir_unpruned \
                   -k $KEY \
                   --log_file $TAO_EXPERIMENT_DIR/experiment_dir_unpruned/train_log.txt \
                   --gpus 1

Morganh · May 6, 2022, 3:18pm

To debug, run the commands below in a terminal instead of the Jupyter notebook:
$ tao yolo_v3 run /bin/bash

Then, inside the container:
# yolo_v3 train xxx

Also, please double-check the data_sources. You can also try the format below.

label_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/test/labels/"
image_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/test/images/"
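
One way to double-check the data_sources is to count, from inside the container, how many files each configured path actually resolves to. The sketch below is only an illustration: the helper names check_data_source and empty_entries are hypothetical and not part of TAO, and it assumes glob-style matching, which the "No files match pattern" message suggests but the thread does not confirm. The paths are the ones from the spec above and must be valid as seen inside the container.

```python
import glob
import os


def check_data_source(image_dir, tfrecords_pattern=None, label_dir=None):
    """Count the files each configured path resolves to (hypothetical helper)."""
    report = {"images": len(glob.glob(os.path.join(image_dir, "*")))}
    if tfrecords_pattern is not None:
        # tfrecords_path in the spec is already a glob pattern (it ends in /*).
        report["tfrecords"] = len(glob.glob(tfrecords_pattern))
    if label_dir is not None:
        report["labels"] = len(glob.glob(os.path.join(label_dir, "*")))
    return report


def empty_entries(report):
    """Return the names of the entries that matched no files at all."""
    return sorted(name for name, count in report.items() if count == 0)


if __name__ == "__main__":
    # Paths copied from the training spec; run this inside the TAO container.
    report = check_data_source(
        image_dir="/workspace/tao-experiments/data/insulators_test_dataset/train/images/",
        tfrecords_pattern="/workspace/tao-experiments/data/tfrecords/insulators_test_dataset/train/*",
    )
    print(report)
    print("Entries with no matching files:", empty_entries(report))
```

Any entry reported with zero matches points to a path that is wrong or not visible inside the container.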

d.cremona · May 6, 2022, 3:24pm

> Also, please double-check the data_sources. You can also try the format below.

Can I use that format for the training set as well?

d.cremona · May 6, 2022, 3:28pm

Whoa! Changing the format to:

label_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/test/labels/"
image_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/test/images/"

did the job, thank you!!

Morganh · May 6, 2022, 3:41pm

Yes.
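
For reference, a training data_sources block in the same label-directory format might look like the fragment below. This is a sketch, not the poster's actual spec: the train/labels/ path is an assumption inferred from the train/images/ path in the original spec, so point it at wherever the KITTI label files for the training split actually live.

```
dataset_config {
  data_sources: {
    # Assumed location of the KITTI label files for the training split.
    label_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/train/labels/"
    image_directory_path: "/workspace/tao-experiments/data/insulators_test_dataset/train/images/"
  }
  # ... remaining dataset_config fields unchanged ...
}
```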

system · May 20, 2022, 3:41pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
