• Hardware (T4/V100/Xavier/Nano/etc.): A30 GPU
• Network Type: DetectNet_v2
• TAO Docker image: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
I am training DetectNet_v2 so that I can compare its accuracy with an EfficientDet (TF1) model.
I have generated the TFRecord files, and they are in tfrecords/train and tfrecords/val:
root@d624668eba5a:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/detectnet_v2/data/tfrecords/train# ls -la
total 616
drwxr-xr-x 2 root root 4096 Nov 2 03:44 .
drwxr-xr-x 4 root root 4096 Nov 2 03:44 ..
-rw-r--r-- 1 root root 25407 Nov 2 03:42 data-fold-001-of-002-shard-00000-of-00020
-rw-r--r-- 1 root root 25942 Nov 2 03:42 data-fold-001-of-002-shard-00001-of-00020
-rw-r--r-- 1 root root 24400 Nov 2 03:42 data-fold-001-of-002-shard-00002-of-00020
-rw-r--r-- 1 root root 25985 Nov 2 03:42 data-fold-001-of-002-shard-00003-of-00020
-rw-r--r-- 1 root root 28174 Nov 2 03:42 data-fold-001-of-002-shard-00004-of-00020
-rw-r--r-- 1 root root 28622 Nov 2 03:42 data-fold-001-of-002-shard-00005-of-00020
-rw-r--r-- 1 root root 28890 Nov 2 03:42 data-fold-001-of-002-shard-00006-of-00020
-rw-r--r-- 1 root root 28101 Nov 2 03:42 data-fold-001-of-002-shard-00007-of-00020
-rw-r--r-- 1 root root 30769 Nov 2 03:42 data-fold-001-of-002-shard-00008-of-00020
-rw-r--r-- 1 root root 33351 Nov 2 03:42 data-fold-001-of-002-shard-00009-of-00020
-rw-r--r-- 1 root root 33182 Nov 2 03:42 data-fold-001-of-002-shard-00010-of-00020
-rw-r--r-- 1 root root 27849 Nov 2 03:42 data-fold-001-of-002-shard-00011-of-00020
-rw-r--r-- 1 root root 29245 Nov 2 03:42 data-fold-001-of-002-shard-00012-of-00020
-rw-r--r-- 1 root root 29840 Nov 2 03:42 data-fold-001-of-002-shard-00013-of-00020
-rw-r--r-- 1 root root 27859 Nov 2 03:42 data-fold-001-of-002-shard-00014-of-00020
-rw-r--r-- 1 root root 29551 Nov 2 03:42 data-fold-001-of-002-shard-00015-of-00020
-rw-r--r-- 1 root root 25996 Nov 2 03:42 data-fold-001-of-002-shard-00016-of-00020
-rw-r--r-- 1 root root 25449 Nov 2 03:42 data-fold-001-of-002-shard-00017-of-00020
-rw-r--r-- 1 root root 30043 Nov 2 03:42 data-fold-001-of-002-shard-00018-of-00020
-rw-r--r-- 1 root root 40830 Nov 2 03:42 data-fold-001-of-002-shard-00019-of-00020
root@d624668eba5a:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/detectnet_v2/data/tfrecords/val# ls -la
total 132
drwxr-xr-x 2 root root 4096 Nov 2 03:44 .
drwxr-xr-x 4 root root 4096 Nov 2 03:44 ..
-rw-r--r-- 1 root root 9818 Nov 2 03:42 data-fold-000-of-002-shard-00000-of-00010
-rw-r--r-- 1 root root 9812 Nov 2 03:42 data-fold-000-of-002-shard-00001-of-00010
-rw-r--r-- 1 root root 10773 Nov 2 03:42 data-fold-000-of-002-shard-00002-of-00010
-rw-r--r-- 1 root root 10840 Nov 2 03:42 data-fold-000-of-002-shard-00003-of-00010
-rw-r--r-- 1 root root 10980 Nov 2 03:42 data-fold-000-of-002-shard-00004-of-00010
-rw-r--r-- 1 root root 11934 Nov 2 03:42 data-fold-000-of-002-shard-00005-of-00010
-rw-r--r-- 1 root root 12897 Nov 2 03:42 data-fold-000-of-002-shard-00006-of-00010
-rw-r--r-- 1 root root 11335 Nov 2 03:42 data-fold-000-of-002-shard-00007-of-00010
-rw-r--r-- 1 root root 11448 Nov 2 03:42 data-fold-000-of-002-shard-00008-of-00010
-rw-r--r-- 1 root root 10885 Nov 2 03:42 data-fold-000-of-002-shard-00009-of-00010
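The validation loader later reports a dataset size of 0, so it may be worth confirming that the val shards actually contain records. The following is a minimal sketch in plain Python, without TensorFlow. It walks the TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC) and counts records. It skips the CRCs instead of verifying them. The helper names `count_tfrecord_records` and `count_dir` are hypothetical.

```python
import glob
import os
import struct


def count_tfrecord_records(path):
    """Count records in one TFRecord file by walking its framing (CRCs are not checked)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated length header in {path}")
            (length,) = struct.unpack("<Q", header)
            # Length CRC (4 bytes) + payload + payload CRC (4 bytes).
            body = f.read(4 + length + 4)
            if len(body) < 4 + length + 4:
                raise ValueError(f"truncated record in {path}")
            count += 1
    return count


def count_dir(directory, pattern="data-fold-*"):
    """Return {file_name: record_count} for the shard files in a directory."""
    return {
        os.path.basename(path): count_tfrecord_records(path)
        for path in sorted(glob.glob(os.path.join(directory, pattern)))
    }
```

Example usage, assuming it is run from the detectnet_v2 folder: `sum(count_dir("data/tfrecords/val").values())` gives the number of validation records. The same call on `data/tfrecords/train` can be compared with the 798 training samples reported in the log.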
The configuration used to produce the TFRecords (specs/coco_dataset_convert.txt) is:
coco_config {
  root_directory_path: "/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/detectnet_v2/data"
  img_dir_names: ["val", "train"]
  annotation_files: ["annotations/val.json", "annotations/train.json"]
  num_partitions: 2
  num_shards: [10, 20]
}
image_directory_path: "/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/detectnet_v2/data"
target_class_mapping {
  key: "nohelmet"
  value: "nohelmet"
}
target_class_mapping {
  key: "withhelmet"
  value: "withhelmet"
}
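With `num_partitions: 2` and `num_shards: [10, 20]`, the listings above show fold 000 (10 shards, in val) and fold 001 (20 shards, in train). The following sketch regenerates the expected shard names from `num_shards` so you can check that each folder has every shard of its fold. The naming pattern is inferred from the listings and is not taken from TAO documentation. The helpers `expected_shard_names` and `missing_shards` are hypothetical.

```python
import os


def expected_shard_names(prefix, num_shards):
    """Build shard file names implied by num_shards (assumed: one entry per partition).

    Pattern inferred from the directory listings (an assumption):
    <prefix>-fold-<fold:03d>-of-<folds:03d>-shard-<shard:05d>-of-<shards:05d>
    """
    num_folds = len(num_shards)
    names = []
    for fold, shards_in_fold in enumerate(num_shards):
        for shard in range(shards_in_fold):
            names.append(
                f"{prefix}-fold-{fold:03d}-of-{num_folds:03d}"
                f"-shard-{shard:05d}-of-{shards_in_fold:05d}"
            )
    return names


def missing_shards(directory, fold, prefix="data", num_shards=(10, 20)):
    """Return the expected shard names of one fold that are absent from directory."""
    tag = f"-fold-{fold:03d}-"
    present = set(os.listdir(directory))
    return [
        name
        for name in expected_shard_names(prefix, list(num_shards))
        if tag in name and name not in present
    ]
```

Based on the listings above, `missing_shards("data/tfrecords/val", fold=0)` and `missing_shards("data/tfrecords/train", fold=1)` would both be expected to return an empty list.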
The dataset conversion command is:
detectnet_v2 dataset_convert -d specs/coco_dataset_convert.txt -o data/tfrecords/data
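The command writes files with the prefix `data/tfrecords/data`, but the shards are now in `tfrecords/train` and `tfrecords/val`. It may therefore help to check which folds each `tfrecords_path` pattern in the training spec actually matches. This matters because the log reports 798 training samples and a validation dataset size of 0. The following is a minimal sketch that uses `glob` and a regular expression. The helper name `folds_matched` is hypothetical, and the fold-number pattern is inferred from the file names above.

```python
import glob
import os
import re

# Captures the fold index in names such as data-fold-001-of-002-shard-00003-of-00020.
FOLD_RE = re.compile(r"-fold-(\d{3})-of-\d{3}-shard-")


def folds_matched(pattern):
    """Return {fold_index: number_of_files} for the files a glob pattern matches."""
    folds = {}
    for path in glob.glob(pattern):
        match = FOLD_RE.search(os.path.basename(path))
        if match:
            fold = int(match.group(1))
            folds[fold] = folds.get(fold, 0) + 1
    return folds
```

Let `base` be `/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/detectnet_v2/data/tfrecords`. Given the listings, `folds_matched(base + "/train/*")` should return `{1: 20}` and `folds_matched(base + "/val/*")` should return `{0: 10}`. A pattern built on the original prefix, such as `base + "/data*"`, would return `{}` if no files remain at that location. You can compare these results with the spec's `tfrecords_path` entries and its `validation_fold` or `validation_data_source` setting.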
The training spec file is attached: detectnet_v2_train_resnet18_kitti.txt
The training command is:
detectnet_v2 train -k nvidia_tao \
-r /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/detectnet_v2/results/unpruned \
-e /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt \
-n helmet \
--gpus 1
The error log is:
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/training_proto_utilities.py:102: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
2023-11-02 04:04:30,390 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/training_proto_utilities.py:102: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py:718: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.
2023-11-02 04:04:30,412 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py:718: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.
2023-11-02 04:04:30,413 [TAO Toolkit] [INFO] root 2102: Building training graph.
2023-11-02 04:04:30,415 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 175: Serial augmentation enabled = False
2023-11-02 04:04:30,415 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 177: Pseudo sharding enabled = False
2023-11-02 04:04:30,416 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 269: Max Image Dimensions (all sources): (0, 0)
2023-11-02 04:04:30,416 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 380: number of cpus: 104, io threads: 208, compute threads: 104, buffered batches: 4
2023-11-02 04:04:30,416 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 387: total dataset size 798, number of sources: 1, batch size per gpu: 4, steps: 200
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
2023-11-02 04:04:30,468 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
2023-11-02 04:04:33,279 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataloader.default_dataloader 546: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2023-11-02 04:04:36,069 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 409: shuffle: True - shard 0 of 1
2023-11-02 04:04:36,073 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 479: sampling 1 datasets with weights:
2023-11-02 04:04:36,073 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 481: source: 0 weight: 1.000000
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
2023-11-02 04:04:36,845 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
2023-11-02 04:04:37,583 [TAO Toolkit] [INFO] __main__ 536: Found 798 samples in training set
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/visualizer/tensorboard_visualizer.py:92: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.
2023-11-02 04:04:37,585 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/visualizer/tensorboard_visualizer.py:92: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.
2023-11-02 04:04:37,587 [TAO Toolkit] [INFO] root 2102: Rasterizing tensors.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/rasterizers/bbox_rasterizer.py:348: The name tf.bincount is deprecated. Please use tf.math.bincount instead.
2023-11-02 04:04:37,669 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/rasterizers/bbox_rasterizer.py:348: The name tf.bincount is deprecated. Please use tf.math.bincount instead.
2023-11-02 04:04:37,761 [TAO Toolkit] [INFO] root 2102: Tensors rasterized.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/training_proto_utilities.py:49: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
2023-11-02 04:04:37,761 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/training/training_proto_utilities.py:49: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_functions.py:29: The name tf.log is deprecated. Please use tf.math.log instead.
2023-11-02 04:04:38,057 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_functions.py:29: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:250: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
2023-11-02 04:04:38,129 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:250: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/visualizer/tensorboard_visualizer.py:99: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.
2023-11-02 04:04:41,522 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/visualizer/tensorboard_visualizer.py:99: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.
2023-11-02 04:04:41,919 [TAO Toolkit] [INFO] root 2102: Training graph built.
2023-11-02 04:04:41,919 [TAO Toolkit] [INFO] root 2102: Building validation graph.
2023-11-02 04:04:41,920 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 175: Serial augmentation enabled = False
2023-11-02 04:04:41,920 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 177: Pseudo sharding enabled = False
2023-11-02 04:04:41,920 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 269: Max Image Dimensions (all sources): (0, 0)
2023-11-02 04:04:41,920 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 380: number of cpus: 104, io threads: 208, compute threads: 104, buffered batches: 4
2023-11-02 04:04:41,920 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 387: total dataset size 0, number of sources: 1, batch size per gpu: 4, steps: 0
2023-11-02 04:04:41,924 [TAO Toolkit] [WARNING] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 402: skipping empty datasource
2023-11-02 04:04:41,924 [TAO Toolkit] [INFO] nvidia_tao_tf1.blocks.multi_source_loader.data_loader 479: sampling 0 datasets with weights:
2023-11-02 04:04:41,932 [TAO Toolkit] [INFO] root 2102: list index out of range
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1067, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1046, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
    return_args = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1024, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 887, in run_experiment
    train_gridbox(results_dir, experiment_spec, output_model_file_name, input_model_file_name,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 731, in train_gridbox
    evaluator = build_validation_graph(experiment_spec,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 589, in build_validation_graph
    dataloader.get_dataset_tensors(batch_size, training=False, enable_augmentation=False)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/dataloader/drivenet_dataloader.py", line 670, in get_dataset_tensors
    sequence_example = data_loader()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/blocks/trainer/data_loader_interface.py", line 89, in __call__
    return self.call()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/blocks/multi_source_loader/data_loader.py", line 482, in call
    combined = tf.data.experimental.sample_from_datasets(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/data/experimental/ops/interleave_ops.py", line 230, in sample_from_datasets_v1
    sample_from_datasets_v2(datasets, weights, seed))
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/data/experimental/ops/interleave_ops.py", line 224, in sample_from_datasets_v2
    return _DirectedInterleaveDataset(selector_input, datasets)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/data/experimental/ops/interleave_ops.py", line 106, in __init__
    first_output_types = dataset_ops.get_legacy_output_types(data_inputs[0])
IndexError: list index out of range
Execution status: FAIL
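The last steps of the log can be modelled in a few lines of plain Python. The validation data loader reports "total dataset size 0" and "skipping empty datasource", and then samples from 0 datasets. `interleave_ops.py` then reads `data_inputs[0]`. The following is a simplified model of that sequence, not TAO's or TensorFlow's actual code. The function name `sample_from_sources` is hypothetical. It shows that the IndexError follows from the empty validation source, not from the sampling call itself.

```python
def sample_from_sources(sources):
    """Simplified model (not TAO code) of the loader's final steps.

    Each source is a list of samples. Empty sources are dropped, mirroring
    "skipping empty datasource", and then the first remaining source is read,
    as interleave_ops.py does with data_inputs[0].
    """
    non_empty = [s for s in sources if len(s) > 0]
    # Equal sampling weights for the remaining sources (empty list if none remain).
    weights = [1.0 / len(non_empty)] * len(non_empty) if non_empty else []
    first = non_empty[0]  # raises IndexError when no source has any samples
    return first, weights
```

With one non-empty source, this returns that source and the weights `[1.0]`. With `[[]]`, matching the single empty validation source in the log, it raises `IndexError: list index out of range`.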
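Why does training fail with this error? The training set loads 798 samples, but the validation data loader reports a total dataset size of 0 just before the IndexError.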