Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc.): AWS g4dn instance
• Network Type: retinanet
• TLT Version: v3.21.11
• Training spec file (if available, please share it here):
random_seed: 42
retinanet_config {
aspect_ratios_global: "[1.0, 2.0, 0.5]"
scales: "[0.045, 0.09, 0.2, 0.4, 0.55, 0.7]"
two_boxes_for_ar1: false
clip_boxes: false
loss_loc_weight: 0.8
focal_loss_alpha: 0.25
focal_loss_gamma: 2.0
variances: "[0.1, 0.1, 0.2, 0.2]"
arch: "resnet"
nlayers: 18
n_kernels: 1
n_anchor_levels: 1
feature_size: 256
freeze_bn: False
freeze_blocks: 0
}
training_config {
enable_qat: False
pretrain_model_path: "/workspace/tao-experiments/retinanet/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5"
batch_size_per_gpu: 8
num_epochs: 100
n_workers: 2
checkpoint_interval: 10
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 4e-5
max_learning_rate: 1.5e-2
soft_start: 0.1
annealing: 0.3
}
}
regularizer {
type: L1
weight: 2e-5
}
optimizer {
sgd {
momentum: 0.9
nesterov: True
}
}
}
eval_config {
validation_period_during_training: 10
average_precision_mode: SAMPLE
batch_size: 8
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.01
clustering_iou_threshold: 0.6
top_k: 200
}
augmentation_config {
output_width: 1248
output_height: 384
output_channel: 3
}
dataset_config {
data_sources: {
tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_train*"
}
target_class_mapping {
key: "car"
value: "vehicle"
}
target_class_mapping {
key: "truck"
value: "vehicle"
}
target_class_mapping {
key: "van"
value: "car"
}
target_class_mapping {
key: "bus"
value: "vehicle"
}
target_class_mapping {
key: "person"
value: "person"
}
validation_data_sources: {
image_directory_path: "/workspace/tao-experiments/data/val/image"
label_directory_path: "/workspace/tao-experiments/data/val/label"
}
}
I ran into an issue while using the TAO Toolkit for transfer learning with a custom dataset. The error indicates that a bounding box is out of bounds. The model is RetinaNet and the dataset is in KITTI format; there is no issue when I use the YOLOv3 model instead.
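One way to narrow this down, assuming the error comes from label coordinates that fall outside the image (this is a guess, not something the log confirms), is to scan the KITTI label files for such boxes. The sketch below is only an illustration: `check_kitti_label` and the sample labels are hypothetical, and the 1248x384 size is taken from the spec's output_width and output_height. For real data, pass each image's actual width and height instead.

```python
def check_kitti_label(label_text, img_w, img_h):
    """Return (line_number, class_name, problem) for each suspicious box.

    KITTI label lines are whitespace-separated; fields 4..7 (0-based)
    hold the 2D box as xmin, ymin, xmax, ymax in pixels.
    """
    problems = []
    for line_no, line in enumerate(label_text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 8:
            problems.append((line_no, fields[0], "fewer than 8 fields"))
            continue
        name = fields[0]
        xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
        if xmin < 0 or ymin < 0 or xmax > img_w or ymax > img_h:
            problems.append((line_no, name, "outside image"))
        elif xmax <= xmin or ymax <= ymin:
            problems.append((line_no, name, "zero or negative size"))
    return problems


# Hypothetical label file contents, one object per line.
sample = "\n".join([
    "car 0.00 0 0.00 100.00 120.00 300.00 250.00 0 0 0 0 0 0 0",
    "person 0.00 0 0.00 -5.00 10.00 40.00 90.00 0 0 0 0 0 0 0",
    "truck 0.00 0 0.00 1200.00 50.00 1300.00 200.00 0 0 0 0 0 0 0",
    "bus 0.00 0 0.00 500.00 80.00 500.00 160.00 0 0 0 0 0 0 0",
])

for problem in check_kitti_label(sample, img_w=1248, img_h=384):
    print(problem)
```

Running the same function over every file in the label directory, with the matching image size, would list the lines worth inspecting by hand.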
The log is as follows:
2022-06-27 17:34:51,453 [INFO] root: Registry: ['nvcr.io']
2022-06-27 17:34:51,523 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-06-27 17:34:51,532 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:61: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-06-27 17:34:58,037 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:61: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:64: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-06-27 17:34:58,038 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:64: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-06-27 17:34:58,527 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/retinanet/specs/retinanet_train_resnet18_kitti.txt.
2022-06-27 17:34:58,529 [INFO] iva.retinanet.utils.spec_loader: Merging specification from /workspace/tao-experiments/retinanet/specs/retinanet_train_resnet18_kitti.txt
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2022-06-27 17:34:58,531 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2022-06-27 17:34:58,532 [INFO] __main__: Using DALI dataloader...
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
2022-06-27 17:34:58,675 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
2022-06-27 17:34:58,691 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.
2022-06-27 17:34:59,159 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.
2022-06-27 17:34:59,183 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.
2022-06-27 17:34:59,370 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
2022-06-27 17:34:59,548 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
2022-06-27 17:35:00,912 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
2022-06-27 17:35:01,104 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
2022-06-27 17:35:01,104 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
2022-06-27 17:35:01,105 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2022-06-27 17:35:01,624 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
2022-06-27 17:35:02,293 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.
2022-06-27 17:35:02,297 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
2022-06-27 17:35:02,955 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.
2022-06-27 17:35:03,117 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.
2022-06-27 17:35:04,489 [INFO] iva.retinanet.utils.model_io: Loading model weights...
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input (InputLayer) (8, 3, 384, 1248) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (8, 64, 192, 624) 9408 Input[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (8, 64, 192, 624) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (8, 64, 192, 624) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (8, 64, 96, 312) 36864 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (8, 64, 96, 312) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
block_1a_relu_1 (Activation) (8, 64, 96, 312) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (8, 64, 96, 312) 36864 block_1a_relu_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (8, 64, 96, 312) 4096 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (8, 64, 96, 312) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (8, 64, 96, 312) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (8, 64, 96, 312) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1a_relu (Activation) (8, 64, 96, 312) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (8, 64, 96, 312) 36864 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (8, 64, 96, 312) 256 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
block_1b_relu_1 (Activation) (8, 64, 96, 312) 0 block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (8, 64, 96, 312) 36864 block_1b_relu_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (8, 64, 96, 312) 4096 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (8, 64, 96, 312) 256 block_1b_conv_2[0][0]
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (8, 64, 96, 312) 256 block_1b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_2 (Add) (8, 64, 96, 312) 0 block_1b_bn_2[0][0]
block_1b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1b_relu (Activation) (8, 64, 96, 312) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (8, 128, 48, 156) 73728 block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (8, 128, 48, 156) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
block_2a_relu_1 (Activation) (8, 128, 48, 156) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (8, 128, 48, 156) 147456 block_2a_relu_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (8, 128, 48, 156) 8192 block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (8, 128, 48, 156) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (8, 128, 48, 156) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (8, 128, 48, 156) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2a_relu (Activation) (8, 128, 48, 156) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (8, 128, 48, 156) 147456 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (8, 128, 48, 156) 512 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
block_2b_relu_1 (Activation) (8, 128, 48, 156) 0 block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (8, 128, 48, 156) 147456 block_2b_relu_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (8, 128, 48, 156) 16384 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (8, 128, 48, 156) 512 block_2b_conv_2[0][0]
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (8, 128, 48, 156) 512 block_2b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_4 (Add) (8, 128, 48, 156) 0 block_2b_bn_2[0][0]
block_2b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2b_relu (Activation) (8, 128, 48, 156) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (8, 256, 24, 78) 294912 block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (8, 256, 24, 78) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
block_3a_relu_1 (Activation) (8, 256, 24, 78) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (8, 256, 24, 78) 589824 block_3a_relu_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (8, 256, 24, 78) 32768 block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (8, 256, 24, 78) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (8, 256, 24, 78) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add) (8, 256, 24, 78) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3a_relu (Activation) (8, 256, 24, 78) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (8, 256, 24, 78) 589824 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (8, 256, 24, 78) 1024 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
block_3b_relu_1 (Activation) (8, 256, 24, 78) 0 block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (8, 256, 24, 78) 589824 block_3b_relu_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (8, 256, 24, 78) 65536 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (8, 256, 24, 78) 1024 block_3b_conv_2[0][0]
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (8, 256, 24, 78) 1024 block_3b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_6 (Add) (8, 256, 24, 78) 0 block_3b_bn_2[0][0]
block_3b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3b_relu (Activation) (8, 256, 24, 78) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (8, 512, 24, 78) 1179648 block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (8, 512, 24, 78) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
block_4a_relu_1 (Activation) (8, 512, 24, 78) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (8, 512, 24, 78) 2359296 block_4a_relu_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (8, 512, 24, 78) 131072 block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (8, 512, 24, 78) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (8, 512, 24, 78) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add) (8, 512, 24, 78) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4a_relu (Activation) (8, 512, 24, 78) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (8, 512, 24, 78) 2359296 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (8, 512, 24, 78) 2048 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
block_4b_relu_1 (Activation) (8, 512, 24, 78) 0 block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (8, 512, 24, 78) 2359296 block_4b_relu_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (8, 512, 24, 78) 262144 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (8, 512, 24, 78) 2048 block_4b_conv_2[0][0]
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (8, 512, 24, 78) 2048 block_4b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_8 (Add) (8, 512, 24, 78) 0 block_4b_bn_2[0][0]
block_4b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4b_relu (Activation) (8, 512, 24, 78) 0 add_8[0][0]
__________________________________________________________________________________________________
expand_conv1 (Conv2D) (8, 256, 12, 39) 1179904 block_4b_relu[0][0]
__________________________________________________________________________________________________
expand1_relu (ReLU) (8, 256, 12, 39) 0 expand_conv1[0][0]
__________________________________________________________________________________________________
C5_reduced (Conv2D) (8, 256, 12, 39) 65792 expand1_relu[0][0]
__________________________________________________________________________________________________
P5_upsampled (UpSampling2D) (8, 256, 24, 78) 0 C5_reduced[0][0]
__________________________________________________________________________________________________
C4_reduced (Conv2D) (8, 256, 24, 78) 131328 block_4b_relu[0][0]
__________________________________________________________________________________________________
P4_merged (Add) (8, 256, 24, 78) 0 P5_upsampled[0][0]
C4_reduced[0][0]
__________________________________________________________________________________________________
P4_upsampled (UpSampling2D) (8, 256, 48, 156) 0 P4_merged[0][0]
__________________________________________________________________________________________________
C3_reduced (Conv2D) (8, 256, 48, 156) 33024 block_2b_relu[0][0]
__________________________________________________________________________________________________
P6 (Conv2D) (8, 256, 6, 20) 590080 expand1_relu[0][0]
__________________________________________________________________________________________________
P3_merged (Add) (8, 256, 48, 156) 0 P4_upsampled[0][0]
C3_reduced[0][0]
__________________________________________________________________________________________________
P6_relu (ReLU) (8, 256, 6, 20) 0 P6[0][0]
__________________________________________________________________________________________________
P3 (Conv2D) (8, 256, 48, 156) 590080 P3_merged[0][0]
__________________________________________________________________________________________________
P4 (Conv2D) (8, 256, 24, 78) 590080 P4_merged[0][0]
__________________________________________________________________________________________________
P5 (Conv2D) (8, 256, 12, 39) 590080 C5_reduced[0][0]
__________________________________________________________________________________________________
P7 (Conv2D) (8, 256, 3, 10) 590080 P6_relu[0][0]
__________________________________________________________________________________________________
P3_relu (ReLU) (8, 256, 48, 156) 0 P3[0][0]
__________________________________________________________________________________________________
P4_relu (ReLU) (8, 256, 24, 78) 0 P4[0][0]
__________________________________________________________________________________________________
P5_relu (ReLU) (8, 256, 12, 39) 0 P5[0][0]
__________________________________________________________________________________________________
P7_relu (ReLU) (8, 256, 3, 10) 0 P7[0][0]
__________________________________________________________________________________________________
retinanet_class_subn_0 (Conv2D) multiple 590080 P3_relu[0][0]
P4_relu[0][0]
P5_relu[0][0]
P6_relu[0][0]
P7_relu[0][0]
__________________________________________________________________________________________________
retinanet_conf_regressor (Conv2 multiple 27660 retinanet_class_subn_0[0][0]
retinanet_class_subn_0[1][0]
retinanet_class_subn_0[2][0]
retinanet_class_subn_0[3][0]
retinanet_class_subn_0[4][0]
__________________________________________________________________________________________________
retinanet_loc_subn_0 (Conv2D) multiple 590080 P3_relu[0][0]
P4_relu[0][0]
P5_relu[0][0]
P6_relu[0][0]
P7_relu[0][0]
__________________________________________________________________________________________________
permute_1 (Permute) (8, 48, 156, 12) 0 retinanet_conf_regressor[0][0]
__________________________________________________________________________________________________
permute_3 (Permute) (8, 24, 78, 12) 0 retinanet_conf_regressor[1][0]
__________________________________________________________________________________________________
permute_5 (Permute) (8, 12, 39, 12) 0 retinanet_conf_regressor[2][0]
__________________________________________________________________________________________________
permute_7 (Permute) (8, 6, 20, 12) 0 retinanet_conf_regressor[3][0]
__________________________________________________________________________________________________
permute_9 (Permute) (8, 3, 10, 12) 0 retinanet_conf_regressor[4][0]
__________________________________________________________________________________________________
retinanet_loc_regressor (Conv2D multiple 27660 retinanet_loc_subn_0[0][0]
retinanet_loc_subn_0[1][0]
retinanet_loc_subn_0[2][0]
retinanet_loc_subn_0[3][0]
retinanet_loc_subn_0[4][0]
__________________________________________________________________________________________________
conf_reshape_0 (Reshape) (8, 22464, 1, 4) 0 permute_1[0][0]
__________________________________________________________________________________________________
conf_reshape_1 (Reshape) (8, 5616, 1, 4) 0 permute_3[0][0]
__________________________________________________________________________________________________
conf_reshape_2 (Reshape) (8, 1404, 1, 4) 0 permute_5[0][0]
__________________________________________________________________________________________________
conf_reshape_3 (Reshape) (8, 360, 1, 4) 0 permute_7[0][0]
__________________________________________________________________________________________________
conf_reshape_4 (Reshape) (8, 90, 1, 4) 0 permute_9[0][0]
__________________________________________________________________________________________________
permute_2 (Permute) (8, 48, 156, 12) 0 retinanet_loc_regressor[0][0]
__________________________________________________________________________________________________
permute_4 (Permute) (8, 24, 78, 12) 0 retinanet_loc_regressor[1][0]
__________________________________________________________________________________________________
permute_6 (Permute) (8, 12, 39, 12) 0 retinanet_loc_regressor[2][0]
__________________________________________________________________________________________________
permute_8 (Permute) (8, 6, 20, 12) 0 retinanet_loc_regressor[3][0]
__________________________________________________________________________________________________
permute_10 (Permute) (8, 3, 10, 12) 0 retinanet_loc_regressor[4][0]
__________________________________________________________________________________________________
retinanet_anchor_0 (RetinaAncho (8, 7488, 3, 8) 0 retinanet_loc_regressor[0][0]
__________________________________________________________________________________________________
retinanet_anchor_1 (RetinaAncho (8, 1872, 3, 8) 0 retinanet_loc_regressor[1][0]
__________________________________________________________________________________________________
retinanet_anchor_2 (RetinaAncho (8, 468, 3, 8) 0 retinanet_loc_regressor[2][0]
__________________________________________________________________________________________________
retinanet_anchor_3 (RetinaAncho (8, 120, 3, 8) 0 retinanet_loc_regressor[3][0]
__________________________________________________________________________________________________
retinanet_anchor_4 (RetinaAncho (8, 30, 3, 8) 0 retinanet_loc_regressor[4][0]
__________________________________________________________________________________________________
mbox_conf (Concatenate) (8, 29934, 1, 4) 0 conf_reshape_0[0][0]
conf_reshape_1[0][0]
conf_reshape_2[0][0]
conf_reshape_3[0][0]
conf_reshape_4[0][0]
__________________________________________________________________________________________________
loc_reshape_0 (Reshape) (8, 22464, 1, 4) 0 permute_2[0][0]
__________________________________________________________________________________________________
loc_reshape_1 (Reshape) (8, 5616, 1, 4) 0 permute_4[0][0]
__________________________________________________________________________________________________
loc_reshape_2 (Reshape) (8, 1404, 1, 4) 0 permute_6[0][0]
__________________________________________________________________________________________________
loc_reshape_3 (Reshape) (8, 360, 1, 4) 0 permute_8[0][0]
__________________________________________________________________________________________________
loc_reshape_4 (Reshape) (8, 90, 1, 4) 0 permute_10[0][0]
__________________________________________________________________________________________________
anchor_reshape_0 (Reshape) (8, 22464, 1, 8) 0 retinanet_anchor_0[0][0]
__________________________________________________________________________________________________
anchor_reshape_1 (Reshape) (8, 5616, 1, 8) 0 retinanet_anchor_1[0][0]
__________________________________________________________________________________________________
anchor_reshape_2 (Reshape) (8, 1404, 1, 8) 0 retinanet_anchor_2[0][0]
__________________________________________________________________________________________________
anchor_reshape_3 (Reshape) (8, 360, 1, 8) 0 retinanet_anchor_3[0][0]
__________________________________________________________________________________________________
anchor_reshape_4 (Reshape) (8, 90, 1, 8) 0 retinanet_anchor_4[0][0]
__________________________________________________________________________________________________
mbox_conf_sigmoid (Activation) (8, 29934, 1, 4) 0 mbox_conf[0][0]
__________________________________________________________________________________________________
mbox_loc (Concatenate) (8, 29934, 1, 4) 0 loc_reshape_0[0][0]
loc_reshape_1[0][0]
loc_reshape_2[0][0]
loc_reshape_3[0][0]
loc_reshape_4[0][0]
__________________________________________________________________________________________________
mbox_priorbox (Concatenate) (8, 29934, 1, 8) 0 anchor_reshape_0[0][0]
anchor_reshape_1[0][0]
anchor_reshape_2[0][0]
anchor_reshape_3[0][0]
anchor_reshape_4[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (8, 29934, 1, 16) 0 mbox_conf_sigmoid[0][0]
mbox_loc[0][0]
mbox_priorbox[0][0]
__________________________________________________________________________________________________
retinanet_predictions (Reshape) (8, 29934, 16) 0 concatenate_1[0][0]
==================================================================================================
Total params: 17,138,392
Trainable params: 17,117,336
Non-trainable params: 21,056
__________________________________________________________________________________________________
2022-06-27 17:35:27,124 [INFO] __main__: Number of samples in the training dataset: 1732
2022-06-27 17:35:27,124 [INFO] __main__: Number of samples in the validation dataset: 192
Epoch 1/100
DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]
Current pipeline object is no longer valid.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 390, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 528, in return_func
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 516, in return_func
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 386, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 308, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
outs = f(ins)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]
Current pipeline object is no longer valid.
[[{{node Dali}}]]
[[cond_6/MultiMatch/ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_int32_Reshape_1/_3675]]
(1) Internal: DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]
Current pipeline object is no longer valid.
[[{{node Dali}}]]
0 successful operations.
0 derived errors ignored.
2022-06-27 17:35:37,475 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
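
The assertion above reports a box whose normalized x-min is negative (-0.0046875), which suggests that at least one KITTI label extends past the left edge of its image. The four values would be consistent with a 1920x1080 image and a pixel box of roughly (-9, 631, 98, 909), but that resolution is only an inference from the numbers, not something the log states. Below is a minimal sketch for finding and clamping such labels before regenerating the tfrecords. It assumes the standard KITTI field order (bbox xmin, ymin, xmax, ymax in the 5th to 8th fields) and that each image's width and height are known. The function names are hypothetical and are not part of TAO or DALI.

```python
def find_out_of_bounds(label_lines, img_w, img_h):
    """Return (line_index, bbox) for every KITTI label whose box leaves the image.

    Hypothetical helper: assumes KITTI field order, i.e. fields[4:8] hold
    xmin, ymin, xmax, ymax in pixels.
    """
    bad = []
    for i, line in enumerate(label_lines):
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        xmin, ymin, xmax, ymax = map(float, fields[4:8])
        if xmin < 0 or ymin < 0 or xmax > img_w or ymax > img_h:
            bad.append((i, (xmin, ymin, xmax, ymax)))
    return bad


def clamp_label_line(line, img_w, img_h):
    """Return the label line with its bbox clamped to [0, img_w] x [0, img_h]."""
    fields = line.split()
    xmin, ymin, xmax, ymax = map(float, fields[4:8])
    xmin = min(max(xmin, 0.0), float(img_w))
    xmax = min(max(xmax, 0.0), float(img_w))
    ymin = min(max(ymin, 0.0), float(img_h))
    ymax = min(max(ymax, 0.0), float(img_h))
    fields[4:8] = ["{:.2f}".format(v) for v in (xmin, ymin, xmax, ymax)]
    return " ".join(fields)


if __name__ == "__main__":
    # Example label built from the inferred pixel box; the image size is an assumption.
    sample = "person 0.00 0 0.00 -9.00 631.00 98.00 909.00 0 0 0 0 0 0 0"
    print(find_out_of_bounds([sample], 1920, 1080))
    print(clamp_label_line(sample, 1920, 1080))
```

Running a check like this over every file under the training label directory, with each image's real size, should show whether the dataset contains boxes that cross the image border. Whether clamping or dropping those boxes is the better fix depends on the dataset.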