Retinanet box out of bounds

hxnwjf · June 27, 2022, 6:37pm

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) : AWS g4dn instance
• Network Type: retinanet
• TLT Version: v3.21.11
• Training spec file(If have, please share here)

random_seed: 42
retinanet_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5]"
  scales: "[0.045, 0.09, 0.2, 0.4, 0.55, 0.7]"
  two_boxes_for_ar1: false
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 18
  n_kernels: 1
  n_anchor_levels: 1
  feature_size: 256
  freeze_bn: False
  freeze_blocks: 0
}
training_config {
  enable_qat: False
  pretrain_model_path: "/workspace/tao-experiments/retinanet/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5"
  batch_size_per_gpu: 8
  num_epochs: 100
  n_workers: 2
  checkpoint_interval: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 4e-5
      max_learning_rate: 1.5e-2
      soft_start: 0.1
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 2e-5
  }
  optimizer {
    sgd {
      momentum: 0.9
      nesterov: True
    }
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 8
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  output_width: 1248
  output_height: 384
  output_channel: 3
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_train*"
  }
  target_class_mapping {
    key: "car"
    value: "vehicle"
  }
  target_class_mapping {
    key: "truck"
    value: "vehicle"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  target_class_mapping {
    key: "bus"
    value: "vehicle"
  }
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_data_sources: {
    image_directory_path: "/workspace/tao-experiments/data/val/image"
    label_directory_path: "/workspace/tao-experiments/data/val/label"
  }
}

I ran into an issue while using the TAO Toolkit to do transfer learning with a custom dataset. The error says the box is out of bounds. The model I used is RetinaNet and the dataset is in KITTI format; there was no issue when I used the YOLOv3 model.
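Since the error mentions a box being out of bounds, one sanity check is to scan the KITTI label files for boxes that fall outside their image or have no area. The sketch below is only an illustration, not part of TAO: `find_bad_boxes` is a hypothetical helper, and it assumes the standard KITTI layout where fields 5 to 8 of each line are `left top right bottom` in pixels of the original image. Whether such labels are the actual cause of this error is not confirmed here.

```python
from typing import List, Tuple


def find_bad_boxes(label_text: str, img_w: int, img_h: int) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for each suspicious box in one KITTI label file.

    Hypothetical helper: assumes fields 5-8 (0-based index 4..7) are
    left, top, right, bottom in pixels of the original image.
    """
    problems = []
    for lineno, line in enumerate(label_text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 8:
            problems.append((lineno, "fewer than 8 fields"))
            continue
        try:
            x1, y1, x2, y2 = (float(v) for v in fields[4:8])
        except ValueError:
            problems.append((lineno, "non-numeric bbox"))
            continue
        if x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h:
            problems.append((lineno, "box outside image"))
        elif x2 <= x1 or y2 <= y1:
            problems.append((lineno, "zero or negative box size"))
    return problems


# Made-up label lines for illustration only (not taken from the dataset).
sample = "\n".join([
    "car 0 0 0 10 20 110 120 0 0 0 0 0 0 0",
    "person 0 0 0 -5 20 50 120 0 0 0 0 0 0 0",
    "truck 0 0 0 100 50 1300 200 0 0 0 0 0 0 0",
    "bus 0 0 0 200 90 200 150 0 0 0 0 0 0 0",
])

for lineno, reason in find_bad_boxes(sample, img_w=1248, img_h=384):
    print(f"line {lineno}: {reason}")
```

In practice `img_w` and `img_h` should be the real size of each image (read with whatever image library is available), not the `output_width` and `output_height` from the spec, because the labels refer to the original image.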
The log is as follows:

2022-06-27 17:34:51,453 [INFO] root: Registry: ['nvcr.io']
2022-06-27 17:34:51,523 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-06-27 17:34:51,532 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:61: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2022-06-27 17:34:58,037 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:61: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:64: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-06-27 17:34:58,038 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:64: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-06-27 17:34:58,527 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/retinanet/specs/retinanet_train_resnet18_kitti.txt.
2022-06-27 17:34:58,529 [INFO] iva.retinanet.utils.spec_loader: Merging specification from /workspace/tao-experiments/retinanet/specs/retinanet_train_resnet18_kitti.txt
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2022-06-27 17:34:58,531 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2022-06-27 17:34:58,532 [INFO] __main__: Using DALI dataloader...
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2022-06-27 17:34:58,675 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2022-06-27 17:34:58,691 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.

2022-06-27 17:34:59,159 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

2022-06-27 17:34:59,183 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

2022-06-27 17:34:59,370 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2022-06-27 17:34:59,548 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

2022-06-27 17:35:00,912 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2022-06-27 17:35:01,104 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2022-06-27 17:35:01,104 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2022-06-27 17:35:01,105 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2022-06-27 17:35:01,624 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

2022-06-27 17:35:02,293 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

2022-06-27 17:35:02,297 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

2022-06-27 17:35:02,955 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2022-06-27 17:35:03,117 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2022-06-27 17:35:04,489 [INFO] iva.retinanet.utils.model_io: Loading model weights...
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Input (InputLayer)              (8, 3, 384, 1248)    0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (8, 64, 192, 624)    9408        Input[0][0]                      
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (8, 64, 192, 624)    256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (8, 64, 192, 624)    0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (8, 64, 96, 312)     36864       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (8, 64, 96, 312)     256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (8, 64, 96, 312)     0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (8, 64, 96, 312)     36864       block_1a_relu_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (8, 64, 96, 312)     4096        activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (8, 64, 96, 312)     256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (8, 64, 96, 312)     256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (8, 64, 96, 312)     0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1a_relu (Activation)      (8, 64, 96, 312)     0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (8, 64, 96, 312)     36864       block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (8, 64, 96, 312)     256         block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (8, 64, 96, 312)     0           block_1b_bn_1[0][0]              
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (8, 64, 96, 312)     36864       block_1b_relu_1[0][0]            
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (8, 64, 96, 312)     4096        block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (8, 64, 96, 312)     256         block_1b_conv_2[0][0]            
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (8, 64, 96, 312)     256         block_1b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_2 (Add)                     (8, 64, 96, 312)     0           block_1b_bn_2[0][0]              
                                                                 block_1b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1b_relu (Activation)      (8, 64, 96, 312)     0           add_2[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (8, 128, 48, 156)    73728       block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (8, 128, 48, 156)    512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (8, 128, 48, 156)    0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (8, 128, 48, 156)    147456      block_2a_relu_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (8, 128, 48, 156)    8192        block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (8, 128, 48, 156)    512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (8, 128, 48, 156)    512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_3 (Add)                     (8, 128, 48, 156)    0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2a_relu (Activation)      (8, 128, 48, 156)    0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (8, 128, 48, 156)    147456      block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (8, 128, 48, 156)    512         block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (8, 128, 48, 156)    0           block_2b_bn_1[0][0]              
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (8, 128, 48, 156)    147456      block_2b_relu_1[0][0]            
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (8, 128, 48, 156)    16384       block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (8, 128, 48, 156)    512         block_2b_conv_2[0][0]            
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (8, 128, 48, 156)    512         block_2b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_4 (Add)                     (8, 128, 48, 156)    0           block_2b_bn_2[0][0]              
                                                                 block_2b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2b_relu (Activation)      (8, 128, 48, 156)    0           add_4[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (8, 256, 24, 78)     294912      block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (8, 256, 24, 78)     1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (8, 256, 24, 78)     0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (8, 256, 24, 78)     589824      block_3a_relu_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (8, 256, 24, 78)     32768       block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (8, 256, 24, 78)     1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (8, 256, 24, 78)     1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_5 (Add)                     (8, 256, 24, 78)     0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3a_relu (Activation)      (8, 256, 24, 78)     0           add_5[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (8, 256, 24, 78)     589824      block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (8, 256, 24, 78)     1024        block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (8, 256, 24, 78)     0           block_3b_bn_1[0][0]              
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (8, 256, 24, 78)     589824      block_3b_relu_1[0][0]            
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (8, 256, 24, 78)     65536       block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (8, 256, 24, 78)     1024        block_3b_conv_2[0][0]            
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (8, 256, 24, 78)     1024        block_3b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_6 (Add)                     (8, 256, 24, 78)     0           block_3b_bn_2[0][0]              
                                                                 block_3b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3b_relu (Activation)      (8, 256, 24, 78)     0           add_6[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (8, 512, 24, 78)     1179648     block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (8, 512, 24, 78)     2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (8, 512, 24, 78)     0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (8, 512, 24, 78)     2359296     block_4a_relu_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (8, 512, 24, 78)     131072      block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (8, 512, 24, 78)     2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (8, 512, 24, 78)     2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_7 (Add)                     (8, 512, 24, 78)     0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4a_relu (Activation)      (8, 512, 24, 78)     0           add_7[0][0]                      
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (8, 512, 24, 78)     2359296     block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (8, 512, 24, 78)     2048        block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (8, 512, 24, 78)     0           block_4b_bn_1[0][0]              
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (8, 512, 24, 78)     2359296     block_4b_relu_1[0][0]            
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (8, 512, 24, 78)     262144      block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (8, 512, 24, 78)     2048        block_4b_conv_2[0][0]            
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (8, 512, 24, 78)     2048        block_4b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_8 (Add)                     (8, 512, 24, 78)     0           block_4b_bn_2[0][0]              
                                                                 block_4b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4b_relu (Activation)      (8, 512, 24, 78)     0           add_8[0][0]                      
__________________________________________________________________________________________________
expand_conv1 (Conv2D)           (8, 256, 12, 39)     1179904     block_4b_relu[0][0]              
__________________________________________________________________________________________________
expand1_relu (ReLU)             (8, 256, 12, 39)     0           expand_conv1[0][0]               
__________________________________________________________________________________________________
C5_reduced (Conv2D)             (8, 256, 12, 39)     65792       expand1_relu[0][0]               
__________________________________________________________________________________________________
P5_upsampled (UpSampling2D)     (8, 256, 24, 78)     0           C5_reduced[0][0]                 
__________________________________________________________________________________________________
C4_reduced (Conv2D)             (8, 256, 24, 78)     131328      block_4b_relu[0][0]              
__________________________________________________________________________________________________
P4_merged (Add)                 (8, 256, 24, 78)     0           P5_upsampled[0][0]               
                                                                 C4_reduced[0][0]                 
__________________________________________________________________________________________________
P4_upsampled (UpSampling2D)     (8, 256, 48, 156)    0           P4_merged[0][0]                  
__________________________________________________________________________________________________
C3_reduced (Conv2D)             (8, 256, 48, 156)    33024       block_2b_relu[0][0]              
__________________________________________________________________________________________________
P6 (Conv2D)                     (8, 256, 6, 20)      590080      expand1_relu[0][0]               
__________________________________________________________________________________________________
P3_merged (Add)                 (8, 256, 48, 156)    0           P4_upsampled[0][0]               
                                                                 C3_reduced[0][0]                 
__________________________________________________________________________________________________
P6_relu (ReLU)                  (8, 256, 6, 20)      0           P6[0][0]                         
__________________________________________________________________________________________________
P3 (Conv2D)                     (8, 256, 48, 156)    590080      P3_merged[0][0]                  
__________________________________________________________________________________________________
P4 (Conv2D)                     (8, 256, 24, 78)     590080      P4_merged[0][0]                  
__________________________________________________________________________________________________
P5 (Conv2D)                     (8, 256, 12, 39)     590080      C5_reduced[0][0]                 
__________________________________________________________________________________________________
P7 (Conv2D)                     (8, 256, 3, 10)      590080      P6_relu[0][0]                    
__________________________________________________________________________________________________
P3_relu (ReLU)                  (8, 256, 48, 156)    0           P3[0][0]                         
__________________________________________________________________________________________________
P4_relu (ReLU)                  (8, 256, 24, 78)     0           P4[0][0]                         
__________________________________________________________________________________________________
P5_relu (ReLU)                  (8, 256, 12, 39)     0           P5[0][0]                         
__________________________________________________________________________________________________
P7_relu (ReLU)                  (8, 256, 3, 10)      0           P7[0][0]                         
__________________________________________________________________________________________________
retinanet_class_subn_0 (Conv2D) multiple             590080      P3_relu[0][0]                    
                                                                 P4_relu[0][0]                    
                                                                 P5_relu[0][0]                    
                                                                 P6_relu[0][0]                    
                                                                 P7_relu[0][0]                    
__________________________________________________________________________________________________
retinanet_conf_regressor (Conv2 multiple             27660       retinanet_class_subn_0[0][0]     
                                                                 retinanet_class_subn_0[1][0]     
                                                                 retinanet_class_subn_0[2][0]     
                                                                 retinanet_class_subn_0[3][0]     
                                                                 retinanet_class_subn_0[4][0]     
__________________________________________________________________________________________________
retinanet_loc_subn_0 (Conv2D)   multiple             590080      P3_relu[0][0]                    
                                                                 P4_relu[0][0]                    
                                                                 P5_relu[0][0]                    
                                                                 P6_relu[0][0]                    
                                                                 P7_relu[0][0]                    
__________________________________________________________________________________________________
permute_1 (Permute)             (8, 48, 156, 12)     0           retinanet_conf_regressor[0][0]   
__________________________________________________________________________________________________
permute_3 (Permute)             (8, 24, 78, 12)      0           retinanet_conf_regressor[1][0]   
__________________________________________________________________________________________________
permute_5 (Permute)             (8, 12, 39, 12)      0           retinanet_conf_regressor[2][0]   
__________________________________________________________________________________________________
permute_7 (Permute)             (8, 6, 20, 12)       0           retinanet_conf_regressor[3][0]   
__________________________________________________________________________________________________
permute_9 (Permute)             (8, 3, 10, 12)       0           retinanet_conf_regressor[4][0]   
__________________________________________________________________________________________________
retinanet_loc_regressor (Conv2D multiple             27660       retinanet_loc_subn_0[0][0]       
                                                                 retinanet_loc_subn_0[1][0]       
                                                                 retinanet_loc_subn_0[2][0]       
                                                                 retinanet_loc_subn_0[3][0]       
                                                                 retinanet_loc_subn_0[4][0]       
__________________________________________________________________________________________________
conf_reshape_0 (Reshape)        (8, 22464, 1, 4)     0           permute_1[0][0]                  
__________________________________________________________________________________________________
conf_reshape_1 (Reshape)        (8, 5616, 1, 4)      0           permute_3[0][0]                  
__________________________________________________________________________________________________
conf_reshape_2 (Reshape)        (8, 1404, 1, 4)      0           permute_5[0][0]                  
__________________________________________________________________________________________________
conf_reshape_3 (Reshape)        (8, 360, 1, 4)       0           permute_7[0][0]                  
__________________________________________________________________________________________________
conf_reshape_4 (Reshape)        (8, 90, 1, 4)        0           permute_9[0][0]                  
__________________________________________________________________________________________________
permute_2 (Permute)             (8, 48, 156, 12)     0           retinanet_loc_regressor[0][0]    
__________________________________________________________________________________________________
permute_4 (Permute)             (8, 24, 78, 12)      0           retinanet_loc_regressor[1][0]    
__________________________________________________________________________________________________
permute_6 (Permute)             (8, 12, 39, 12)      0           retinanet_loc_regressor[2][0]    
__________________________________________________________________________________________________
permute_8 (Permute)             (8, 6, 20, 12)       0           retinanet_loc_regressor[3][0]    
__________________________________________________________________________________________________
permute_10 (Permute)            (8, 3, 10, 12)       0           retinanet_loc_regressor[4][0]    
__________________________________________________________________________________________________
retinanet_anchor_0 (RetinaAncho (8, 7488, 3, 8)      0           retinanet_loc_regressor[0][0]    
__________________________________________________________________________________________________
retinanet_anchor_1 (RetinaAncho (8, 1872, 3, 8)      0           retinanet_loc_regressor[1][0]    
__________________________________________________________________________________________________
retinanet_anchor_2 (RetinaAncho (8, 468, 3, 8)       0           retinanet_loc_regressor[2][0]    
__________________________________________________________________________________________________
retinanet_anchor_3 (RetinaAncho (8, 120, 3, 8)       0           retinanet_loc_regressor[3][0]    
__________________________________________________________________________________________________
retinanet_anchor_4 (RetinaAncho (8, 30, 3, 8)        0           retinanet_loc_regressor[4][0]    
__________________________________________________________________________________________________
mbox_conf (Concatenate)         (8, 29934, 1, 4)     0           conf_reshape_0[0][0]             
                                                                 conf_reshape_1[0][0]             
                                                                 conf_reshape_2[0][0]             
                                                                 conf_reshape_3[0][0]             
                                                                 conf_reshape_4[0][0]             
__________________________________________________________________________________________________
loc_reshape_0 (Reshape)         (8, 22464, 1, 4)     0           permute_2[0][0]                  
__________________________________________________________________________________________________
loc_reshape_1 (Reshape)         (8, 5616, 1, 4)      0           permute_4[0][0]                  
__________________________________________________________________________________________________
loc_reshape_2 (Reshape)         (8, 1404, 1, 4)      0           permute_6[0][0]                  
__________________________________________________________________________________________________
loc_reshape_3 (Reshape)         (8, 360, 1, 4)       0           permute_8[0][0]                  
__________________________________________________________________________________________________
loc_reshape_4 (Reshape)         (8, 90, 1, 4)        0           permute_10[0][0]                 
__________________________________________________________________________________________________
anchor_reshape_0 (Reshape)      (8, 22464, 1, 8)     0           retinanet_anchor_0[0][0]         
__________________________________________________________________________________________________
anchor_reshape_1 (Reshape)      (8, 5616, 1, 8)      0           retinanet_anchor_1[0][0]         
__________________________________________________________________________________________________
anchor_reshape_2 (Reshape)      (8, 1404, 1, 8)      0           retinanet_anchor_2[0][0]         
__________________________________________________________________________________________________
anchor_reshape_3 (Reshape)      (8, 360, 1, 8)       0           retinanet_anchor_3[0][0]         
__________________________________________________________________________________________________
anchor_reshape_4 (Reshape)      (8, 90, 1, 8)        0           retinanet_anchor_4[0][0]         
__________________________________________________________________________________________________
mbox_conf_sigmoid (Activation)  (8, 29934, 1, 4)     0           mbox_conf[0][0]                  
__________________________________________________________________________________________________
mbox_loc (Concatenate)          (8, 29934, 1, 4)     0           loc_reshape_0[0][0]              
                                                                 loc_reshape_1[0][0]              
                                                                 loc_reshape_2[0][0]              
                                                                 loc_reshape_3[0][0]              
                                                                 loc_reshape_4[0][0]              
__________________________________________________________________________________________________
mbox_priorbox (Concatenate)     (8, 29934, 1, 8)     0           anchor_reshape_0[0][0]           
                                                                 anchor_reshape_1[0][0]           
                                                                 anchor_reshape_2[0][0]           
                                                                 anchor_reshape_3[0][0]           
                                                                 anchor_reshape_4[0][0]           
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (8, 29934, 1, 16)    0           mbox_conf_sigmoid[0][0]          
                                                                 mbox_loc[0][0]                   
                                                                 mbox_priorbox[0][0]              
__________________________________________________________________________________________________
retinanet_predictions (Reshape) (8, 29934, 16)       0           concatenate_1[0][0]              
==================================================================================================
Total params: 17,138,392
Trainable params: 17,117,336
Non-trainable params: 21,056
__________________________________________________________________________________________________
2022-06-27 17:35:27,124 [INFO] __main__: Number of samples in the training dataset:	  1732
2022-06-27 17:35:27,124 [INFO] __main__: Number of samples in the validation dataset:	   192
Epoch 1/100
DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]

Current pipeline object is no longer valid.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 390, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 528, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 516, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 386, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 308, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
    outs = f(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]

Current pipeline object is no longer valid.
	 [[{{node Dali}}]]
	 [[cond_6/MultiMatch/ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_int32_Reshape_1/_3675]]
  (1) Internal: DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]

Current pipeline object is no longer valid.
	 [[{{node Dali}}]]
0 successful operations.
0 derived errors ignored.
2022-06-27 17:35:37,475 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
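
Note on the assertion above: RandomBBoxCrop rejected a box because its left edge is negative (-0.0046875) once normalized, so it falls outside the allowed range (0, 0) to (1, 1). One way to look for such boxes before training is to scan the KITTI label files directly. The sketch below is not part of TAO. The function name, the sample label line, and the 1920x1080 image size are assumptions for illustration. A pixel box of (-9, 631, 98, 909) at that size happens to reproduce the numbers in the log, but the thread does not state the real image size.

```python
def find_out_of_bounds(label_text, img_w, img_h):
    """Return (line_no, class_name, normalized_box) for KITTI boxes outside the image.

    KITTI label fields 4..7 are the pixel box: left, top, right, bottom.
    """
    bad = []
    for line_no, line in enumerate(label_text.splitlines(), start=1):
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        x1, y1, x2, y2 = map(float, fields[4:8])
        norm = (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)
        # Same idea as the check in the log: every coordinate must lie in [0, 1].
        if min(norm) < 0.0 or max(norm) > 1.0:
            bad.append((line_no, fields[0], norm))
    return bad


# Hypothetical label content; the first box starts 9 px left of the image.
sample = (
    "car 0.0 0 0.0 -9 631 98 909 0 0 0 0 0 0 0\n"
    "person 0.0 0 0.0 100 200 300 400 0 0 0 0 0 0 0"
)
for line_no, name, box in find_out_of_bounds(sample, 1920, 1080):
    print(line_no, name, box)
```

To run it on real data, read each label file as text (for example with pathlib.Path(path).read_text()) and pass the actual width and height of the matching image.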

Morganh · June 28, 2022, 2:07am

Could you please try the sequence data format in data_sources instead of tfrecords_path?
i.e.,
image_directory_path: "/workspace/tao-experiments/data/train/image"
label_directory_path: "/workspace/tao-experiments/data/train/label"

hxnwjf · June 28, 2022, 6:20pm

It is working now. Thank you so much.
Why does using the path to the tfrecords as the data source cause a problem?

Morganh · June 29, 2022, 12:29am

For RetinaNet, validation_data_sources and data_sources are expected to use the same data format.
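
A rough way to check a spec for this kind of mismatch is to compare which path keys appear under data_sources and under validation_data_sources. The sketch below is a regex-based check, not a TAO utility. The helper names and the sample spec are hypothetical, and it assumes these blocks contain no nested braces. The validation section of the original spec is not shown in this thread, so the sample only illustrates the kind of mismatch described.

```python
import re


def source_keys(spec_text, block_name):
    """Collect the *_path keys used inside every `block_name { ... }` block."""
    # The lookbehind keeps "data_sources" from matching inside
    # "validation_data_sources".
    pattern = r"(?<![A-Za-z_])" + re.escape(block_name) + r"\s*:?\s*\{([^{}]*)\}"
    keys = set()
    for match in re.finditer(pattern, spec_text):
        keys.update(re.findall(r"(\w+_path)\s*:", match.group(1)))
    return keys


def same_source_format(spec_text):
    """True if both source blocks use the same keys, or no validation block exists."""
    train = source_keys(spec_text, "data_sources")
    val = source_keys(spec_text, "validation_data_sources")
    return not val or train == val


# Hypothetical spec fragment: tfrecords for training, sequence format for validation.
spec = """
dataset_config {
  data_sources: {
    tfrecords_path: "/data/tfrecords/kitti_train*"
  }
  validation_data_sources: {
    image_directory_path: "/data/val/image"
    label_directory_path: "/data/val/label"
  }
}
"""
print(source_keys(spec, "data_sources"))
print(source_keys(spec, "validation_data_sources"))
print(same_source_format(spec))
```

A False result here only means the two blocks name different keys. It does not validate the paths or the data itself.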

system · July 13, 2022, 12:30am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
