Retinanet box out of bounds


• Hardware (T4/V100/Xavier/Nano/etc): AWS g4dn instance
• Network Type: retinanet
• TLT Version: v3.21.11
• Training spec file:

random_seed: 42
retinanet_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5]"
  scales: "[0.045, 0.09, 0.2, 0.4, 0.55, 0.7]"
  two_boxes_for_ar1: false
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 18
  n_kernels: 1
  n_anchor_levels: 1
  feature_size: 256
  freeze_bn: False
  freeze_blocks: 0
}
training_config {
  enable_qat: False
  pretrain_model_path: "/workspace/tao-experiments/retinanet/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5"
  batch_size_per_gpu: 8
  num_epochs: 100
  n_workers: 2
  checkpoint_interval: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 4e-5
      max_learning_rate: 1.5e-2
      soft_start: 0.1
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 2e-5
  }
  optimizer {
    sgd {
      momentum: 0.9
      nesterov: True
    }
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 8
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
    output_width: 1248
    output_height: 384
    output_channel: 3
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_train*"
  }
  target_class_mapping {
      key: "car"
      value: "vehicle"
  }
  target_class_mapping {
      key: "truck"
      value: "vehicle"
  }
  target_class_mapping {
      key: "van"
      value: "car"
  } 
  target_class_mapping {
      key: "bus"
      value: "vehicle"
  }
  target_class_mapping {
      key: "person"
      value: "person"
  }
  validation_data_sources: {
    image_directory_path: "/workspace/tao-experiments/data/val/image"
    label_directory_path: "/workspace/tao-experiments/data/val/label"
  }
}

I ran into an issue while using the TAO Toolkit to do transfer learning on a custom dataset. The error reports that a box is out of bounds. The model I used is retinanet and the dataset is in KITTI format, but there is no issue when I use the yolov3 model.
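
To narrow this down, here is a minimal sketch of how the KITTI labels could be scanned for boxes that fall outside the image. It is not part of TAO: the function name `find_out_of_bounds_boxes` and the sample label lines are made up for illustration, and the 1248x384 size is simply taken from `output_width`/`output_height` in the spec above. A real check should use the actual size of each image. It only assumes the standard KITTI layout, where fields 4 to 7 of each line are the box's left, top, right and bottom in pixels. I have not confirmed that this is what triggers the error.

```python
from typing import List, Tuple


def find_out_of_bounds_boxes(
    label_text: str, img_width: int, img_height: int
) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for each suspicious KITTI box."""
    problems = []
    for line_no, line in enumerate(label_text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 8:
            problems.append((line_no, "fewer than 8 fields"))
            continue
        # KITTI fields 4..7: left, top, right, bottom (pixel coordinates).
        x1, y1, x2, y2 = (float(v) for v in fields[4:8])
        if x1 < 0 or y1 < 0 or x2 > img_width or y2 > img_height:
            problems.append((line_no, "box extends past image border"))
        elif x2 <= x1 or y2 <= y1:
            problems.append((line_no, "box has zero or negative size"))
    return problems


# Hypothetical label content, not taken from the real dataset.
sample = (
    "car 0.0 0 0.0 100.0 50.0 300.0 200.0 0 0 0 0 0 0 0\n"
    "person 0.0 0 0.0 1200.0 100.0 1300.0 380.0 0 0 0 0 0 0 0\n"
    "truck 0.0 0 0.0 -5.0 10.0 60.0 90.0 0 0 0 0 0 0 0\n"
)
issues = find_out_of_bounds_boxes(sample, img_width=1248, img_height=384)
for line_no, reason in issues:
    print(f"line {line_no}: {reason}")
```

In the sample, the second and third lines are flagged because their boxes cross the right and left image borders. Running the same check over every label file, with each image's true width and height, would show whether the dataset contains such boxes.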
The log is as follows:

2022-06-27 17:34:51,453 [INFO] root: Registry: ['nvcr.io']
2022-06-27 17:34:51,523 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-06-27 17:34:51,532 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:61: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2022-06-27 17:34:58,037 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:61: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:64: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-06-27 17:34:58,038 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py:64: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-06-27 17:34:58,527 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/retinanet/specs/retinanet_train_resnet18_kitti.txt.
2022-06-27 17:34:58,529 [INFO] iva.retinanet.utils.spec_loader: Merging specification from /workspace/tao-experiments/retinanet/specs/retinanet_train_resnet18_kitti.txt
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2022-06-27 17:34:58,531 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2022-06-27 17:34:58,532 [INFO] __main__: Using DALI dataloader...
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2022-06-27 17:34:58,675 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2022-06-27 17:34:58,691 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.

2022-06-27 17:34:59,159 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

2022-06-27 17:34:59,183 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

2022-06-27 17:34:59,370 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2022-06-27 17:34:59,548 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

2022-06-27 17:35:00,912 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2022-06-27 17:35:01,104 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2022-06-27 17:35:01,104 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2022-06-27 17:35:01,105 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2022-06-27 17:35:01,624 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

2022-06-27 17:35:02,293 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

2022-06-27 17:35:02,297 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

2022-06-27 17:35:02,955 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2022-06-27 17:35:03,117 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2022-06-27 17:35:04,489 [INFO] iva.retinanet.utils.model_io: Loading model weights...
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Input (InputLayer)              (8, 3, 384, 1248)    0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (8, 64, 192, 624)    9408        Input[0][0]                      
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (8, 64, 192, 624)    256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (8, 64, 192, 624)    0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (8, 64, 96, 312)     36864       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (8, 64, 96, 312)     256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (8, 64, 96, 312)     0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (8, 64, 96, 312)     36864       block_1a_relu_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (8, 64, 96, 312)     4096        activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (8, 64, 96, 312)     256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (8, 64, 96, 312)     256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (8, 64, 96, 312)     0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1a_relu (Activation)      (8, 64, 96, 312)     0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (8, 64, 96, 312)     36864       block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (8, 64, 96, 312)     256         block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (8, 64, 96, 312)     0           block_1b_bn_1[0][0]              
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (8, 64, 96, 312)     36864       block_1b_relu_1[0][0]            
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (8, 64, 96, 312)     4096        block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (8, 64, 96, 312)     256         block_1b_conv_2[0][0]            
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (8, 64, 96, 312)     256         block_1b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_2 (Add)                     (8, 64, 96, 312)     0           block_1b_bn_2[0][0]              
                                                                 block_1b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1b_relu (Activation)      (8, 64, 96, 312)     0           add_2[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (8, 128, 48, 156)    73728       block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (8, 128, 48, 156)    512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (8, 128, 48, 156)    0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (8, 128, 48, 156)    147456      block_2a_relu_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (8, 128, 48, 156)    8192        block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (8, 128, 48, 156)    512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (8, 128, 48, 156)    512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_3 (Add)                     (8, 128, 48, 156)    0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2a_relu (Activation)      (8, 128, 48, 156)    0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (8, 128, 48, 156)    147456      block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (8, 128, 48, 156)    512         block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (8, 128, 48, 156)    0           block_2b_bn_1[0][0]              
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (8, 128, 48, 156)    147456      block_2b_relu_1[0][0]            
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (8, 128, 48, 156)    16384       block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (8, 128, 48, 156)    512         block_2b_conv_2[0][0]            
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (8, 128, 48, 156)    512         block_2b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_4 (Add)                     (8, 128, 48, 156)    0           block_2b_bn_2[0][0]              
                                                                 block_2b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2b_relu (Activation)      (8, 128, 48, 156)    0           add_4[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (8, 256, 24, 78)     294912      block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (8, 256, 24, 78)     1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (8, 256, 24, 78)     0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (8, 256, 24, 78)     589824      block_3a_relu_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (8, 256, 24, 78)     32768       block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (8, 256, 24, 78)     1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (8, 256, 24, 78)     1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_5 (Add)                     (8, 256, 24, 78)     0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3a_relu (Activation)      (8, 256, 24, 78)     0           add_5[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (8, 256, 24, 78)     589824      block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (8, 256, 24, 78)     1024        block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (8, 256, 24, 78)     0           block_3b_bn_1[0][0]              
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (8, 256, 24, 78)     589824      block_3b_relu_1[0][0]            
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (8, 256, 24, 78)     65536       block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (8, 256, 24, 78)     1024        block_3b_conv_2[0][0]            
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (8, 256, 24, 78)     1024        block_3b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_6 (Add)                     (8, 256, 24, 78)     0           block_3b_bn_2[0][0]              
                                                                 block_3b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3b_relu (Activation)      (8, 256, 24, 78)     0           add_6[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (8, 512, 24, 78)     1179648     block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (8, 512, 24, 78)     2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (8, 512, 24, 78)     0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (8, 512, 24, 78)     2359296     block_4a_relu_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (8, 512, 24, 78)     131072      block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (8, 512, 24, 78)     2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (8, 512, 24, 78)     2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_7 (Add)                     (8, 512, 24, 78)     0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4a_relu (Activation)      (8, 512, 24, 78)     0           add_7[0][0]                      
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (8, 512, 24, 78)     2359296     block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (8, 512, 24, 78)     2048        block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (8, 512, 24, 78)     0           block_4b_bn_1[0][0]              
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (8, 512, 24, 78)     2359296     block_4b_relu_1[0][0]            
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (8, 512, 24, 78)     262144      block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (8, 512, 24, 78)     2048        block_4b_conv_2[0][0]            
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (8, 512, 24, 78)     2048        block_4b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_8 (Add)                     (8, 512, 24, 78)     0           block_4b_bn_2[0][0]              
                                                                 block_4b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4b_relu (Activation)      (8, 512, 24, 78)     0           add_8[0][0]                      
__________________________________________________________________________________________________
expand_conv1 (Conv2D)           (8, 256, 12, 39)     1179904     block_4b_relu[0][0]              
__________________________________________________________________________________________________
expand1_relu (ReLU)             (8, 256, 12, 39)     0           expand_conv1[0][0]               
__________________________________________________________________________________________________
C5_reduced (Conv2D)             (8, 256, 12, 39)     65792       expand1_relu[0][0]               
__________________________________________________________________________________________________
P5_upsampled (UpSampling2D)     (8, 256, 24, 78)     0           C5_reduced[0][0]                 
__________________________________________________________________________________________________
C4_reduced (Conv2D)             (8, 256, 24, 78)     131328      block_4b_relu[0][0]              
__________________________________________________________________________________________________
P4_merged (Add)                 (8, 256, 24, 78)     0           P5_upsampled[0][0]               
                                                                 C4_reduced[0][0]                 
__________________________________________________________________________________________________
P4_upsampled (UpSampling2D)     (8, 256, 48, 156)    0           P4_merged[0][0]                  
__________________________________________________________________________________________________
C3_reduced (Conv2D)             (8, 256, 48, 156)    33024       block_2b_relu[0][0]              
__________________________________________________________________________________________________
P6 (Conv2D)                     (8, 256, 6, 20)      590080      expand1_relu[0][0]               
__________________________________________________________________________________________________
P3_merged (Add)                 (8, 256, 48, 156)    0           P4_upsampled[0][0]               
                                                                 C3_reduced[0][0]                 
__________________________________________________________________________________________________
P6_relu (ReLU)                  (8, 256, 6, 20)      0           P6[0][0]                         
__________________________________________________________________________________________________
P3 (Conv2D)                     (8, 256, 48, 156)    590080      P3_merged[0][0]                  
__________________________________________________________________________________________________
P4 (Conv2D)                     (8, 256, 24, 78)     590080      P4_merged[0][0]                  
__________________________________________________________________________________________________
P5 (Conv2D)                     (8, 256, 12, 39)     590080      C5_reduced[0][0]                 
__________________________________________________________________________________________________
P7 (Conv2D)                     (8, 256, 3, 10)      590080      P6_relu[0][0]                    
__________________________________________________________________________________________________
P3_relu (ReLU)                  (8, 256, 48, 156)    0           P3[0][0]                         
__________________________________________________________________________________________________
P4_relu (ReLU)                  (8, 256, 24, 78)     0           P4[0][0]                         
__________________________________________________________________________________________________
P5_relu (ReLU)                  (8, 256, 12, 39)     0           P5[0][0]                         
__________________________________________________________________________________________________
P7_relu (ReLU)                  (8, 256, 3, 10)      0           P7[0][0]                         
__________________________________________________________________________________________________
retinanet_class_subn_0 (Conv2D) multiple             590080      P3_relu[0][0]                    
                                                                 P4_relu[0][0]                    
                                                                 P5_relu[0][0]                    
                                                                 P6_relu[0][0]                    
                                                                 P7_relu[0][0]                    
__________________________________________________________________________________________________
retinanet_conf_regressor (Conv2 multiple             27660       retinanet_class_subn_0[0][0]     
                                                                 retinanet_class_subn_0[1][0]     
                                                                 retinanet_class_subn_0[2][0]     
                                                                 retinanet_class_subn_0[3][0]     
                                                                 retinanet_class_subn_0[4][0]     
__________________________________________________________________________________________________
retinanet_loc_subn_0 (Conv2D)   multiple             590080      P3_relu[0][0]                    
                                                                 P4_relu[0][0]                    
                                                                 P5_relu[0][0]                    
                                                                 P6_relu[0][0]                    
                                                                 P7_relu[0][0]                    
__________________________________________________________________________________________________
permute_1 (Permute)             (8, 48, 156, 12)     0           retinanet_conf_regressor[0][0]   
__________________________________________________________________________________________________
permute_3 (Permute)             (8, 24, 78, 12)      0           retinanet_conf_regressor[1][0]   
__________________________________________________________________________________________________
permute_5 (Permute)             (8, 12, 39, 12)      0           retinanet_conf_regressor[2][0]   
__________________________________________________________________________________________________
permute_7 (Permute)             (8, 6, 20, 12)       0           retinanet_conf_regressor[3][0]   
__________________________________________________________________________________________________
permute_9 (Permute)             (8, 3, 10, 12)       0           retinanet_conf_regressor[4][0]   
__________________________________________________________________________________________________
retinanet_loc_regressor (Conv2D multiple             27660       retinanet_loc_subn_0[0][0]       
                                                                 retinanet_loc_subn_0[1][0]       
                                                                 retinanet_loc_subn_0[2][0]       
                                                                 retinanet_loc_subn_0[3][0]       
                                                                 retinanet_loc_subn_0[4][0]       
__________________________________________________________________________________________________
conf_reshape_0 (Reshape)        (8, 22464, 1, 4)     0           permute_1[0][0]                  
__________________________________________________________________________________________________
conf_reshape_1 (Reshape)        (8, 5616, 1, 4)      0           permute_3[0][0]                  
__________________________________________________________________________________________________
conf_reshape_2 (Reshape)        (8, 1404, 1, 4)      0           permute_5[0][0]                  
__________________________________________________________________________________________________
conf_reshape_3 (Reshape)        (8, 360, 1, 4)       0           permute_7[0][0]                  
__________________________________________________________________________________________________
conf_reshape_4 (Reshape)        (8, 90, 1, 4)        0           permute_9[0][0]                  
__________________________________________________________________________________________________
permute_2 (Permute)             (8, 48, 156, 12)     0           retinanet_loc_regressor[0][0]    
__________________________________________________________________________________________________
permute_4 (Permute)             (8, 24, 78, 12)      0           retinanet_loc_regressor[1][0]    
__________________________________________________________________________________________________
permute_6 (Permute)             (8, 12, 39, 12)      0           retinanet_loc_regressor[2][0]    
__________________________________________________________________________________________________
permute_8 (Permute)             (8, 6, 20, 12)       0           retinanet_loc_regressor[3][0]    
__________________________________________________________________________________________________
permute_10 (Permute)            (8, 3, 10, 12)       0           retinanet_loc_regressor[4][0]    
__________________________________________________________________________________________________
retinanet_anchor_0 (RetinaAncho (8, 7488, 3, 8)      0           retinanet_loc_regressor[0][0]    
__________________________________________________________________________________________________
retinanet_anchor_1 (RetinaAncho (8, 1872, 3, 8)      0           retinanet_loc_regressor[1][0]    
__________________________________________________________________________________________________
retinanet_anchor_2 (RetinaAncho (8, 468, 3, 8)       0           retinanet_loc_regressor[2][0]    
__________________________________________________________________________________________________
retinanet_anchor_3 (RetinaAncho (8, 120, 3, 8)       0           retinanet_loc_regressor[3][0]    
__________________________________________________________________________________________________
retinanet_anchor_4 (RetinaAncho (8, 30, 3, 8)        0           retinanet_loc_regressor[4][0]    
__________________________________________________________________________________________________
mbox_conf (Concatenate)         (8, 29934, 1, 4)     0           conf_reshape_0[0][0]             
                                                                 conf_reshape_1[0][0]             
                                                                 conf_reshape_2[0][0]             
                                                                 conf_reshape_3[0][0]             
                                                                 conf_reshape_4[0][0]             
__________________________________________________________________________________________________
loc_reshape_0 (Reshape)         (8, 22464, 1, 4)     0           permute_2[0][0]                  
__________________________________________________________________________________________________
loc_reshape_1 (Reshape)         (8, 5616, 1, 4)      0           permute_4[0][0]                  
__________________________________________________________________________________________________
loc_reshape_2 (Reshape)         (8, 1404, 1, 4)      0           permute_6[0][0]                  
__________________________________________________________________________________________________
loc_reshape_3 (Reshape)         (8, 360, 1, 4)       0           permute_8[0][0]                  
__________________________________________________________________________________________________
loc_reshape_4 (Reshape)         (8, 90, 1, 4)        0           permute_10[0][0]                 
__________________________________________________________________________________________________
anchor_reshape_0 (Reshape)      (8, 22464, 1, 8)     0           retinanet_anchor_0[0][0]         
__________________________________________________________________________________________________
anchor_reshape_1 (Reshape)      (8, 5616, 1, 8)      0           retinanet_anchor_1[0][0]         
__________________________________________________________________________________________________
anchor_reshape_2 (Reshape)      (8, 1404, 1, 8)      0           retinanet_anchor_2[0][0]         
__________________________________________________________________________________________________
anchor_reshape_3 (Reshape)      (8, 360, 1, 8)       0           retinanet_anchor_3[0][0]         
__________________________________________________________________________________________________
anchor_reshape_4 (Reshape)      (8, 90, 1, 8)        0           retinanet_anchor_4[0][0]         
__________________________________________________________________________________________________
mbox_conf_sigmoid (Activation)  (8, 29934, 1, 4)     0           mbox_conf[0][0]                  
__________________________________________________________________________________________________
mbox_loc (Concatenate)          (8, 29934, 1, 4)     0           loc_reshape_0[0][0]              
                                                                 loc_reshape_1[0][0]              
                                                                 loc_reshape_2[0][0]              
                                                                 loc_reshape_3[0][0]              
                                                                 loc_reshape_4[0][0]              
__________________________________________________________________________________________________
mbox_priorbox (Concatenate)     (8, 29934, 1, 8)     0           anchor_reshape_0[0][0]           
                                                                 anchor_reshape_1[0][0]           
                                                                 anchor_reshape_2[0][0]           
                                                                 anchor_reshape_3[0][0]           
                                                                 anchor_reshape_4[0][0]           
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (8, 29934, 1, 16)    0           mbox_conf_sigmoid[0][0]          
                                                                 mbox_loc[0][0]                   
                                                                 mbox_priorbox[0][0]              
__________________________________________________________________________________________________
retinanet_predictions (Reshape) (8, 29934, 16)       0           concatenate_1[0][0]              
==================================================================================================
Total params: 17,138,392
Trainable params: 17,117,336
Non-trainable params: 21,056
__________________________________________________________________________________________________
2022-06-27 17:35:27,124 [INFO] __main__: Number of samples in the training dataset:	  1732
2022-06-27 17:35:27,124 [INFO] __main__: Number of samples in the validation dataset:	   192
Epoch 1/100
DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]

Current pipeline object is no longer valid.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 390, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 528, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 516, in return_func
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 386, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/train.py", line 308, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
    outs = f(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]

Current pipeline object is no longer valid.
	 [[{{node Dali}}]]
	 [[cond_6/MultiMatch/ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_int32_Reshape_1/_3675]]
  (1) Internal: DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:165] Assert on "limits.contains(boxes[i])" failed: box {(-0.0046875, 0.584259), (0.0510417, 0.841667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x413ace) [0x7f4c7e27bace]
[frame 1]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x4ccc6b) [0x7f4c7e334c6b]
[frame 2]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali_operators.so(+0x11b7b15) [0x7f4c7f01fb15]
[frame 3]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x217) [0x7f4c7d1075a7]
[frame 4]: /usr/local/lib/python3.6/dist-packages/nvidia/dali/libdali.so(+0x8a213f) [0x7f4c7d84113f]
[frame 5]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4d4a2796db]
[frame 6]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4d4a5b271f]

Current pipeline object is no longer valid.
	 [[{{node Dali}}]]
0 successful operations.
0 derived errors ignored.
2022-06-27 17:35:37,475 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Could you please try the sequence data format? That is:
image_directory_path: "/workspace/tao-experiments/data/train/image"
label_directory_path: "/workspace/tao-experiments/data/train/label"
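
Before switching to the two directories above, it may help to confirm that every image has a matching label file. The following is a minimal sketch, assuming KITTI-style `.txt` labels that share a file stem with their image; the helper name `unpaired_files` and the extension list are hypothetical and not part of TAO.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def unpaired_files(image_dir, label_dir):
    """Return (images_without_label, labels_without_image) as sorted stem lists."""
    image_stems = {
        p.stem for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTS
    }
    label_stems = {
        p.stem for p in Path(label_dir).iterdir()
        if p.suffix.lower() == ".txt"
    }
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling `unpaired_files("/workspace/tao-experiments/data/train/image", "/workspace/tao-experiments/data/train/label")` inside the container should return two empty lists when the directories are consistent.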

It is working now. Thank you so much.
Why does using the path to the tfrecords as the data source cause a problem?

For RetinaNet, data_sources and validation_data_sources are expected to use the same format.
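
Separately from the format question, the DALI assertion in the log reports a box whose normalized x-coordinate is negative. A minimal sketch for finding such labels, assuming KITTI-format label lines (class name first, pixel box `left top right bottom` in fields 4 to 7) and a known image size; the function names are hypothetical, and the sample pixel values are chosen only to reproduce the box in the log for an assumed 1920x1080 image:

```python
def normalize_box(x1, y1, x2, y2, width, height):
    """Convert pixel corner coordinates to fractions of the image size."""
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def is_in_bounds(box):
    """True if every normalized coordinate lies in [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in box)


def find_out_of_bounds(label_text, width, height):
    """Return (line_number, class_name, normalized_box) for boxes leaving the image."""
    bad = []
    for lineno, line in enumerate(label_text.splitlines(), start=1):
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        x1, y1, x2, y2 = map(float, fields[4:8])
        box = normalize_box(x1, y1, x2, y2, width, height)
        if not is_in_bounds(box):
            bad.append((lineno, fields[0], box))
    return bad


# Hypothetical label content: the first box starts 9 pixels left of the image.
sample = (
    "car 0 0 0 -9 631 98 909 0 0 0 0 0 0 0\n"
    "person 0 0 0 100 200 300 400 0 0 0 0 0 0 0\n"
)
for lineno, name, box in find_out_of_bounds(sample, width=1920, height=1080):
    print(lineno, name, box)
```

Boxes flagged this way can be clipped to the image or removed before the dataset is used for training.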

