SSD - SqueezeNet: Invalid loss, terminating training

Hello, I’m trying to train the SSD model. I first tried ResNet10 as the backbone, and the training completed successfully.
But when I switched to SqueezeNet, it failed.

Here is the tf-record conversion log:

Using TensorFlow backend.
2020-07-10 11:10:37,358 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2020-07-10 11:10:37,485 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in
Train: 18592	Val: 3280
2020-07-10 11:10:37,485 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2020-07-10 11:10:37,497 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 0
/usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:266: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
2020-07-10 11:10:38,272 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1
2020-07-10 11:10:39,053 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 2
2020-07-10 11:10:40,115 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 3
2020-07-10 11:10:40,973 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 4
2020-07-10 11:10:41,783 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 5
2020-07-10 11:10:42,606 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 6
2020-07-10 11:10:43,381 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 7
2020-07-10 11:10:44,223 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 8
2020-07-10 11:10:45,091 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 9
2020-07-10 11:10:45,875 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - 
Wrote the following numbers of objects:
person: 27355
face: 23013

2020-07-10 11:10:45,875 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 0
2020-07-10 11:10:50,719 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 1
2020-07-10 11:10:55,257 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 2
2020-07-10 11:11:00,096 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 3
2020-07-10 11:11:05,161 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 4
2020-07-10 11:11:09,719 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 5
2020-07-10 11:11:15,032 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 6
2020-07-10 11:11:19,805 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 7
2020-07-10 11:11:24,546 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 8
2020-07-10 11:11:29,399 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 9
2020-07-10 11:11:34,341 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - 
Wrote the following numbers of objects:
person: 152897
face: 136411

2020-07-10 11:11:34,342 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Cumulative object statistics
2020-07-10 11:11:34,342 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - 
Wrote the following numbers of objects:
person: 180252
face: 159424

2020-07-10 11:11:34,342 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map. 
Label in GT: Label in tfrecords file 
person: person
face: face
2020-07-10 11:11:34,342 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Tfrecords generation complete.
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

Training spec file:


random_seed: 42
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 1.0/3.0]"
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  two_boxes_for_ar1: true
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "squeezenet"

  freeze_bn: false
}
training_config {
  batch_size_per_gpu: 32
  num_epochs: 30
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-5
      max_learning_rate: 2e-2
      soft_start: 0.15
      annealing: 0.5
    }
  }
  regularizer {
    type: L1
    weight: 3e-06
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 32
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 320
    output_image_height: 320
    output_image_channel: 3
    crop_right: 320
    crop_bottom: 320
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 0.7
    zoom_max: 1.8
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "/dataset_processed/tfrecords/*"
    image_directory_path: "/dataset_processed"
  }
  image_extension: "jpg"

  target_class_mapping {
    key: "face"
    value: "face"
  }

  target_class_mapping {
    key: "person"
    value: "person"
  }

  validation_fold: 0
}
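
As a quick sanity check on the spec above, the anchor settings can be cross-checked against the model summary printed further down in this post. The sketch below is not TLT code. It assumes the usual SSD convention that `two_boxes_for_ar1: true` adds one extra box per cell for aspect ratio 1.0, and it takes the six feature-map sizes from the `ssd_conf_0` to `ssd_conf_5` rows of the summary. The helper names (`boxes_per_cell`, `anchors_per_layer`) are made up for illustration.

```python
# Values copied from the spec file (ssd_config).
aspect_ratios = [1.0, 2.0, 0.5, 3.0, 1.0 / 3.0]  # aspect_ratios_global
two_boxes_for_ar1 = True

# Spatial sizes of the six prediction layers, read off the
# ssd_conf_0 .. ssd_conf_5 rows of the model summary.
feature_map_sizes = [(40, 40), (20, 20), (10, 10), (5, 5), (3, 3), (2, 2)]


def boxes_per_cell(ratios, extra_box_for_ar1):
    """Anchor boxes per feature-map cell (assumed SSD convention)."""
    extra = 1 if extra_box_for_ar1 and 1.0 in ratios else 0
    return len(ratios) + extra


def anchors_per_layer(sizes, per_cell):
    """Number of anchor boxes contributed by each prediction layer."""
    return [height * width * per_cell for height, width in sizes]


per_cell = boxes_per_cell(aspect_ratios, two_boxes_for_ar1)
per_layer = anchors_per_layer(feature_map_sizes, per_cell)
total = sum(per_layer)

print("boxes per cell:", per_cell)
print("anchors per layer:", per_layer)
print("total anchors:", total)
```

Under these assumptions the per-layer counts come out as 9600, 2400, 600, 150, 54 and 24, which agree with the `conf_reshape_0` to `conf_reshape_5` rows, and the total of 12828 agrees with the `mbox_conf` row. So the anchor configuration and the network head at least appear consistent with each other.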


And here is the tlt-train console output:

Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
2020-07-10 11:12:13,957 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /project/specs/train.txt
2020-07-10 11:12:13,979 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /project/specs/train.txt
2020-07-10 11:12:13,980 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /project/specs/train.txt
2020-07-10 11:12:14,020 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /project/specs/train.txt
2020-07-10 11:12:14,042 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /project/specs/train.txt
2020-07-10 11:12:14,044 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /project/specs/train.txt
2020-07-10 11:12:14,055 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /project/specs/train.txt
2020-07-10 11:12:14,056 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /project/specs/train.txt
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
.
.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-07-10 11:13:16,291 [INFO] iva.ssd.scripts.train: Loading pretrained weights. This may take a while...
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-07-10 11:13:17,284 [INFO] iva.ssd.scripts.train: Loading pretrained weights. This may take a while...
.
.
2020-07-10 11:13:18,823 [INFO] iva.ssd.scripts.train: Loading pretrained weights. This may take a while...
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Input (InputLayer)              (32, 3, 320, 320)    0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (32, 96, 160, 160)   14208       Input[0][0]                      
__________________________________________________________________________________________________
conv1_relu (Activation)         (32, 96, 160, 160)   0           conv1[0][0]                      
__________________________________________________________________________________________________
pool1 (MaxPooling2D)            (32, 96, 80, 80)     0           conv1_relu[0][0]                 
__________________________________________________________________________________________________
fire2_squeeze_conv (Conv2D)     (32, 16, 80, 80)     1552        pool1[0][0]                      
__________________________________________________________________________________________________
fire2_squeeze (Activation)      (32, 16, 80, 80)     0           fire2_squeeze_conv[0][0]         
__________________________________________________________________________________________________
fire2_expand_conv1x1 (Conv2D)   (32, 64, 80, 80)     1088        fire2_squeeze[0][0]              
__________________________________________________________________________________________________
fire2_expand_conv3x3 (Conv2D)   (32, 64, 80, 80)     9280        fire2_squeeze[0][0]              
__________________________________________________________________________________________________
fire2_expand_1x1 (Activation)   (32, 64, 80, 80)     0           fire2_expand_conv1x1[0][0]       
__________________________________________________________________________________________________
fire2_expand_3x3 (Activation)   (32, 64, 80, 80)     0           fire2_expand_conv3x3[0][0]       
__________________________________________________________________________________________________
fire2 (Concatenate)             (32, 128, 80, 80)    0           fire2_expand_1x1[0][0]           
                                                                 fire2_expand_3x3[0][0]           
__________________________________________________________________________________________________
fire3_squeeze_conv (Conv2D)     (32, 16, 80, 80)     2064        fire2[0][0]                      
__________________________________________________________________________________________________
fire3_squeeze (Activation)      (32, 16, 80, 80)     0           fire3_squeeze_conv[0][0]         
__________________________________________________________________________________________________
fire3_expand_conv1x1 (Conv2D)   (32, 64, 80, 80)     1088        fire3_squeeze[0][0]              
__________________________________________________________________________________________________
fire3_expand_conv3x3 (Conv2D)   (32, 64, 80, 80)     9280        fire3_squeeze[0][0]              
__________________________________________________________________________________________________
fire3_expand_1x1 (Activation)   (32, 64, 80, 80)     0           fire3_expand_conv1x1[0][0]       
__________________________________________________________________________________________________
fire3_expand_3x3 (Activation)   (32, 64, 80, 80)     0           fire3_expand_conv3x3[0][0]       
__________________________________________________________________________________________________
fire3 (Concatenate)             (32, 128, 80, 80)    0           fire3_expand_1x1[0][0]           
                                                                 fire3_expand_3x3[0][0]           
__________________________________________________________________________________________________
fire4_squeeze_conv (Conv2D)     (32, 32, 80, 80)     4128        fire3[0][0]                      
__________________________________________________________________________________________________
fire4_squeeze (Activation)      (32, 32, 80, 80)     0           fire4_squeeze_conv[0][0]         
__________________________________________________________________________________________________
fire4_expand_conv1x1 (Conv2D)   (32, 128, 80, 80)    4224        fire4_squeeze[0][0]              
__________________________________________________________________________________________________
fire4_expand_conv3x3 (Conv2D)   (32, 128, 80, 80)    36992       fire4_squeeze[0][0]              
__________________________________________________________________________________________________
fire4_expand_1x1 (Activation)   (32, 128, 80, 80)    0           fire4_expand_conv1x1[0][0]       
__________________________________________________________________________________________________
fire4_expand_3x3 (Activation)   (32, 128, 80, 80)    0           fire4_expand_conv3x3[0][0]       
__________________________________________________________________________________________________
fire4 (Concatenate)             (32, 256, 80, 80)    0           fire4_expand_1x1[0][0]           
                                                                 fire4_expand_3x3[0][0]           
__________________________________________________________________________________________________
pool4 (MaxPooling2D)            (32, 256, 40, 40)    0           fire4[0][0]                      
__________________________________________________________________________________________________
fire5_squeeze_conv (Conv2D)     (32, 32, 40, 40)     8224        pool4[0][0]                      
__________________________________________________________________________________________________
fire5_squeeze (Activation)      (32, 32, 40, 40)     0           fire5_squeeze_conv[0][0]         
__________________________________________________________________________________________________
fire5_expand_conv1x1 (Conv2D)   (32, 128, 40, 40)    4224        fire5_squeeze[0][0]              
__________________________________________________________________________________________________
fire5_expand_conv3x3 (Conv2D)   (32, 128, 40, 40)    36992       fire5_squeeze[0][0]              
__________________________________________________________________________________________________
fire5_expand_1x1 (Activation)   (32, 128, 40, 40)    0           fire5_expand_conv1x1[0][0]       
__________________________________________________________________________________________________
fire5_expand_3x3 (Activation)   (32, 128, 40, 40)    0           fire5_expand_conv3x3[0][0]       
__________________________________________________________________________________________________
fire5 (Concatenate)             (32, 256, 40, 40)    0           fire5_expand_1x1[0][0]           
                                                                 fire5_expand_3x3[0][0]           
__________________________________________________________________________________________________
fire6_squeeze_conv (Conv2D)     (32, 48, 40, 40)     12336       fire5[0][0]                      
__________________________________________________________________________________________________
fire6_squeeze (Activation)      (32, 48, 40, 40)     0           fire6_squeeze_conv[0][0]         
__________________________________________________________________________________________________
fire6_expand_conv1x1 (Conv2D)   (32, 192, 40, 40)    9408        fire6_squeeze[0][0]              
__________________________________________________________________________________________________
fire6_expand_conv3x3 (Conv2D)   (32, 192, 40, 40)    83136       fire6_squeeze[0][0]              
__________________________________________________________________________________________________
fire6_expand_1x1 (Activation)   (32, 192, 40, 40)    0           fire6_expand_conv1x1[0][0]       
__________________________________________________________________________________________________
fire6_expand_3x3 (Activation)   (32, 192, 40, 40)    0           fire6_expand_conv3x3[0][0]       
__________________________________________________________________________________________________
fire6 (Concatenate)             (32, 384, 40, 40)    0           fire6_expand_1x1[0][0]           
                                                                 fire6_expand_3x3[0][0]           
__________________________________________________________________________________________________
fire7_squeeze_conv (Conv2D)     (32, 48, 40, 40)     18480       fire6[0][0]                      
__________________________________________________________________________________________________
fire7_squeeze (Activation)      (32, 48, 40, 40)     0           fire7_squeeze_conv[0][0]         
__________________________________________________________________________________________________
fire7_expand_conv1x1 (Conv2D)   (32, 192, 40, 40)    9408        fire7_squeeze[0][0]              
__________________________________________________________________________________________________
fire7_expand_conv3x3 (Conv2D)   (32, 192, 40, 40)    83136       fire7_squeeze[0][0]              
__________________________________________________________________________________________________
fire7_expand_1x1 (Activation)   (32, 192, 40, 40)    0           fire7_expand_conv1x1[0][0]       
__________________________________________________________________________________________________
fire7_expand_3x3 (Activation)   (32, 192, 40, 40)    0           fire7_expand_conv3x3[0][0]       
__________________________________________________________________________________________________
fire7 (Concatenate)             (32, 384, 40, 40)    0           fire7_expand_1x1[0][0]           
                                                                 fire7_expand_3x3[0][0]           
__________________________________________________________________________________________________
fire8_squeeze_conv (Conv2D)     (32, 64, 40, 40)     24640       fire7[0][0]                      
__________________________________________________________________________________________________
fire8_squeeze (Activation)      (32, 64, 40, 40)     0           fire8_squeeze_conv[0][0]         
__________________________________________________________________________________________________
fire8_expand_conv1x1 (Conv2D)   (32, 256, 40, 40)    16640       fire8_squeeze[0][0]              
__________________________________________________________________________________________________
fire8_expand_conv3x3 (Conv2D)   (32, 256, 40, 40)    147712      fire8_squeeze[0][0]              
__________________________________________________________________________________________________
fire8_expand_1x1 (Activation)   (32, 256, 40, 40)    0           fire8_expand_conv1x1[0][0]       
__________________________________________________________________________________________________
fire8_expand_3x3 (Activation)   (32, 256, 40, 40)    0           fire8_expand_conv3x3[0][0]       
__________________________________________________________________________________________________
fire8 (Concatenate)             (32, 512, 40, 40)    0           fire8_expand_1x1[0][0]           
                                                                 fire8_expand_3x3[0][0]           
__________________________________________________________________________________________________
pool8 (MaxPooling2D)            (32, 512, 20, 20)    0           fire8[0][0]                      
__________________________________________________________________________________________________
fire9_squeeze_conv (Conv2D)     (32, 64, 20, 20)     32832       pool8[0][0]                      
__________________________________________________________________________________________________
fire9_squeeze (Activation)      (32, 64, 20, 20)     0           fire9_squeeze_conv[0][0]         
__________________________________________________________________________________________________
fire9_expand_conv1x1 (Conv2D)   (32, 256, 20, 20)    16640       fire9_squeeze[0][0]              
__________________________________________________________________________________________________
fire9_expand_conv3x3 (Conv2D)   (32, 256, 20, 20)    147712      fire9_squeeze[0][0]              
__________________________________________________________________________________________________
fire9_expand_1x1 (Activation)   (32, 256, 20, 20)    0           fire9_expand_conv1x1[0][0]       
__________________________________________________________________________________________________
fire9_expand_3x3 (Activation)   (32, 256, 20, 20)    0           fire9_expand_conv3x3[0][0]       
__________________________________________________________________________________________________
fire9 (Concatenate)             (32, 512, 20, 20)    0           fire9_expand_1x1[0][0]           
                                                                 fire9_expand_3x3[0][0]           
__________________________________________________________________________________________________
ssd_expand_block_1_conv_0 (Conv (32, 64, 20, 20)     32832       fire9[0][0]                      
__________________________________________________________________________________________________
ssd_expand_block_1_relu_0 (ReLU (32, 64, 20, 20)     0           ssd_expand_block_1_conv_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_1_conv_1 (Conv (32, 128, 10, 10)    73728       ssd_expand_block_1_relu_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_1_bn_1 (BatchN (32, 128, 10, 10)    512         ssd_expand_block_1_conv_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_1_relu_1 (ReLU (32, 128, 10, 10)    0           ssd_expand_block_1_bn_1[0][0]    
__________________________________________________________________________________________________
ssd_expand_block_2_conv_0 (Conv (32, 64, 10, 10)     8256        ssd_expand_block_1_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_2_relu_0 (ReLU (32, 64, 10, 10)     0           ssd_expand_block_2_conv_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_2_conv_1 (Conv (32, 128, 5, 5)      73728       ssd_expand_block_2_relu_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_2_bn_1 (BatchN (32, 128, 5, 5)      512         ssd_expand_block_2_conv_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_2_relu_1 (ReLU (32, 128, 5, 5)      0           ssd_expand_block_2_bn_1[0][0]    
__________________________________________________________________________________________________
ssd_expand_block_3_conv_0 (Conv (32, 64, 5, 5)       8256        ssd_expand_block_2_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_3_relu_0 (ReLU (32, 64, 5, 5)       0           ssd_expand_block_3_conv_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_3_conv_1 (Conv (32, 128, 3, 3)      73728       ssd_expand_block_3_relu_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_3_bn_1 (BatchN (32, 128, 3, 3)      512         ssd_expand_block_3_conv_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_3_relu_1 (ReLU (32, 128, 3, 3)      0           ssd_expand_block_3_bn_1[0][0]    
__________________________________________________________________________________________________
ssd_expand_block_4_conv_0 (Conv (32, 64, 3, 3)       8256        ssd_expand_block_3_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_4_relu_0 (ReLU (32, 64, 3, 3)       0           ssd_expand_block_4_conv_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_4_conv_1 (Conv (32, 128, 2, 2)      73728       ssd_expand_block_4_relu_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_4_bn_1 (BatchN (32, 128, 2, 2)      512         ssd_expand_block_4_conv_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_4_relu_1 (ReLU (32, 128, 2, 2)      0           ssd_expand_block_4_bn_1[0][0]    
__________________________________________________________________________________________________
ssd_conf_0 (Conv2D)             (32, 12, 40, 40)     55308       fire8[0][0]                      
__________________________________________________________________________________________________
ssd_conf_1 (Conv2D)             (32, 12, 20, 20)     55308       fire9[0][0]                      
__________________________________________________________________________________________________
ssd_conf_2 (Conv2D)             (32, 12, 10, 10)     13836       ssd_expand_block_1_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_conf_3 (Conv2D)             (32, 12, 5, 5)       13836       ssd_expand_block_2_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_conf_4 (Conv2D)             (32, 12, 3, 3)       13836       ssd_expand_block_3_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_conf_5 (Conv2D)             (32, 12, 2, 2)       13836       ssd_expand_block_4_relu_1[0][0]  
__________________________________________________________________________________________________
permute_13 (Permute)            (32, 40, 40, 12)     0           ssd_conf_0[0][0]                 
__________________________________________________________________________________________________
permute_15 (Permute)            (32, 20, 20, 12)     0           ssd_conf_1[0][0]                 
__________________________________________________________________________________________________
permute_17 (Permute)            (32, 10, 10, 12)     0           ssd_conf_2[0][0]                 
__________________________________________________________________________________________________
permute_19 (Permute)            (32, 5, 5, 12)       0           ssd_conf_3[0][0]                 
__________________________________________________________________________________________________
permute_21 (Permute)            (32, 3, 3, 12)       0           ssd_conf_4[0][0]                 
__________________________________________________________________________________________________
permute_23 (Permute)            (32, 2, 2, 12)       0           ssd_conf_5[0][0]                 
__________________________________________________________________________________________________
ssd_loc_0 (Conv2D)              (32, 24, 40, 40)     110616      fire8[0][0]                      
__________________________________________________________________________________________________
ssd_loc_1 (Conv2D)              (32, 24, 20, 20)     110616      fire9[0][0]                      
__________________________________________________________________________________________________
ssd_loc_2 (Conv2D)              (32, 24, 10, 10)     27672       ssd_expand_block_1_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_loc_3 (Conv2D)              (32, 24, 5, 5)       27672       ssd_expand_block_2_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_loc_4 (Conv2D)              (32, 24, 3, 3)       27672       ssd_expand_block_3_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_loc_5 (Conv2D)              (32, 24, 2, 2)       27672       ssd_expand_block_4_relu_1[0][0]  
__________________________________________________________________________________________________
conf_reshape_0 (Reshape)        (32, 9600, 1, 2)     0           permute_13[0][0]                 
__________________________________________________________________________________________________
conf_reshape_1 (Reshape)        (32, 2400, 1, 2)     0           permute_15[0][0]                 
__________________________________________________________________________________________________
conf_reshape_2 (Reshape)        (32, 600, 1, 2)      0           permute_17[0][0]                 
__________________________________________________________________________________________________
conf_reshape_3 (Reshape)        (32, 150, 1, 2)      0           permute_19[0][0]                 
__________________________________________________________________________________________________
conf_reshape_4 (Reshape)        (32, 54, 1, 2)       0           permute_21[0][0]                 
__________________________________________________________________________________________________
conf_reshape_5 (Reshape)        (32, 24, 1, 2)       0           permute_23[0][0]                 
__________________________________________________________________________________________________
permute_14 (Permute)            (32, 40, 40, 24)     0           ssd_loc_0[0][0]                  
__________________________________________________________________________________________________
permute_16 (Permute)            (32, 20, 20, 24)     0           ssd_loc_1[0][0]                  
__________________________________________________________________________________________________
permute_18 (Permute)            (32, 10, 10, 24)     0           ssd_loc_2[0][0]                  
__________________________________________________________________________________________________
permute_20 (Permute)            (32, 5, 5, 24)       0           ssd_loc_3[0][0]                  
__________________________________________________________________________________________________
permute_22 (Permute)            (32, 3, 3, 24)       0           ssd_loc_4[0][0]                  
__________________________________________________________________________________________________
permute_24 (Permute)            (32, 2, 2, 24)       0           ssd_loc_5[0][0]                  
__________________________________________________________________________________________________
ssd_anchor_0 (AnchorBoxes)      (32, 1600, 6, 8)     0           ssd_loc_0[0][0]                  
__________________________________________________________________________________________________
ssd_anchor_1 (AnchorBoxes)      (32, 400, 6, 8)      0           ssd_loc_1[0][0]                  
__________________________________________________________________________________________________
ssd_anchor_2 (AnchorBoxes)      (32, 100, 6, 8)      0           ssd_loc_2[0][0]                  
__________________________________________________________________________________________________
ssd_anchor_3 (AnchorBoxes)      (32, 25, 6, 8)       0           ssd_loc_3[0][0]                  
__________________________________________________________________________________________________
ssd_anchor_4 (AnchorBoxes)      (32, 9, 6, 8)        0           ssd_loc_4[0][0]                  
__________________________________________________________________________________________________
ssd_anchor_5 (AnchorBoxes)      (32, 4, 6, 8)        0           ssd_loc_5[0][0]                  
__________________________________________________________________________________________________
mbox_conf (Concatenate)         (32, 12828, 1, 2)    0           conf_reshape_0[0][0]             
                                                                 conf_reshape_1[0][0]             
                                                                 conf_reshape_2[0][0]             
                                                                 conf_reshape_3[0][0]             
                                                                 conf_reshape_4[0][0]             
                                                                 conf_reshape_5[0][0]             
__________________________________________________________________________________________________
loc_reshape_0 (Reshape)         (32, 9600, 1, 4)     0           permute_14[0][0]                 
__________________________________________________________________________________________________
loc_reshape_1 (Reshape)         (32, 2400, 1, 4)     0           permute_16[0][0]                 
__________________________________________________________________________________________________
loc_reshape_2 (Reshape)         (32, 600, 1, 4)      0           permute_18[0][0]                 
__________________________________________________________________________________________________
loc_reshape_3 (Reshape)         (32, 150, 1, 4)      0           permute_20[0][0]                 
__________________________________________________________________________________________________
loc_reshape_4 (Reshape)         (32, 54, 1, 4)       0           permute_22[0][0]                 
__________________________________________________________________________________________________
loc_reshape_5 (Reshape)         (32, 24, 1, 4)       0           permute_24[0][0]                 
__________________________________________________________________________________________________
anchor_reshape_0 (Reshape)      (32, 9600, 1, 8)     0           ssd_anchor_0[0][0]               
__________________________________________________________________________________________________
anchor_reshape_1 (Reshape)      (32, 2400, 1, 8)     0           ssd_anchor_1[0][0]               
__________________________________________________________________________________________________
anchor_reshape_2 (Reshape)      (32, 600, 1, 8)      0           ssd_anchor_2[0][0]               
__________________________________________________________________________________________________
anchor_reshape_3 (Reshape)      (32, 150, 1, 8)      0           ssd_anchor_3[0][0]               
__________________________________________________________________________________________________
anchor_reshape_4 (Reshape)      (32, 54, 1, 8)       0           ssd_anchor_4[0][0]               
__________________________________________________________________________________________________
anchor_reshape_5 (Reshape)      (32, 24, 1, 8)       0           ssd_anchor_5[0][0]               
__________________________________________________________________________________________________
mbox_conf_sigmoid (Activation)  (32, 12828, 1, 2)    0           mbox_conf[0][0]                  
__________________________________________________________________________________________________
mbox_loc (Concatenate)          (32, 12828, 1, 4)    0           loc_reshape_0[0][0]              
                                                                 loc_reshape_1[0][0]              
                                                                 loc_reshape_2[0][0]              
                                                                 loc_reshape_3[0][0]              
                                                                 loc_reshape_4[0][0]              
                                                                 loc_reshape_5[0][0]              
__________________________________________________________________________________________________
mbox_priorbox (Concatenate)     (32, 12828, 1, 8)    0           anchor_reshape_0[0][0]           
                                                                 anchor_reshape_1[0][0]           
                                                                 anchor_reshape_2[0][0]           
                                                                 anchor_reshape_3[0][0]           
                                                                 anchor_reshape_4[0][0]           
                                                                 anchor_reshape_5[0][0]           
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (32, 12828, 1, 14)   0           mbox_conf_sigmoid[0][0]          
                                                                 mbox_loc[0][0]                   
                                                                 mbox_priorbox[0][0]              
__________________________________________________________________________________________________
ssd_predictions (Reshape)       (32, 12828, 14)      0           concatenate_2[0][0]              
==================================================================================================
Total params: 1,587,864
Trainable params: 1,586,840
Non-trainable params: 1,024
__________________________________________________________________________________________________
2020-07-10 11:17:48,602 [INFO] iva.ssd.scripts.train: Number of images in the training dataset:	 18592
2020-07-10 11:17:48,603 [INFO] iva.ssd.scripts.train: Number of images in the validation dataset:	  3280
Epoch 1/30

 1/72 [..............................] - ETA: 47:04 - loss: 1687.9543
.
.
.
72/72 [==============================] - 129s 2s/step - loss: 34.6049

Epoch 00001: saving model to /project/trained_model/weights/ssd_squeezenet_epoch_001.tlt
Epoch 2/30

 1/72 [..............................] - ETA: 24s - loss: 5.8256
.
.
.
72/72 [==============================] - 88s 1s/step - loss: 5.4907

Epoch 00002: saving model to /project/trained_model/weights/ssd_squeezenet_epoch_002.tlt
Epoch 3/30

 1/72 [..............................] - ETA: 1:13 - loss: 6.3041
.
.
.
70/72 [============================>.] - ETA: 2s - loss: nan                            Batch 69: Invalid loss, terminating training
Batch 69: Invalid loss, terminating training
Batch 69: Invalid loss, terminating training
Batch 69: Invalid loss, terminating training
Batch 69: Invalid loss, terminating training
Batch 69: Invalid loss, terminating training
Batch 69: Invalid loss, terminating training
Batch 69: Invalid loss, terminating training

Epoch 00003: saving model to /project/trained_model/weights/ssd_squeezenet_epoch_003.tlt

The training terminates automatically. Please help.
Thanks
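
As a side note, the anchor count in the summary above can be checked by hand, which helps rule out a shape mismatch as the cause. This is only a sketch: the feature-map sides (40, 20, 10, 5, 3, 2) are inferred from the `ssd_anchor_*` rows (for example 1600 = 40 x 40), and 6 anchors per cell is read from the third dimension of those rows.

```python
# Feature-map sides inferred from the ssd_anchor_* rows of the model summary
# (1600 = 40*40, 400 = 20*20, 100 = 10*10, 25 = 5*5, 9 = 3*3, 4 = 2*2).
feature_map_sides = [40, 20, 10, 5, 3, 2]

# Anchors per cell, read from the third dimension of the AnchorBoxes outputs.
anchors_per_cell = 6

# Anchors contributed by each prediction layer; these should match the
# second dimension of the loc_reshape_* / anchor_reshape_* rows.
per_layer = [side * side * anchors_per_cell for side in feature_map_sides]

# Total anchors; this should match the second dimension of ssd_predictions.
total_anchors = sum(per_layer)

print(per_layer)      # [9600, 2400, 600, 150, 54, 24]
print(total_anchors)  # 12828
```

The totals agree with the `(32, 12828, 14)` shape of `ssd_predictions`, so the head layout itself looks consistent.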

For a NaN loss, please set a smaller batch size (bs) and retry.

Hi Morghan,

I tried a smaller batch size, but that didn’t help. Also, if I use a different input size, for example 300x300, training is terminated in epoch 1 itself.

Lastly, I tried another dataset with 150K images, with batch sizes 32 and 16, but I am facing the same issue: the training is terminated at the 176th epoch.

835/874 [===========================>..] - ETA: 7s - loss: 9.4494
836/874 [===========================>..] - ETA: 7s - loss: 283.5117
837/874 [===========================>..] - ETA: 7s - loss: 86278259695982480.0000
838/874 [===========================>..] - ETA: 7s - loss: nan                   Batch 837: Invalid loss, terminating training
Batch 837: Invalid loss, terminating training
Batch 837: Invalid loss, terminating training
Batch 837: Invalid loss, terminating training
Batch 837: Invalid loss, terminating training
Batch 837: Invalid loss, terminating training
Batch 837: Invalid loss, terminating training
Batch 837: Invalid loss, terminating training

Epoch 00176: saving model to /project/trained_model/weights/ssd_squeezenet_epoch_176.tlt

Can you please suggest which combination of input size, batch size, and learning rate will work for SqueezeNet?

Could you please try a smaller bs, for example, 4?

Sure, I’ll try batch size 4 and get back to you.
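
While retrying, it may help to pin down the exact step where the loss first blows up rather than the step where it finally becomes NaN. Below is a minimal sketch that scans progress-bar lines like the ones pasted above. The function name `first_divergence` and the `jump_factor` threshold of 10 are hypothetical choices for illustration and are not part of the training tool.

```python
import math
import re

# Matches progress lines such as "835/874 [====>..] - ETA: 7s - loss: 9.4494".
LOSS_RE = re.compile(r"^\s*(\d+)/(\d+) .*loss: (\S+)")


def first_divergence(log_text, jump_factor=10.0):
    """Return (step, previous_loss, loss) for the first suspicious step, or None.

    A step is suspicious if its loss is NaN/inf, or if it exceeds the
    previous reported loss by more than jump_factor (an arbitrary threshold).
    """
    previous = None
    for line in log_text.splitlines():
        match = LOSS_RE.match(line)
        if not match:
            continue  # skip lines that are not progress-bar output
        step = int(match.group(1))
        loss = float(match.group(3))  # float() also parses "nan" and "inf"
        if math.isnan(loss) or math.isinf(loss):
            return step, previous, loss
        if previous is not None and loss > jump_factor * previous:
            return step, previous, loss
        previous = loss
    return None


# Lines copied from the epoch 176 log above.
sample = """\
835/874 [===========================>..] - ETA: 7s - loss: 9.4494
836/874 [===========================>..] - ETA: 7s - loss: 283.5117
837/874 [===========================>..] - ETA: 7s - loss: 86278259695982480.0000
838/874 [===========================>..] - ETA: 7s - loss: nan
"""

print(first_divergence(sample))  # (836, 9.4494, 283.5117)
```

On the epoch 176 log this points at step 836, two steps before the NaN. As far as I know, the value shown in the progress bar is a running average over the epoch, so the loss of that single batch was probably much larger than 283, which suggests the divergence is abrupt rather than gradual.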