CostFunctionConfig should have at least one class

Hello,

I am following the “Training the PeopleNet model” steps from this blog post: https://devblogs.nvidia.com/training-custom-pretrained-models-using-tlt/

I am using the WiderPerson dataset from http://www.cbsr.ia.ac.cn/users/sfzhang/WiderPerson.
I resized the images to a uniform resolution and converted the labels to KITTI format.
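A minimal sketch of the label conversion step. It assumes the WiderPerson annotation layout: the first line is the number of boxes, and each following line is "class_label x1 y1 x2 y2" in pixels, with 1 = pedestrians, 2 = riders, 3 = partially-visible persons, 4 = ignore regions, 5 = crowd. It also assumes boxes are scaled by the same x/y factors used to resize the image. The function name and the choice to drop labels 3 and 4 are illustrative, not taken from the post. Only the class name and 2D box of each KITTI line are filled in; the remaining fields are zeroed.

```python
from typing import List

# Assumed WiderPerson label ids -> class names used in the spec below.
# Labels 3 (partially-visible persons) and 4 (ignore regions) are dropped.
WIDERPERSON_CLASSES = {1: "pedestrians", 2: "riders", 5: "crowd"}


def widerperson_to_kitti(annotation_text: str,
                         scale_x: float,
                         scale_y: float) -> List[str]:
    """Convert one WiderPerson annotation file's text into KITTI label lines.

    scale_x / scale_y are the resize factors applied to the image
    (new_width / old_width, new_height / old_height).
    """
    lines = [ln.split() for ln in annotation_text.strip().splitlines()]
    num_boxes = int(lines[0][0])
    kitti_lines = []
    for fields in lines[1:1 + num_boxes]:
        label = int(fields[0])
        x1, y1, x2, y2 = (float(v) for v in fields[1:5])
        name = WIDERPERSON_CLASSES.get(label)
        if name is None:
            continue  # class not used for training
        # KITTI: type truncated occluded alpha x1 y1 x2 y2 h w l x y z rot_y
        # Only the class name and the 2D box are meaningful here.
        kitti_lines.append(
            "{} 0.00 0 0.00 {:.2f} {:.2f} {:.2f} {:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00".format(
                name, x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y))
    return kitti_lines


# Example: an image resized from 1400x1000 to 700x500 (factors 0.5, 0.5).
sample = "3\n1 10 20 110 220\n4 0 0 50 50\n5 100 100 300 400\n"
for kitti_line in widerperson_to_kitti(sample, 0.5, 0.5):
    print(kitti_line)
```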

All the steps went smoothly; the error appeared at the training command:

!tlt-train detectnet_v2 -e specs/train.txt -r trained_model -k $API_KEY --gpus 1

Error

2020-05-01 15:33:35,102 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at specs/train.txt.
2020-05-01 15:33:35,106 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from specs/train.txt
2020-05-01 15:33:35,214 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 7644 samples with a batch size of 24; each epoch will therefore take one extra step.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 47, in main
  File "<decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 667, in main
  File "./detectnet_v2/scripts/train.py", line 591, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 469, in train_gridbox
  File "./detectnet_v2/cost_function/cost_auto_weight_hook.py", line 29, in build_cost_auto_weight_hook
ValueError: CostFunctionConfig should have at least one class
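The INFO line just before the traceback is only informational: 7644 samples do not split evenly into batches of 24 (7644 = 318 × 24 + 12), so each epoch is rounded up to 319 steps. The failure itself is the ValueError from build_cost_auto_weight_hook, which appears to fire because the spec defines no cost_function_config classes. The step arithmetic, for reference:

```python
import math

num_samples = 7644        # from the INFO log line
batch_size_per_gpu = 24   # training_config.batch_size_per_gpu, single GPU
full_batches, remainder = divmod(num_samples, batch_size_per_gpu)
steps_per_epoch = math.ceil(num_samples / batch_size_per_gpu)
print(full_batches, remainder, steps_per_epoch)  # 318 12 319
```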

Here is the spec file created using steps from the blog:

dataset_config {
  data_sources: {
    tfrecords_path: "/nitin/tlt-workspace/people-net/tf_records/*"
    image_directory_path: "/nitin/tlt-workspace/people-net/dataset"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "pedestrians"
    value: "pedestrians"
  }
  target_class_mapping {
    key: "riders"
    value: "riders"
  }
  target_class_mapping {
    key: "crowd"
    value: "crowd"
  }
  validation_fold: 0
}

model_config {
  pretrained_model_file: "/nitin/tlt-workspace/people-net/pretrained_weights/tlt_peoplenet_vunpruned_v1.0/resnet34_peoplenet.tlt"
  num_layers: 34
  freeze_blocks: 0
  arch: "resnet"
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}

training_config {
  batch_size_per_gpu: 24
  num_epochs: 12
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 0.0005
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}

augmentation_config {
  preprocessing {
    output_image_width: 700
    output_image_height: 500
    output_image_channel: 3
    crop_right: 700
    crop_bottom: 500
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "pedestrians"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.265
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "riders"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "crowd"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 2
      }
    }
  }
}
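One way to see what the first error points at is to list the top-level sections this spec actually defines. The sketch below assumes the section names used in the detectnet_v2 sample specs shipped with TLT (dataset_config, augmentation_config, postprocessing_config, model_config, evaluation_config, cost_function_config, training_config, bbox_rasterizer_config). The brace counter is a simplification: it strips "#" comments per line and assumes no braces inside quoted strings.

```python
import re
from typing import List

# Top-level sections assumed from the detectnet_v2 sample specs shipped with TLT.
EXPECTED_SECTIONS = [
    "dataset_config", "augmentation_config", "postprocessing_config",
    "model_config", "evaluation_config", "cost_function_config",
    "training_config", "bbox_rasterizer_config",
]

# Matches "name {" or "name: {", i.e. the start of a message block.
_OPEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*:?\s*\{")


def top_level_sections(spec_text: str) -> List[str]:
    """Return the names of blocks opened at nesting depth 0, in file order."""
    text = "\n".join(line.split("#", 1)[0] for line in spec_text.splitlines())
    names: List[str] = []
    depth, pos = 0, 0
    while pos < len(text):
        if text[pos] == "}":
            depth -= 1
            pos += 1
            continue
        match = _OPEN.match(text, pos)
        if match:
            if depth == 0:
                names.append(match.group(1))
            depth += 1
            pos = match.end()
            continue
        pos += 1
    return names


def missing_sections(spec_text: str) -> List[str]:
    """List expected top-level sections that the spec does not define."""
    present = set(top_level_sections(spec_text))
    return [name for name in EXPECTED_SECTIONS if name not in present]
```

Calling missing_sections(open("specs/train.txt").read()) on the spec above should return ['evaluation_config', 'cost_function_config', 'bbox_rasterizer_config'], which are the three sections this thread ends up adding.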

Based on a spec file I had trained with successfully before, cost_function_config is required for training. The blog also mentions it, but does not share that configuration,
so I added my own cost_function_config from a previously used spec file:

cost_function_config {
  target_classes {
    name: "pedestrians"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  
  target_classes {
    name: "riders"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  
  target_classes {
    name: "crowd"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  
  enable_autoweighting: True
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}

Now the error says:

2020-05-01 15:46:23,854 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at specs/train.txt.
2020-05-01 15:46:23,855 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from specs/train.txt
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 47, in main
  File "<decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 667, in main
  File "./detectnet_v2/scripts/train.py", line 561, in run_experiment
  File "./detectnet_v2/spec_handler/spec_loader.py", line 70, in load_experiment_spec
  File "./detectnet_v2/spec_handler/spec_loader.py", line 50, in load_proto
  File "./detectnet_v2/spec_handler/spec_loader.py", line 36, in _load_from_file
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 702, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 770, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 795, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 817, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 942, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1016, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 909, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 67:1 : Message type "TrainingConfig" has no field named "cost_function_config".

I don't understand what the actual error is.
Please help with this problem.

Thanks
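In the ParseError, 67:1 is a line:column position in specs/train.txt, and "TrainingConfig" is the message type that was being parsed at that point. So the parser reached cost_function_config while still inside training_config { ... }, where that field does not exist. A small sketch for finding which blocks enclose a given line; enclosing_blocks is a hypothetical helper, and it assumes no braces inside quoted strings:

```python
import re
from typing import List

# Matches "name {" or "name: {", i.e. the start of a message block.
_OPEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*:?\s*\{")


def enclosing_blocks(spec_text: str, line_no: int) -> List[str]:
    """Return the chain of blocks open at the start of 1-based line_no."""
    stack: List[str] = []
    for current, line in enumerate(spec_text.splitlines(), start=1):
        if current == line_no:
            return list(stack)
        line = line.split("#", 1)[0]  # drop comments
        pos = 0
        while pos < len(line):
            if line[pos] == "}":
                if stack:
                    stack.pop()
                pos += 1
                continue
            match = _OPEN.match(line, pos)
            if match:
                stack.append(match.group(1))
                pos = match.end()
                continue
            pos += 1
    return list(stack)
```

For example, enclosing_blocks(open("specs/train.txt").read(), 67) returning ['training_config'] would confirm that training_config needs its closing brace before cost_function_config starts, which is how the later full spec in this thread is laid out.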

Also, please add the “bbox_rasterizer_config” section to your spec.
Please attach your latest full spec here. Thanks.

Hi @Morganh, sorry for my late reply.
I updated my spec file with “bbox_rasterizer_config”, but it still didn't help.

Since the error says "TrainingConfig" has no field named "cost_function_config", I took the "cost_function_config" section from the original notebook provided with the TLT container and added it to my spec file.
Please note: "cost_function_config" is not mentioned anywhere in the original blog I followed.

After all these changes, this is what the tlt-train command shows:

ValueError: Cannot find a min overlap threshold for pedestrians

I guess the old problem is resolved. Can you please suggest anything about this one?

Here is my full training spec file:

dataset_config {
  data_sources: {
    tfrecords_path: "/nitin/tlt-workspace/people-net/tf_records/*"
    image_directory_path: "/nitin/tlt-workspace/people-net/dataset"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "pedestrians"
    value: "pedestrians"
  }
  target_class_mapping {
    key: "riders"
    value: "riders"
  }
  target_class_mapping {
    key: "crowd"
    value: "crowd"
  }
  validation_fold: 0
}

model_config {
  pretrained_model_file: "/nitin/tlt-workspace/people-net/pretrained_weights/tlt_peoplenet_vunpruned_v1.0/resnet34_peoplenet.tlt"
  num_layers: 34
  freeze_blocks: 0
  arch: "resnet"
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}

cost_function_config {
  target_classes {
    name: "pedestrians"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "riders"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "crowd"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

training_config {
  batch_size_per_gpu: 24
  num_epochs: 12
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 0.0005
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}

augmentation_config {
  preprocessing {
    output_image_width: 700
    output_image_height: 500
    output_image_channel: 3
    crop_right: 700
    crop_bottom: 500
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "pedestrians"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.265
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "riders"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "crowd"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 2
      }
    }
  }
}

bbox_rasterizer_config {
  target_class_config {
    key: "pedestrians"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "riders"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "crowd"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}
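Every target class has to appear in several per-class sections (cost_function_config, postprocessing_config, bbox_rasterizer_config, evaluation_config), so a rough cross-check can catch a missing entry before tlt-train does. The sketch below splits the spec into top-level blocks, takes the class names from the value fields of target_class_mapping in dataset_config, and reports which classes each per-class section never mentions in a key: or name: field. It is a regex-based approximation under the same assumptions as any brace counter (no braces inside quoted strings), not how TLT itself parses the spec.

```python
import re
from typing import Dict, List, Optional, Set

# Matches "name {" or "name: {", i.e. the start of a message block.
_OPEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*:?\s*\{")
PER_CLASS_SECTIONS = ["cost_function_config", "postprocessing_config",
                      "bbox_rasterizer_config", "evaluation_config"]


def split_top_level(spec_text: str) -> Dict[str, str]:
    """Map each top-level block name to the text between its braces."""
    text = "\n".join(line.split("#", 1)[0] for line in spec_text.splitlines())
    sections: Dict[str, str] = {}
    depth, pos, start = 0, 0, 0
    name: Optional[str] = None
    while pos < len(text):
        if text[pos] == "}":
            depth -= 1
            if depth == 0 and name is not None:
                sections[name] = sections.get(name, "") + text[start:pos]
                name = None
            pos += 1
            continue
        match = _OPEN.match(text, pos)
        if match:
            if depth == 0:
                name, start = match.group(1), match.end()
            depth += 1
            pos = match.end()
            continue
        pos += 1
    return sections


def class_coverage(spec_text: str) -> Dict[str, List[str]]:
    """For each per-class section, list the target classes it never mentions."""
    sections = split_top_level(spec_text)
    targets: Set[str] = set(re.findall(r'\bvalue:\s*"([^"]+)"',
                                       sections.get("dataset_config", "")))
    report: Dict[str, List[str]] = {}
    for section in PER_CLASS_SECTIONS:
        body = sections.get(section, "")  # absent section -> nothing mentioned
        mentioned = set(re.findall(r'\b(?:key|name):\s*"([^"]+)"', body))
        report[section] = sorted(targets - mentioned)
    return report
```

Run on the full spec above, it should report all three classes as missing only under evaluation_config, since that section is absent.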

One more thing I noticed: since I am using the pretrained, ready-to-use PeopleNet weight file, loading it with my own API key gave:

The key used to load the model is incorrect

I checked on NGC and found that the key should be tlt_encode. I understand that, but once I use that key to retrain the model on a custom dataset, how do I use my own API key later?

I figured out that evaluation_config was missing from my spec file; I have added it now.
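This fits the earlier "Cannot find a min overlap threshold for pedestrians" error: as far as the detectnet_v2 sample specs show, evaluation_config holds a per-class minimum_detection_ground_truth_overlap map, so each target class needs its own entry. A sketch that renders such a block for the three classes used here. The field names are assumed from the TLT sample specs (check them against the spec reference for your TLT version), and the values (IoU 0.5, 4 px minimum box size, validation every 10 epochs) are placeholders rather than tuned settings.

```python
from typing import Dict, Iterable


def evaluation_config_text(classes: Iterable[str],
                           min_overlap: Dict[str, float],
                           validation_period: int = 10,
                           first_validation_epoch: int = 1,
                           min_box_size: int = 4) -> str:
    """Render an evaluation_config block with one entry per class."""
    classes = list(classes)
    missing = [c for c in classes if c not in min_overlap]
    if missing:
        # Fail early if a class has no threshold (the trainer reports a similar error).
        raise ValueError("no min overlap threshold for: " + ", ".join(missing))
    out = ["evaluation_config {",
           "  validation_period_during_training: %d" % validation_period,
           "  first_validation_epoch: %d" % first_validation_epoch]
    for name in classes:
        out += ["  minimum_detection_ground_truth_overlap {",
                '    key: "%s"' % name,
                "    value: %s" % min_overlap[name],
                "  }"]
    for name in classes:
        out += ["  evaluation_box_config {",
                '    key: "%s"' % name,
                "    value {",
                "      minimum_height: %d" % min_box_size,
                "      maximum_height: 9999",
                "      minimum_width: %d" % min_box_size,
                "      maximum_width: 9999",
                "    }",
                "  }"]
    out.append("}")
    return "\n".join(out)


target_classes = ["pedestrians", "riders", "crowd"]
print(evaluation_config_text(target_classes, {c: 0.5 for c in target_classes}))
```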

Latest error:

Using TensorFlow backend.
--------------------------------------------------------------------------
[[50944,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: tlt

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
2020-05-05 11:48:10,029 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at specs/train.txt.
2020-05-05 11:48:10,032 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from specs/train.txt
2020-05-05 11:48:10,147 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 7644 samples with a batch size of 24; each epoch will therefore take one extra step.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 3, 500, 700)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 250, 350) 9472        input_1[0][0]                    
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 250, 350) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 250, 350) 0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 125, 175) 36928       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 125, 175) 256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (None, 64, 125, 175) 0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 125, 175) 36928       block_1a_relu_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 125, 175) 4160        activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 125, 175) 256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 125, 175) 256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 125, 175) 0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1a_relu (Activation)      (None, 64, 125, 175) 0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 125, 175) 36928       block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 125, 175) 256         block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (None, 64, 125, 175) 0           block_1b_bn_1[0][0]              
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 125, 175) 36928       block_1b_relu_1[0][0]            
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 125, 175) 256         block_1b_conv_2[0][0]            
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 125, 175) 0           block_1b_bn_2[0][0]              
                                                                 block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_relu (Activation)      (None, 64, 125, 175) 0           add_2[0][0]                      
__________________________________________________________________________________________________
block_1c_conv_1 (Conv2D)        (None, 64, 125, 175) 36928       block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_1c_bn_1 (BatchNormalizati (None, 64, 125, 175) 256         block_1c_conv_1[0][0]            
__________________________________________________________________________________________________
block_1c_relu_1 (Activation)    (None, 64, 125, 175) 0           block_1c_bn_1[0][0]              
__________________________________________________________________________________________________
block_1c_conv_2 (Conv2D)        (None, 64, 125, 175) 36928       block_1c_relu_1[0][0]            
__________________________________________________________________________________________________
block_1c_bn_2 (BatchNormalizati (None, 64, 125, 175) 256         block_1c_conv_2[0][0]            
__________________________________________________________________________________________________
add_3 (Add)                     (None, 64, 125, 175) 0           block_1c_bn_2[0][0]              
                                                                 block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_1c_relu (Activation)      (None, 64, 125, 175) 0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 63, 88)  73856       block_1c_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 63, 88)  512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (None, 128, 63, 88)  0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 63, 88)  147584      block_2a_relu_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 63, 88)  8320        block_1c_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 63, 88)  512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 63, 88)  512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 63, 88)  0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2a_relu (Activation)      (None, 128, 63, 88)  0           add_4[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 63, 88)  147584      block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 63, 88)  512         block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (None, 128, 63, 88)  0           block_2b_bn_1[0][0]              
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 63, 88)  147584      block_2b_relu_1[0][0]            
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 63, 88)  512         block_2b_conv_2[0][0]            
__________________________________________________________________________________________________
add_5 (Add)                     (None, 128, 63, 88)  0           block_2b_bn_2[0][0]              
                                                                 block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_relu (Activation)      (None, 128, 63, 88)  0           add_5[0][0]                      
__________________________________________________________________________________________________
block_2c_conv_1 (Conv2D)        (None, 128, 63, 88)  147584      block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_2c_bn_1 (BatchNormalizati (None, 128, 63, 88)  512         block_2c_conv_1[0][0]            
__________________________________________________________________________________________________
block_2c_relu_1 (Activation)    (None, 128, 63, 88)  0           block_2c_bn_1[0][0]              
__________________________________________________________________________________________________
block_2c_conv_2 (Conv2D)        (None, 128, 63, 88)  147584      block_2c_relu_1[0][0]            
__________________________________________________________________________________________________
block_2c_bn_2 (BatchNormalizati (None, 128, 63, 88)  512         block_2c_conv_2[0][0]            
__________________________________________________________________________________________________
add_6 (Add)                     (None, 128, 63, 88)  0           block_2c_bn_2[0][0]              
                                                                 block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_2c_relu (Activation)      (None, 128, 63, 88)  0           add_6[0][0]                      
__________________________________________________________________________________________________
block_2d_conv_1 (Conv2D)        (None, 128, 63, 88)  147584      block_2c_relu[0][0]              
__________________________________________________________________________________________________
block_2d_bn_1 (BatchNormalizati (None, 128, 63, 88)  512         block_2d_conv_1[0][0]            
__________________________________________________________________________________________________
block_2d_relu_1 (Activation)    (None, 128, 63, 88)  0           block_2d_bn_1[0][0]              
__________________________________________________________________________________________________
block_2d_conv_2 (Conv2D)        (None, 128, 63, 88)  147584      block_2d_relu_1[0][0]            
__________________________________________________________________________________________________
block_2d_bn_2 (BatchNormalizati (None, 128, 63, 88)  512         block_2d_conv_2[0][0]            
__________________________________________________________________________________________________
add_7 (Add)                     (None, 128, 63, 88)  0           block_2d_bn_2[0][0]              
                                                                 block_2c_relu[0][0]              
__________________________________________________________________________________________________
block_2d_relu (Activation)      (None, 128, 63, 88)  0           add_7[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 32, 44)  295168      block_2d_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 32, 44)  1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (None, 256, 32, 44)  0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 32, 44)  590080      block_3a_relu_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 32, 44)  33024       block_2d_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 32, 44)  1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 32, 44)  1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_8 (Add)                     (None, 256, 32, 44)  0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3a_relu (Activation)      (None, 256, 32, 44)  0           add_8[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 32, 44)  590080      block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 32, 44)  1024        block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (None, 256, 32, 44)  0           block_3b_bn_1[0][0]              
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 32, 44)  590080      block_3b_relu_1[0][0]            
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 32, 44)  1024        block_3b_conv_2[0][0]            
__________________________________________________________________________________________________
add_9 (Add)                     (None, 256, 32, 44)  0           block_3b_bn_2[0][0]              
                                                                 block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_relu (Activation)      (None, 256, 32, 44)  0           add_9[0][0]                      
__________________________________________________________________________________________________
block_3c_conv_1 (Conv2D)        (None, 256, 32, 44)  590080      block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_3c_bn_1 (BatchNormalizati (None, 256, 32, 44)  1024        block_3c_conv_1[0][0]            
__________________________________________________________________________________________________
block_3c_relu_1 (Activation)    (None, 256, 32, 44)  0           block_3c_bn_1[0][0]              
__________________________________________________________________________________________________
block_3c_conv_2 (Conv2D)        (None, 256, 32, 44)  590080      block_3c_relu_1[0][0]            
__________________________________________________________________________________________________
block_3c_bn_2 (BatchNormalizati (None, 256, 32, 44)  1024        block_3c_conv_2[0][0]            
__________________________________________________________________________________________________
add_10 (Add)                    (None, 256, 32, 44)  0           block_3c_bn_2[0][0]              
                                                                 block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_3c_relu (Activation)      (None, 256, 32, 44)  0           add_10[0][0]                     
__________________________________________________________________________________________________
block_3d_conv_1 (Conv2D)        (None, 256, 32, 44)  590080      block_3c_relu[0][0]              
__________________________________________________________________________________________________
block_3d_bn_1 (BatchNormalizati (None, 256, 32, 44)  1024        block_3d_conv_1[0][0]            
__________________________________________________________________________________________________
block_3d_relu_1 (Activation)    (None, 256, 32, 44)  0           block_3d_bn_1[0][0]              
__________________________________________________________________________________________________
block_3d_conv_2 (Conv2D)        (None, 256, 32, 44)  590080      block_3d_relu_1[0][0]            
__________________________________________________________________________________________________
block_3d_bn_2 (BatchNormalizati (None, 256, 32, 44)  1024        block_3d_conv_2[0][0]            
__________________________________________________________________________________________________
add_11 (Add)                    (None, 256, 32, 44)  0           block_3d_bn_2[0][0]              
                                                                 block_3c_relu[0][0]              
__________________________________________________________________________________________________
block_3d_relu (Activation)      (None, 256, 32, 44)  0           add_11[0][0]                     
__________________________________________________________________________________________________
block_3e_conv_1 (Conv2D)        (None, 256, 32, 44)  590080      block_3d_relu[0][0]              
__________________________________________________________________________________________________
block_3e_bn_1 (BatchNormalizati (None, 256, 32, 44)  1024        block_3e_conv_1[0][0]            
__________________________________________________________________________________________________
block_3e_relu_1 (Activation)    (None, 256, 32, 44)  0           block_3e_bn_1[0][0]              
__________________________________________________________________________________________________
block_3e_conv_2 (Conv2D)        (None, 256, 32, 44)  590080      block_3e_relu_1[0][0]            
__________________________________________________________________________________________________
block_3e_bn_2 (BatchNormalizati (None, 256, 32, 44)  1024        block_3e_conv_2[0][0]            
__________________________________________________________________________________________________
add_12 (Add)                    (None, 256, 32, 44)  0           block_3e_bn_2[0][0]              
                                                                 block_3d_relu[0][0]              
__________________________________________________________________________________________________
block_3e_relu (Activation)      (None, 256, 32, 44)  0           add_12[0][0]                     
__________________________________________________________________________________________________
block_3f_conv_1 (Conv2D)        (None, 256, 32, 44)  590080      block_3e_relu[0][0]              
__________________________________________________________________________________________________
block_3f_bn_1 (BatchNormalizati (None, 256, 32, 44)  1024        block_3f_conv_1[0][0]            
__________________________________________________________________________________________________
block_3f_relu_1 (Activation)    (None, 256, 32, 44)  0           block_3f_bn_1[0][0]              
__________________________________________________________________________________________________
block_3f_conv_2 (Conv2D)        (None, 256, 32, 44)  590080      block_3f_relu_1[0][0]            
__________________________________________________________________________________________________
block_3f_bn_2 (BatchNormalizati (None, 256, 32, 44)  1024        block_3f_conv_2[0][0]            
__________________________________________________________________________________________________
add_13 (Add)                    (None, 256, 32, 44)  0           block_3f_bn_2[0][0]              
                                                                 block_3e_relu[0][0]              
__________________________________________________________________________________________________
block_3f_relu (Activation)      (None, 256, 32, 44)  0           add_13[0][0]                     
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 32, 44)  1180160     block_3f_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 32, 44)  2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (None, 512, 32, 44)  0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 32, 44)  2359808     block_4a_relu_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 32, 44)  131584      block_3f_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 32, 44)  2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 32, 44)  2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_14 (Add)                    (None, 512, 32, 44)  0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4a_relu (Activation)      (None, 512, 32, 44)  0           add_14[0][0]                     
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 32, 44)  2359808     block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 32, 44)  2048        block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (None, 512, 32, 44)  0           block_4b_bn_1[0][0]              
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 32, 44)  2359808     block_4b_relu_1[0][0]            
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 32, 44)  2048        block_4b_conv_2[0][0]            
__________________________________________________________________________________________________
add_15 (Add)                    (None, 512, 32, 44)  0           block_4b_bn_2[0][0]              
                                                                 block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_relu (Activation)      (None, 512, 32, 44)  0           add_15[0][0]                     
__________________________________________________________________________________________________
block_4c_conv_1 (Conv2D)        (None, 512, 32, 44)  2359808     block_4b_relu[0][0]              
__________________________________________________________________________________________________
block_4c_bn_1 (BatchNormalizati (None, 512, 32, 44)  2048        block_4c_conv_1[0][0]            
__________________________________________________________________________________________________
block_4c_relu_1 (Activation)    (None, 512, 32, 44)  0           block_4c_bn_1[0][0]              
__________________________________________________________________________________________________
block_4c_conv_2 (Conv2D)        (None, 512, 32, 44)  2359808     block_4c_relu_1[0][0]            
__________________________________________________________________________________________________
block_4c_bn_2 (BatchNormalizati (None, 512, 32, 44)  2048        block_4c_conv_2[0][0]            
__________________________________________________________________________________________________
add_16 (Add)                    (None, 512, 32, 44)  0           block_4c_bn_2[0][0]              
                                                                 block_4b_relu[0][0]              
__________________________________________________________________________________________________
block_4c_relu (Activation)      (None, 512, 32, 44)  0           add_16[0][0]                     
__________________________________________________________________________________________________
output_bbox (Conv2D)            (None, 12, 32, 44)   6156        block_4c_relu[0][0]              
__________________________________________________________________________________________________
output_cov (Conv2D)             (None, 3, 32, 44)    1539        block_4c_relu[0][0]              
==================================================================================================
Total params: 21,322,319
Trainable params: 21,295,695
Non-trainable params: 26,624
__________________________________________________________________________________________________
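The last rows of the summary fit the three classes in the spec. The Python sketch below recomputes a few of the listed parameter counts as a rough cross-check. It makes two assumptions. First, kernels are square with one bias per output channel; this is inferred from the numbers, not stated in the log. Second, DetectNet_v2 predicts 4 box values plus 1 coverage value per class.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a square Conv2D kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


NUM_CLASSES = 3  # pedestrians, riders, crowd (from target_class_mapping)

# 3x3 convolutions inside the residual blocks
assert conv2d_params(256, 256, 3) == 590080    # block_3*_conv_*
assert conv2d_params(256, 512, 3) == 1180160   # block_4a_conv_1
assert conv2d_params(512, 512, 3) == 2359808   # block_4*_conv_*
assert conv2d_params(256, 512, 1) == 131584    # block_4a_conv_shortcut (1x1)
assert batchnorm_params(256) == 1024 and batchnorm_params(512) == 2048

# Output heads: 1x1 convolutions on the 512-channel block_4c_relu features
assert conv2d_params(512, 4 * NUM_CLASSES, 1) == 6156  # output_bbox: 4 box values per class
assert conv2d_params(512, NUM_CLASSES, 1) == 1539      # output_cov: 1 coverage map per class
print("layer parameter counts match the summary")
```

If the head channel counts did not match 4 x and 1 x the number of classes, that would point to a class-count mismatch between the pretrained model and the spec.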
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-05-05 11:49:06,517 [INFO] iva.detectnet_v2.scripts.train: Found 7644 samples in training set
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-05-05 11:49:37,144 [INFO] iva.detectnet_v2.scripts.train: Found 1348 samples in validation set
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 47, in main
  File "<decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 667, in main
  File "./detectnet_v2/scripts/train.py", line 591, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 525, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 142, in run_training_loop
  File "./detectnet_v2/training/utilities.py", line 143, in get_singular_monitored_session
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1021, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 650, in __init__
    self._sess = self._coordinated_creator.create_session()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 812, in create_session
    hook.after_create_session(self.tf_sess, self.coord)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 568, in after_create_session
    self._save(session, global_step)
  File "./detectnet_v2/tfhooks/checkpoint_saver_hook.py", line 77, in _save
  File "./detectnet_v2/tfhooks/checkpoint_saver_hook.py", line 110, in _save_encrypted_checkpoint
IOError: [Errno 2] No such file or directory: 'trained_model/model.step-0.ckzip'

Latest training spec file:

dataset_config {
  data_sources: {
    tfrecords_path: "/nitin/tlt-workspace/people-net/tf_records/*"
    image_directory_path: "/nitin/tlt-workspace/people-net/dataset"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "pedestrians"
    value: "pedestrians"
  }
  target_class_mapping {
    key: "riders"
    value: "riders"
  }
  target_class_mapping {
    key: "crowd"
    value: "crowd"
  }
  validation_fold: 0
}

model_config {
  pretrained_model_file: "/nitin/tlt-workspace/people-net/pretrained_weights/tlt_peoplenet_vunpruned_v1.0/resnet34_peoplenet.tlt"
  num_layers: 34
  freeze_blocks: 0
  arch: "resnet"
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}

cost_function_config {
  target_classes {
    name: "pedestrians"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "riders"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "crowd"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

training_config {
  batch_size_per_gpu: 24
  num_epochs: 12
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 0.0005
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
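The log reports 7644 training samples. With batch_size_per_gpu: 24 on one GPU, an epoch does not divide evenly. The sketch below shows the arithmetic. It assumes the final partial batch runs as one extra step; TLT's own step accounting may differ slightly.

```python
import math


def steps_per_epoch(num_samples, batch_size_per_gpu, num_gpus=1):
    """Iterations needed to see every sample once; a final partial batch counts as one step."""
    return math.ceil(num_samples / (batch_size_per_gpu * num_gpus))


steps = steps_per_epoch(7644, 24)  # 7644 training samples, batch_size_per_gpu: 24, --gpus 1
total = steps * 12                 # num_epochs: 12
print(steps, total)  # 319 3828
```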

augmentation_config {
  preprocessing {
    output_image_width: 700
    output_image_height: 500
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
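The 700 x 500 preprocessing size matches the 32 x 44 grid in the model summary, where shapes are listed as channels, height, width. The sketch below reproduces that grid. It assumes an overall backbone stride of 16, with padding that rounds up. In the summary, block_3 and block_4 share the same grid, so the last stage does not downsample further.

```python
import math


def output_grid(height, width, stride=16):
    """Feature-map size for a given input, assuming padding that rounds up (ceil division)."""
    return math.ceil(height / stride), math.ceil(width / stride)


# output_image_height / output_image_width from augmentation_config
print(output_grid(500, 700))  # (32, 44)
```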

postprocessing_config {
  target_class_config {
    key: "pedestrians"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.265
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "riders"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: "crowd"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 2
      }
    }
  }
}

bbox_rasterizer_config {
  target_class_config {
    key: "pedestrians"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "riders"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "crowd"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}


evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "pedestrians"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "riders"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "crowd"
    value: 0.5
  }
  evaluation_box_config {
    key: "pedestrians"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "riders"
    value {
      minimum_height: 2
      maximum_height: 9999
      minimum_width: 2
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "crowd"
    value {
      minimum_height: 40
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}
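The first error in this thread came from cost_function_config. Before training, it can help to confirm that every class name is spelled the same way in each section. The helper below is hypothetical and not part of TLT. It is regex-based, assumes the protobuf text layout used in this spec, and does not replicate TLT's own validation.

```python
import re

# Hypothetical helper: assumes each block starts with its `key:`/`name:` field, as in this spec.
MAPPING_RE = re.compile(r'target_class_mapping\s*\{\s*key:\s*"[^"]*"\s*value:\s*"([^"]+)"')
COST_RE = re.compile(r'target_classes\s*\{\s*name:\s*"([^"]+)"')
PER_CLASS_RE = re.compile(r'target_class_config\s*\{\s*key:\s*"([^"]+)"')


def check_class_names(spec_text):
    """Return human-readable problems; an empty list means the class names line up."""
    mapped = set(MAPPING_RE.findall(spec_text))
    costed = set(COST_RE.findall(spec_text))
    configured = set(PER_CLASS_RE.findall(spec_text))
    problems = []
    if not costed:
        problems.append("cost_function_config has no target_classes")
    for name in sorted(mapped - costed):
        problems.append(f"'{name}' is a mapped class with no cost_function_config entry")
    for name in sorted(costed - mapped):
        problems.append(f"'{name}' has a cost_function_config entry but is not a mapped class")
    for name in sorted(mapped - configured):
        problems.append(f"'{name}' has no target_class_config entry")
    return problems


spec = '''
target_class_mapping { key: "pedestrians" value: "pedestrians" }
cost_function_config { target_classes { name: "pedestrians" class_weight: 1.0 } }
postprocessing_config { target_class_config { key: "pedestrians" value: { } } }
'''
print(check_class_names(spec))  # []
```

Pass the full contents of specs/train.txt instead of the short example string to check a real spec.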

For your latest error, please remove your previous result folder.
$ rm -rf trained_model
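The same cleanup can be scripted, for example from a notebook. In this minimal sketch, clear_result_dir is a hypothetical helper name. shutil.rmtree(..., ignore_errors=True) roughly mirrors rm -rf.

```python
import os
import shutil


def clear_result_dir(result_dir):
    """Delete a previous tlt-train result folder and return its absolute path."""
    abs_dir = os.path.abspath(result_dir)
    # ignore_errors=True: no exception if the folder does not exist (similar to rm -rf)
    shutil.rmtree(abs_dir, ignore_errors=True)
    return abs_dir
```

Calling clear_result_dir("trained_model") removes the folder and returns its absolute path, which can then be passed to -r.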

For your previous question:
"I checked it on NGC and figured out it should be tlt_encode. I get it, but once I use that key to retrain the model on a custom dataset, how do I use my own API key later?"

Yes, tlt_encode is always needed as the key when you use the PeopleNet pretrained model.

For your latest error, please remove your previous result folder.
$ rm -rf trained_model

I already tried that a couple of times; it ends with the same error.

Here is the content of the trained_model directory before I removed it:

ls -l trained_model

total 103M
drwxr-xr-x 2 root root 4.0K May  6 02:04 weights
-rw-r--r-- 1 root root 5.1K May  6 02:04 experiment_spec.txt
-rw-r--r-- 1 root root  47M May  6 02:07 graph.pbtxt
-rw-r--r-- 1 root root  57M May  6 02:07 events.out.tfevents.1588730760.tlt

Please use an absolute path for the result folder (-r). I recall running into a similar error myself.
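The sketch below launches training with an absolute result folder. It reuses the flags from the original tlt-train command in this thread. build_train_command is a hypothetical helper. The key is tlt_encode, which the thread says is needed for the PeopleNet pretrained model.

```python
import os
import shlex


def build_train_command(spec_file, result_dir, key, gpus=1):
    """Assemble the tlt-train arguments used in this thread, with an absolute -r folder."""
    return [
        "tlt-train", "detectnet_v2",
        "-e", spec_file,
        "-r", os.path.abspath(result_dir),  # absolute result folder instead of a relative one
        "-k", key,
        "--gpus", str(gpus),
    ]


cmd = build_train_command("specs/train.txt", "trained_model", key="tlt_encode")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True). The sketch itself only builds and prints the command.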

Setting an absolute path for the result folder seems to have resolved my issue; training has finally started.

Thanks for your time and patience.