0 mean average precision for all objects when training DetectNet_v2 on a custom dataset annotated in COCO format

• Hardware - Training on 3080
• Network Type Detectnet_v2
• Training spec file

random_seed: 7
model_config {
  pretrained_model_file: "tao-experiments/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet_18.hdf5"
  num_layers: 18
  use_batch_norm: true
  all_projections: True
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}

bbox_rasterizer_config {
  target_class_config {
    key: "menu"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.8
      cov_radius_y: 0.8
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.8
      cov_radius_y: 0.8
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "table"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.8
      cov_radius_y: 0.8
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}

postprocessing_config {
  target_class_config {
    key: "menu"
    value {
      clustering_config {
        clustering_algorithm: NMS
        minimum_bounding_box_height: 10
      }
    }
  }
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: NMS
        minimum_bounding_box_height: 10
      }
    }
  }
  target_class_config {
    key: "table"
    value {
      clustering_config {
        clustering_algorithm: NMS
        minimum_bounding_box_height: 10
      }
    }
  }
}

cost_function_config {
  target_classes {
    name: "menu"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "table"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: True
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}

training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 1e-08
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    enabled: False
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  visualizer {
    enabled: true
    num_images: 3
    scalar_logging_frequency: 10
    infrequent_logging_frequency: 1
    target_class_config {
      key: "menu"
      value: {
        coverage_threshold: 0.005
      }
    }
    target_class_config {
      key: "person"
      value: {
        coverage_threshold: 0.005
      }
    }
    target_class_config {
      key: "table"
      value: {
        coverage_threshold: 0.005
      }
    }
  }
  checkpoint_interval: 10
}

augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 10
  minimum_detection_ground_truth_overlap {
    key: "menu"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "table"
    value: 0.5
  }
  evaluation_box_config {
    key: "menu"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "table"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}

dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/coco_trainval/*256"
    image_directory_path: "/workspace/tao-experiments/data/coco"
  }
  image_extension: "png"
  target_class_mapping {
    key: "menu"
    value: "menu"
  }
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "table"
    value: "table"
  }
  validation_data_source: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/coco_trainval/*32"
    image_directory_path: "/workspace/tao-experiments/data/coco"
  }
}

• How to reproduce the issue? - There are no errors and the training job succeeds, but the average precision is 0 for every class.

The spec file I use to generate the TFRecords:

coco_config {
  root_directory_path: "/workspace/tao-experiments/data/coco/"
  img_dir_names: ["val", "train"]
  annotation_files: ["annotations/instances_val.json", "annotations/instances_train.json"]
  num_partitions: 2
  num_shards: [32, 256]
}
image_directory_path: "/workspace/tao-experiments/data/coco"
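
The training spec selects its shards with the glob suffixes `*256` (training) and `*32` (validation), which line up with `num_shards: [32, 256]` here. A minimal sketch for confirming that the two patterns pick disjoint, non-empty sets of files is shown below. `split_by_pattern` is a hypothetical helper, not part of TAO, and the file names in the comment are placeholders because the converter's actual naming scheme is not shown in this thread.

```python
import fnmatch


def split_by_pattern(names, train_pattern, val_pattern):
    """Return (train files, val files, files matched by both patterns)."""
    train = set(fnmatch.filter(names, train_pattern))
    val = set(fnmatch.filter(names, val_pattern))
    return sorted(train), sorted(val), sorted(train & val)


def describe_split(names, train_pattern="*256", val_pattern="*32"):
    """Print a short summary of how the shard files are divided."""
    train, val, overlap = split_by_pattern(names, train_pattern, val_pattern)
    print(f"training shards:   {len(train)}")
    print(f"validation shards: {len(val)}")
    if overlap:
        print(f"WARNING: {len(overlap)} shard(s) matched by both patterns")
    if not train or not val:
        print("WARNING: at least one pattern matched no files")
    return train, val, overlap


# Typical use inside the container (directory taken from the spec above):
#   import os
#   describe_split(os.listdir("/workspace/tao-experiments/data/tfrecords/coco_trainval"))
```

An empty match on either side would mean the trainer sees no data for that split, so it is worth ruling out before looking at the model itself.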

I have also tried the DBSCAN clustering algorithm, ResNet-34 and DarkNet backbones, and freezing layers, but the numbers are always 0.

Are your training images the same resolution? If not, please set enable_auto_resize: true in the training config file.

Yes, all images are 960 x 544.

Can you share an example of a label file?

{'images': [{'id': 1, 'file_name': 'image_1.png', 'width': 960, 'height': 544}], 'annotations': [{'id': 1, \
'image_id': 1, 'category_id': 1, 'bbox': [277.5818176269531, 43.28742218017578, \
91.56036376953125, 77.64122772216797], 'area': 7108.859053754713, 'iscrowd': 0}, {'id': 2, \
'image_id': 1, 'category_id': 4, 'bbox': [508.2998046875, 330.1967468261719, 208.7877197265625, \
148.58041381835938], 'area': 31021.76579716429, 'iscrowd': 0}, {'id': 3, 'image_id': 1, 'category_id': \
3, 'bbox': [277.5818176269531, 33.28742218017578, 91.56036376953125, 87.64122772216797], \
'area': 8024.462691450026, 'iscrowd': 0}, {'id': 4, 'image_id': 1, 'category_id': 4, 'bbox': \
[388.3735046386719, 222.57691955566406, 141.39865112304688, 125.56895446777344], \
'area': 17755.280784674454, 'iscrowd': 0}, {'id': 5, 'image_id': 1, 'category_id': 4, 'bbox': \
[552.5770874023438, 101.79374694824219, 119.85699462890625, 67.67396545410156], 'area': \
8111.198113949038, 'iscrowd': 0}, {'id': 6, 'image_id': 1, 'category_id': 4, 'bbox': \
[772.185302734375, 228.51258850097656, 144.8861083984375, 133.2353057861328], 'area': \
19303.944956628606, 'iscrowd': 0}, {'id': 7, 'image_id': 1, 'category_id': 4, 'bbox': \
[773.3126220703125, 218.7852783203125, 141.8604736328125, 77.67727661132812], 'area': \
11019.335250589997, 'iscrowd': 0}, {'id': 8, 'image_id': 1, 'category_id': 4, 'bbox': \
[626.0317993164062, 169.75228881835938, 100.63037109375, 40.3277587890625], 'area': \
4058.1973323225975, 'iscrowd': 0}, {'id': 9, 'image_id': 1, 'category_id': 4, 'bbox': \
[101.9083023071289, 292.6826171875, 142.3033676147461, 80.18209838867188], 'area': \
11410.182623124914, 'iscrowd': 0}, {'id': 10, 'image_id': 1, 'category_id': 4, 'bbox': \
[392.6964111328125, 130.7268829345703, 125.06402587890625, 79.88526916503906], 'area': \
9990.773370199837, 'iscrowd': 0}, {'id': 11, 'image_id': 1, 'category_id': 4, 'bbox': \
[751.57861328125, 132.51092529296875, 75.32354736328125, 71.47303771972656], 'area': \
5383.602741879411, 'iscrowd': 0}, {'id': 12, 'image_id': 1, 'category_id': 4, 'bbox': \
[101.2449951171875, 281.990966796875, 146.65621948242188, 143.36270141601562], 'area': \
21025.0318044601, 'iscrowd': 0}, {'id': 13, 'image_id': 1, 'category_id': 4, 'bbox': \
[136.01014709472656, 474.55255126953125, 208.72425842285156, 68.16131591796875], \
'area': 14226.920118103735, 'iscrowd': 0}, {'id': 14, 'image_id': 1, 'category_id': 2, 'bbox': \
[0.2716605067253113, 145.5406036376953, 38.76214152574539, 62.01707458496094], 'area': \
2403.9146220749635, 'iscrowd': 0}, {'id': 15, 'image_id': 1, 'category_id': 4, 'bbox': \
[389.3913879394531, 213.2857666015625, 139.49581909179688, 77.1763916015625], 'area': \
10765.783961009234, 'iscrowd': 0}, {'id': 16, 'image_id': 1, 'category_id': 4, 'bbox': [625.990234375, \
159.3264617919922, 102.0302734375, 105.97669982910156], 'area': 10812.831661567092, \
'iscrowd': 0}], 'categories': [{'id': 1, 'name': 'menu'}, {'id': 2, 'name': 'plate'}, {'id': 3, 'name': 'check'}, \
{'id': 4, 'name': 'table'}]}

This is the label for one image; it is a dictionary with images, annotations, and categories keys.
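
One way to sanity-check an annotation file like this is to count annotations per category name and flag boxes that refer to unknown categories or fall outside the image. The sketch below is not part of TAO: `summarize_coco` and `summarize_coco_file` are hypothetical helpers, and they assume the file on disk is valid JSON (the dump above uses Python-style single quotes) with COCO-style `[x, y, width, height]` boxes.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Count annotations per category name and collect obvious problems."""
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    sizes = {i["id"]: (i["width"], i["height"]) for i in coco.get("images", [])}
    counts = Counter()
    problems = []
    for ann in coco.get("annotations", []):
        name = id_to_name.get(ann["category_id"])
        if name is None:
            problems.append(
                f"annotation {ann['id']}: unknown category_id {ann['category_id']}"
            )
            continue
        counts[name] += 1
        if ann["image_id"] not in sizes:
            problems.append(
                f"annotation {ann['id']}: unknown image_id {ann['image_id']}"
            )
            continue
        width, height = sizes[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        # A box is suspicious if it is empty or extends past the image border.
        if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > width or y + h > height:
            problems.append(
                f"annotation {ann['id']}: bbox {ann['bbox']} outside {width}x{height}"
            )
    return counts, problems


def summarize_coco_file(path):
    """Load a COCO JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_coco(json.load(f))
```

Running `summarize_coco_file` on both annotation files and comparing the per-category counts with the numbers printed by the converter is a quick way to see whether any class is being dropped before training.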

Any update on this?

There is no person category in your dataset. Could you remove person from the spec file and retry? Also, please share the log from running tao detectnet_v2 dataset_convert.

For this particular frame, there may not have been a person detected, but the final JSON that I am using (the annotations for all my images) has all 5 categories. I was trying to reduce the number of classes in the spec file to see the results, but even if I include all 5 classes in the spec file, matching the categories in the corresponding JSON, I still get 0 precision for all classes.

Could you please share the logs from running tao detectnet_v2 dataset_convert and from training? Also, could you double-check that your annotations follow the format described in Data Annotation Format - NVIDIA Docs?
For example,
"categories": [{"supercategory": "person","id": 1,"name": "person"},{"supercategory": "vehicle","id": 2,"name": "bicycle"},{"supercategory": "vehicle","id": 3,"name": "car"},{"supercategory": "vehicle","id": 4,"name": "motorcycle"}]
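
To compare a `categories` list against the example above, a small check can report which of the three keys each entry lacks. This is a sketch only: `missing_category_keys` is a hypothetical helper, and whether the converter actually requires `supercategory` is not established in this thread.

```python
# Keys that appear in every entry of the documentation example quoted above.
REQUIRED_KEYS = ("supercategory", "id", "name")


def missing_category_keys(categories, required=REQUIRED_KEYS):
    """Map each category name (or its index, if unnamed) to the keys it lacks."""
    report = {}
    for index, category in enumerate(categories):
        missing = [key for key in required if key not in category]
        if missing:
            report[category.get("name", f"#{index}")] = missing
    return report


# The categories list from the label file posted earlier in the thread.
posted = [
    {"id": 1, "name": "menu"},
    {"id": 2, "name": "plate"},
    {"id": 3, "name": "check"},
    {"id": 4, "name": "table"},
]
for name, missing in missing_category_keys(posted).items():
    print(f"{name}: missing {missing}")
```

For the posted label file, every entry lacks only `supercategory`; `id` and `name` are present.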

The log of tao detectnet_v2 dataset_convert:

Converting Tfrecords for COCO trainval dataset
2023-05-03 14:38:46,225 [INFO] root: Registry: ['nvcr.io']
2023-05-03 14:38:46,305 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
2023-05-03 14:38:47,236 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/shounak/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
Using TensorFlow backend.
2023-05-03 18:38:48.758429: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!

Using TensorFlow backend.
2023-05-03 18:38:55,886 [INFO] iva.detectnet_v2.dataio.build_converter: Instantiating a coco converter
2023-05-03 18:38:55,886 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: Creating output directory /workspace/tao-experiments/data/tfrecords/coco_trainval
loading annotations into memory...
Done (t=0.17s)
creating index...
index created!
loading annotations into memory...
Done (t=0.50s)
creating index...
index created!
2023-05-03 18:38:57,178 [INFO] iva.detectnet_v2.dataio.coco_converter_lib: Writing partition 0, shard 31
2023-05-03 18:38:57,207 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: 
Wrote the following numbers of objects:
b'menu': 2826
b'table': 12550
b'check': 1896
b'plate': 4397
b'person': 18264

2023-05-03 18:38:58,992 [INFO] iva.detectnet_v2.dataio.coco_converter_lib: Cumulative object statistics
2023-05-03 18:38:58,992 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: 
Wrote the following numbers of objects:
b'menu': 10543
b'table': 66633
b'check': 8206
b'plate': 20348
b'person': 63896

2023-05-03 18:38:58,992 [INFO] iva.detectnet_v2.dataio.coco_converter_lib: Class map. 

Label in GT: Label in tfrecords file 
menu: menu
table: table
check: check
plate: plate
person: person
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

2023-05-03 18:38:58,992 [INFO] iva.detectnet_v2.dataio.coco_converter_lib: Tfrecords generation complete.
Execution status: PASS
2023-05-03 14:39:00,240 [INFO] tlt.components.docker_handler.docker_handler: Stopping container
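
The converter log asks that the `dataset_config` class map use the labels written to the tfrecords. A minimal sketch for comparing the two sets is shown below. `spec_class_mapping` and `compare_labels` are hypothetical helpers, not TAO functions, and the regular expression assumes each `target_class_mapping` block lists `key` before `value`, as in the spec posted above.

```python
import re

# Matches: target_class_mapping { key: "..." value: "..." }
MAPPING_RE = re.compile(
    r'target_class_mapping\s*\{\s*key:\s*"([^"]+)"\s*value:\s*"([^"]+)"'
)


def spec_class_mapping(spec_text):
    """Extract {tfrecords label: target class} from the training spec text."""
    return dict(MAPPING_RE.findall(spec_text))


def compare_labels(tfrecord_labels, spec_text):
    """Report labels present on one side but not the other."""
    keys = set(spec_class_mapping(spec_text))
    labels = set(tfrecord_labels)
    return {
        "unmapped_in_spec": sorted(labels - keys),
        "missing_from_tfrecords": sorted(keys - labels),
    }


# Labels taken from the "Class map" section of the log above.
LOGGED_LABELS = ["menu", "table", "check", "plate", "person"]
```

With the three-class training spec posted at the top of the thread, `compare_labels(LOGGED_LABELS, spec_text)` would list `check` and `plate` as unmapped and nothing as missing from the tfrecords.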

The training log:

2023-05-03 14:46:41,781 [INFO] root: Registry: ['nvcr.io']
2023-05-03 14:46:41,862 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
2023-05-03 14:46:42,353 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/shounak/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
Using TensorFlow backend.
2023-05-03 18:46:43.101710: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
Using TensorFlow backend.
[1683139608.764434] [75ec5ecac3ab:230  :f]        vfs_fuse.c:281  UCX  ERROR inotify_add_watch(/tmp) failed: No space left on device
2023-05-03 18:46:49,910 [INFO] root: Starting DetectNet_v2 Training job
2023-05-03 18:46:49,911 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_coco.txt.
2023-05-03 18:46:49,913 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_coco.txt
2023-05-03 18:46:49,919 [INFO] root: Training gridbox model.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2023-05-03 18:46:49,919 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2023-05-03 18:46:51,724 [INFO] root: Sampling mode of the dataloader was set to user_defined.
2023-05-03 18:46:51,807 [INFO] root: Building DetectNet V2 model
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2023-05-03 18:46:51,807 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2023-05-03 18:46:51,808 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2023-05-03 18:46:51,823 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

2023-05-03 18:46:52,630 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2023-05-03 18:46:52,783 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2023-05-03 18:46:52,783 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2023-05-03 18:46:52,783 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2023-05-03 18:46:53,126 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2023-05-03 18:47:00,738 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 3, 544, 960)  0                                            
conv1 (Conv2D)                  (None, 64, 272, 480) 9472        input_1[0][0]                    
bn_conv1 (BatchNormalization)   (None, 64, 272, 480) 256         conv1[0][0]                      
activation_1 (Activation)       (None, 64, 272, 480) 0           bn_conv1[0][0]                   
block_1a_conv_1 (Conv2D)        (None, 64, 136, 240) 36928       activation_1[0][0]               
block_1a_bn_1 (BatchNormalizati (None, 64, 136, 240) 256         block_1a_conv_1[0][0]            
block_1a_relu_1 (Activation)    (None, 64, 136, 240) 0           block_1a_bn_1[0][0]              
block_1a_conv_2 (Conv2D)        (None, 64, 136, 240) 36928       block_1a_relu_1[0][0]            
block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160        activation_1[0][0]               
block_1a_bn_2 (BatchNormalizati (None, 64, 136, 240) 256         block_1a_conv_2[0][0]            
block_1a_bn_shortcut (BatchNorm (None, 64, 136, 240) 256         block_1a_conv_shortcut[0][0]     
add_1 (Add)                     (None, 64, 136, 240) 0           block_1a_bn_2[0][0]              
block_1a_relu (Activation)      (None, 64, 136, 240) 0           add_1[0][0]                      
block_1b_conv_1 (Conv2D)        (None, 64, 136, 240) 36928       block_1a_relu[0][0]              
block_1b_bn_1 (BatchNormalizati (None, 64, 136, 240) 256         block_1b_conv_1[0][0]            
block_1b_relu_1 (Activation)    (None, 64, 136, 240) 0           block_1b_bn_1[0][0]              
block_1b_conv_2 (Conv2D)        (None, 64, 136, 240) 36928       block_1b_relu_1[0][0]            
block_1b_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160        block_1a_relu[0][0]              
block_1b_bn_2 (BatchNormalizati (None, 64, 136, 240) 256         block_1b_conv_2[0][0]            
block_1b_bn_shortcut (BatchNorm (None, 64, 136, 240) 256         block_1b_conv_shortcut[0][0]     
add_2 (Add)                     (None, 64, 136, 240) 0           block_1b_bn_2[0][0]              
block_1b_relu (Activation)      (None, 64, 136, 240) 0           add_2[0][0]                      
block_2a_conv_1 (Conv2D)        (None, 128, 68, 120) 73856       block_1b_relu[0][0]              
block_2a_bn_1 (BatchNormalizati (None, 128, 68, 120) 512         block_2a_conv_1[0][0]            
block_2a_relu_1 (Activation)    (None, 128, 68, 120) 0           block_2a_bn_1[0][0]              
block_2a_conv_2 (Conv2D)        (None, 128, 68, 120) 147584      block_2a_relu_1[0][0]            
block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320        block_1b_relu[0][0]              
block_2a_bn_2 (BatchNormalizati (None, 128, 68, 120) 512         block_2a_conv_2[0][0]            
block_2a_bn_shortcut (BatchNorm (None, 128, 68, 120) 512         block_2a_conv_shortcut[0][0]     
add_3 (Add)                     (None, 128, 68, 120) 0           block_2a_bn_2[0][0]              
block_2a_relu (Activation)      (None, 128, 68, 120) 0           add_3[0][0]                      
block_2b_conv_1 (Conv2D)        (None, 128, 68, 120) 147584      block_2a_relu[0][0]              
block_2b_bn_1 (BatchNormalizati (None, 128, 68, 120) 512         block_2b_conv_1[0][0]            
block_2b_relu_1 (Activation)    (None, 128, 68, 120) 0           block_2b_bn_1[0][0]              
block_2b_conv_2 (Conv2D)        (None, 128, 68, 120) 147584      block_2b_relu_1[0][0]            
block_2b_conv_shortcut (Conv2D) (None, 128, 68, 120) 16512       block_2a_relu[0][0]              
block_2b_bn_2 (BatchNormalizati (None, 128, 68, 120) 512         block_2b_conv_2[0][0]            
block_2b_bn_shortcut (BatchNorm (None, 128, 68, 120) 512         block_2b_conv_shortcut[0][0]     
add_4 (Add)                     (None, 128, 68, 120) 0           block_2b_bn_2[0][0]              
block_2b_relu (Activation)      (None, 128, 68, 120) 0           add_4[0][0]                      
block_3a_conv_1 (Conv2D)        (None, 256, 34, 60)  295168      block_2b_relu[0][0]              
block_3a_bn_1 (BatchNormalizati (None, 256, 34, 60)  1024        block_3a_conv_1[0][0]            
block_3a_relu_1 (Activation)    (None, 256, 34, 60)  0           block_3a_bn_1[0][0]              
block_3a_conv_2 (Conv2D)        (None, 256, 34, 60)  590080      block_3a_relu_1[0][0]            
block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60)  33024       block_2b_relu[0][0]              
block_3a_bn_2 (BatchNormalizati (None, 256, 34, 60)  1024        block_3a_conv_2[0][0]            
block_3a_bn_shortcut (BatchNorm (None, 256, 34, 60)  1024        block_3a_conv_shortcut[0][0]     
add_5 (Add)                     (None, 256, 34, 60)  0           block_3a_bn_2[0][0]              
block_3a_relu (Activation)      (None, 256, 34, 60)  0           add_5[0][0]                      
block_3b_conv_1 (Conv2D)        (None, 256, 34, 60)  590080      block_3a_relu[0][0]              
block_3b_bn_1 (BatchNormalizati (None, 256, 34, 60)  1024        block_3b_conv_1[0][0]            
block_3b_relu_1 (Activation)    (None, 256, 34, 60)  0           block_3b_bn_1[0][0]              
block_3b_conv_2 (Conv2D)        (None, 256, 34, 60)  590080      block_3b_relu_1[0][0]            
block_3b_conv_shortcut (Conv2D) (None, 256, 34, 60)  65792       block_3a_relu[0][0]              
block_3b_bn_2 (BatchNormalizati (None, 256, 34, 60)  1024        block_3b_conv_2[0][0]            
block_3b_bn_shortcut (BatchNorm (None, 256, 34, 60)  1024        block_3b_conv_shortcut[0][0]     
add_6 (Add)                     (None, 256, 34, 60)  0           block_3b_bn_2[0][0]              
block_3b_relu (Activation)      (None, 256, 34, 60)  0           add_6[0][0]                      
block_4a_conv_1 (Conv2D)        (None, 512, 34, 60)  1180160     block_3b_relu[0][0]              
block_4a_bn_1 (BatchNormalizati (None, 512, 34, 60)  2048        block_4a_conv_1[0][0]            
block_4a_relu_1 (Activation)    (None, 512, 34, 60)  0           block_4a_bn_1[0][0]              
block_4a_conv_2 (Conv2D)        (None, 512, 34, 60)  2359808     block_4a_relu_1[0][0]            
block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60)  131584      block_3b_relu[0][0]              
block_4a_bn_2 (BatchNormalizati (None, 512, 34, 60)  2048        block_4a_conv_2[0][0]            
block_4a_bn_shortcut (BatchNorm (None, 512, 34, 60)  2048        block_4a_conv_shortcut[0][0]     
add_7 (Add)                     (None, 512, 34, 60)  0           block_4a_bn_2[0][0]              
block_4a_relu (Activation)      (None, 512, 34, 60)  0           add_7[0][0]                      
block_4b_conv_1 (Conv2D)        (None, 512, 34, 60)  2359808     block_4a_relu[0][0]              
block_4b_bn_1 (BatchNormalizati (None, 512, 34, 60)  2048        block_4b_conv_1[0][0]            
block_4b_relu_1 (Activation)    (None, 512, 34, 60)  0           block_4b_bn_1[0][0]              
block_4b_conv_2 (Conv2D)        (None, 512, 34, 60)  2359808     block_4b_relu_1[0][0]            
block_4b_conv_shortcut (Conv2D) (None, 512, 34, 60)  262656      block_4a_relu[0][0]              
block_4b_bn_2 (BatchNormalizati (None, 512, 34, 60)  2048        block_4b_conv_2[0][0]            
block_4b_bn_shortcut (BatchNorm (None, 512, 34, 60)  2048        block_4b_conv_shortcut[0][0]     
add_8 (Add)                     (None, 512, 34, 60)  0           block_4b_bn_2[0][0]              
block_4b_relu (Activation)      (None, 512, 34, 60)  0           add_8[0][0]                      
output_bbox (Conv2D)            (None, 20, 34, 60)   10260       block_4b_relu[0][0]              
output_cov (Conv2D)             (None, 5, 34, 60)    2565        block_4b_relu[0][0]              
Total params: 11,561,113
Trainable params: 11,383,961
Non-trainable params: 177,152
2023-05-03 18:47:00,760 [INFO] root: DetectNet V2 model built.
2023-05-03 18:47:00,761 [INFO] root: Building rasterizer.
2023-05-03 18:47:00,761 [INFO] root: Rasterizers built.
2023-05-03 18:47:00,773 [INFO] root: Building training graph.
2023-05-03 18:47:00,774 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2023-05-03 18:47:00,774 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2023-05-03 18:47:00,774 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2023-05-03 18:47:00,774 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 48, io threads: 96, compute threads: 48, buffered batches: 4
2023-05-03 18:47:00,774 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 5764, number of sources: 1, batch size per gpu: 2, steps: 2882

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

2023-05-03 18:47:00,806 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7ff5276b5278>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7ff5276b5278>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-05-03 18:47:00,838 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7ff5276b5278>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7ff5276b5278>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-05-03 18:47:00,852 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2023-05-03 18:47:01,017 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2023-05-03 18:47:01,022 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2023-05-03 18:47:01,022 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7ff44c7ddfd0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7ff44c7ddfd0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-05-03 18:47:01,032 [WARNING] tensorflow: Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7ff44c7ddfd0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7ff44c7ddfd0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-05-03 18:47:01,277 [INFO] __main__: Found 5764 samples in training set
2023-05-03 18:47:01,281 [INFO] root: Rasterizing tensors.
2023-05-03 18:47:01,476 [INFO] root: Tensors rasterized.
2023-05-03 18:47:03,772 [INFO] root: Training graph built.
2023-05-03 18:47:03,772 [INFO] root: Building validation graph.
2023-05-03 18:47:03,773 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2023-05-03 18:47:03,773 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2023-05-03 18:47:03,773 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2023-05-03 18:47:03,773 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 48, io threads: 96, compute threads: 48, buffered batches: 4
2023-05-03 18:47:03,773 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 1982, number of sources: 1, batch size per gpu: 2, steps: 991
WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7ff527698940>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7ff527698940>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-05-03 18:47:03,781 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7ff527698940>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7ff527698940>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-05-03 18:47:03,794 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2023-05-03 18:47:03,946 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2023-05-03 18:47:03,949 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2023-05-03 18:47:03,949 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7ff3cc611fd0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7ff3cc611fd0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-05-03 18:47:03,960 [WARNING] tensorflow: Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7ff3cc611fd0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7ff3cc611fd0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code

2023-05-03 18:47:04,112 [INFO] __main__: Found 1982 samples in validation set
2023-05-03 18:47:04,112 [INFO] root: Rasterizing tensors.
2023-05-03 18:47:04,269 [INFO] root: Tensors rasterized.
2023-05-03 18:47:04,550 [INFO] root: Validation graph built.
2023-05-03 18:47:06,102 [INFO] root: Running training loop.
2023-05-03 18:47:06,102 [INFO] __main__: Checkpoint interval: 10
2023-05-03 18:47:06,103 [INFO] __main__: Scalars logged at every 288 steps
2023-05-03 18:47:06,103 [INFO] __main__: Images logged at every 2882 steps
INFO:tensorflow:Create CheckpointSaverHook.
2023-05-03 18:47:06,105 [INFO] tensorflow: Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2023-05-03 18:47:08,009 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2023-05-03 18:47:10,066 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2023-05-03 18:47:10,653 [INFO] tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2023-05-03 18:47:17,599 [INFO] tensorflow: Saving checkpoints for step-0.
INFO:tensorflow:epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.098450065, step = 0
2023-05-03 18:48:03,569 [INFO] tensorflow: epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.098450065, step = 0
2023-05-03 18:48:03,576 [INFO] root: None
2023-05-03 18:48:03,594 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 0/120: loss: 0.09845 learning rate: 0.00000 Time taken: 0:00:00 ETA: 0:00:00
2023-05-03 18:48:03,594 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 0.071
2023-05-03 18:48:06,263 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 1.623
2023-05-03 18:48:07,220 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.261
2023-05-03 18:48:08,186 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.776
INFO:tensorflow:epoch = 0.029493407356002775, learning_rate = 5.0569115e-06, loss = 0.06569926, step = 85 (5.038 sec)
2023-05-03 18:48:08,608 [INFO] tensorflow: epoch = 0.029493407356002775, learning_rate = 5.0569115e-06, loss = 0.06569926, step = 85 (5.038 sec)
2023-05-03 18:48:09,150 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.913
2023-05-03 18:48:10,111 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.055
2023-05-03 18:48:11,071 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.052
2023-05-03 18:48:12,027 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.321
2023-05-03 18:48:12,978 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.576
INFO:tensorflow:epoch = 0.07529493407356003, learning_rate = 5.1465836e-06, loss = 0.03726593, step = 217 (5.064 sec)
2023-05-03 18:48:13,671 [INFO] tensorflow: epoch = 0.07529493407356003, learning_rate = 5.1465836e-06, loss = 0.03726593, step = 217 (5.064 sec)
2023-05-03 18:48:13,939 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.071
2023-05-03 18:48:14,895 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.320
2023-05-03 18:48:15,855 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.092
INFO:tensorflow:global_step/sec: 19.788
2023-05-03 18:48:18,125 [INFO] tensorflow: global_step/sec: 19.788
2023-05-03 18:48:18,553 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 18.532
INFO:tensorflow:epoch = 0.10548230395558639, learning_rate = 5.2065548e-06, loss = 0.022609862, step = 304 (5.076 sec)
2023-05-03 18:48:18,747 [INFO] tensorflow: epoch = 0.10548230395558639, learning_rate = 5.2065548e-06, loss = 0.022609862, step = 304 (5.076 sec)
2023-05-03 18:48:19,519 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.770
2023-05-03 18:48:20,480 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.070
2023-05-03 18:48:21,435 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.338
2023-05-03 18:48:22,398 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.962
2023-05-03 18:48:23,360 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.013
INFO:tensorflow:epoch = 0.15128383067314363, learning_rate = 5.2988803e-06, loss = 0.014012933, step = 436 (5.071 sec)
2023-05-03 18:48:23,818 [INFO] tensorflow: epoch = 0.15128383067314363, learning_rate = 5.2988803e-06, loss = 0.014012933, step = 436 (5.071 sec)
2023-05-03 18:48:24,323 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.922
2023-05-03 18:48:25,289 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.750
2023-05-03 18:48:26,251 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.009
2023-05-03 18:48:27,207 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.325
2023-05-03 18:48:28,167 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.106
INFO:tensorflow:epoch = 0.1967383761276891, learning_rate = 5.3921226e-06, loss = 0.00902343, step = 567 (5.042 sec)
2023-05-03 18:48:28,861 [INFO] tensorflow: epoch = 0.1967383761276891, learning_rate = 5.3921226e-06, loss = 0.00902343, step = 567 (5.042 sec)
2023-05-03 18:48:29,131 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.869
INFO:tensorflow:global_step/sec: 25.9738
2023-05-03 18:48:29,213 [INFO] tensorflow: global_step/sec: 25.9738
2023-05-03 18:48:30,096 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.844
2023-05-03 18:48:31,065 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.619
2023-05-03 18:48:32,028 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 51.934
2023-05-03 18:48:32,988 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 52.087
INFO:tensorflow:epoch = 0.24219292158223454, learning_rate = 5.4870065e-06, loss = 0.007546805, step = 698 (5.047 sec)

I have also added supercategories to the JSON annotation file.

Could you please upload the full training log? Please use the upload button to attach it. Thanks.

Here is the training log from a 12-epoch run. The model has started to learn something, but not much; the results on the person class are especially poor. I can also see that the per-class AP is more or less inversely proportional to the number of labels for that class.

tao-logs.txt (293.8 KB)

So, after running 12 epochs, the result is:

Epoch 12/12

Validation cost: 0.001081
Mean average_precision (in %): 6.0209

class name      average precision (in %)
------------  --------------------------
check                         17.5964
menu                           6.87127
person                         0.136122
plate                          5.48718
table                          0.0137174

How large are the 5 kinds of objects in your images? Are they small? Could you share an example training image?

Please take a look at the FAQ (Frequently Asked Questions - NVIDIA Docs).

In DetectNet_V2, are there any parameters that can help improve AP (average precision) on training small objects?

The following parameters can help you improve AP on smaller objects:

  • Increase num_layers of the resnet backbone
  • Increase class_weight for small objects
  • Increase the coverage radius parameters (cov_radius_x and cov_radius_y in the spec file) of the bbox_rasterizer_config section for the small-object classes
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height to cover more small objects for evaluation.
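
As a rough sketch of where these settings live in a DetectNet_v2 training spec, the fragment below shows only the relevant sections for a single class. The numbers are illustrative placeholders, not tuned recommendations, and "person" is used as the example class only because it appears in the spec above; the other classes would need matching entries. The evaluation_config values are assumptions, since that section of the original spec is not shown in this thread.

```
model_config {
  # Deeper backbone than the 18 layers used above. The pretrained_model_file
  # would have to point to a checkpoint that matches this depth.
  num_layers: 34
}

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      # Raised from 0.8 so that small boxes cover more of the coverage grid.
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}

evaluation_config {
  # Looser IoU requirement for counting a detection as a match (illustrative).
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  # Lower size cutoffs so that small ground-truth boxes are still evaluated.
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}
```

It is probably easiest to change one of these at a time and compare the per-class AP after each run, so the effect of each setting can be told apart.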

Distribute the dataset class: How do I balance the weight between classes if the dataset has significantly more samples for one class than another?

To account for imbalance, increase the class_weight for classes with fewer samples. You can also try disabling enable_autoweighting; in this case initial_weight is used to control cov/regression weighting. It is important to keep the number of samples of different classes balanced, which helps improve mAP.
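
A minimal sketch of that change in the cost_function_config section, assuming for illustration that "menu" is an under-represented class (the thread does not say which classes have the fewest samples) and using a placeholder weight of 4.0:

```
cost_function_config {
  target_classes {
    name: "menu"
    # Raised from 1.0 to give this class more influence on the loss (placeholder value).
    class_weight: 4.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  # With autoweighting disabled, the initial_weight values above set the
  # balance between the cov and bbox objectives.
  enable_autoweighting: false
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}
```

The remaining target_classes entries stay as they are; only the weight of the class being boosted changes.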

In addition, you can use a deeper backbone, or you can try the yolov4_tiny network instead.

