mAP and every AP not improving while training TLT YOLO_V4 with custom data

• Hardware Platform (Jetson / GPU)
nvidia GPU
• DeepStream Version
5.0.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
7.0.0
• NVIDIA GPU Driver Version (valid for GPU only)
460.39
• Issue Type( questions, new requirements, bugs)
question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
follow the steps I describe
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello everyone.
I’m trying to train a YOLO_v4 model for object detection.
For this model I’m using a custom 500-image dataset, where

  • every image is 1260x700
  • augmentation output is 1280x704 (both are multiples of 32)
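The 1280x704 output size follows from rounding each image dimension up to the next multiple of 32. The short sketch below only illustrates that arithmetic; the helper name `round_up_to_multiple` is made up for this example and is not part of TLT.

```python
def round_up_to_multiple(value: int, base: int = 32) -> int:
    """Return the smallest multiple of `base` that is >= `value`."""
    # -(-a // b) is integer ceiling division without importing math.
    return -(-value // base) * base


# Image size from this post: 1260x700.
width, height = 1260, 700
print(round_up_to_multiple(width), round_up_to_multiple(height))  # 1280 704
```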

The labels I want to get are the following.

  • truck-tanker
  • truck-tank
  • truck-front
  • person

and are being mapped from these classes.

  target_class_mapping {
      key: "pedestrian"
      value: "person"
  }
  target_class_mapping {
      key: "truck"
      value: "truck-tanker"
  }
  target_class_mapping {
      key: "bus"
      value: "truck-tank"
  }
  target_class_mapping {
      key: "car"
      value: "truck-front"
  }
  target_class_mapping {
      key: "person_sitting"
      value: "person"
  }

The problem occurs during training.
The loss drops quickly every epoch, but whenever training reaches a checkpoint and runs evaluation, I get 0 AP for every label.

What could I be doing wrong?

This is my training spec file. Since results are not improving as expected, I have neither pruned nor retrained.

random_seed: 42
yolov4_config {
  big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
  mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
  small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
  box_matching_iou: 0.25
  arch: "resnet"
  nlayers: 18
  arch_conv_blocks: 2
  loss_loc_weight: 0.8
  loss_neg_obj_weights: 100.0
  loss_class_weights: 0.5
  label_smoothing: 0.0
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  #batch_size_per_gpu: 8
  batch_size_per_gpu: 4
  num_epochs: 80
  enable_qat: false
  checkpoint_interval: 10
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
  pretrain_model_path: "/workspace/tlt-experiments/yolo-pfuenzalida/yolo_v4/pretrained_resnet18/tlt_pretrained_object_detection_vresnet18/resnet_18.hdf5"
}
eval_config {
  average_precision_mode: SAMPLE
  #batch_size: 8
  batch_size: 4
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.5
  top_k: 200
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 1280
  output_height: 704
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
  mosaic_min_ratio: 0.2
}
dataset_config {
  data_sources: {
      label_directory_path: "/workspace/tlt-experiments/data/training/label_2"
      image_directory_path: "/workspace/tlt-experiments/data/training/image_2"
  }
  include_difficult_in_training: true
  target_class_mapping {
      key: "pedestrian"
      value: "person"
  }
  target_class_mapping {
      key: "truck"
      value: "truck-tanker"
  }
  target_class_mapping {
      key: "bus"
      value: "truck-tank"
  }
  target_class_mapping {
      key: "car"
      value: "truck-front"
  }
  target_class_mapping {
      key: "person_sitting"
      value: "person"
  }
  validation_data_sources: {
      label_directory_path: "/workspace/tlt-experiments/data/val/label"
      image_directory_path: "/workspace/tlt-experiments/data/val/image"
  }
}

PS: I originally wanted to derive truck-tank, truck-tanker, and truck-front directly from truck, but TLT ignored the first two and mapped only the last one written. Is there a way to derive multiple specific labels from the same source class?

thank you.


Please note that in target_class_mapping, neither the key nor the value can be a dummy class name. The names must exist in your label files.
What are the actual class names in your label files? Are they “pedestrian”, “truck”, “bus”, “car” and “person_sitting”?
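One way to answer that question is to count the class names that actually occur in the label files. The sketch below is only an illustration and assumes KITTI-style labels, where the class name is the first whitespace-separated field of each line; the function names are made up for this example, and the directory is the one from the spec file in this thread.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_classes(lines: Iterable[str]) -> Counter:
    """Count the class name (first field) of every non-blank label line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def count_classes_in_dir(label_dir: str) -> Counter:
    """Aggregate class counts over every .txt file in a label directory."""
    total = Counter()
    for path in sorted(Path(label_dir).glob("*.txt")):
        with path.open() as handle:
            total += count_classes(handle)
    return total


if __name__ == "__main__":
    # Path taken from the spec file in this thread; adjust to your setup.
    label_dir = "/workspace/tlt-experiments/data/training/label_2"
    for name, n in sorted(count_classes_in_dir(label_dir).items()):
        print(f"{name}: {n}")
```

Every name printed here is a candidate key for target_class_mapping; a key that never shows up in this list has no matching objects in the dataset.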

Hello @Morganh, thank you for your time.
No, the classes are the ones mentioned above

  • truck-tanker
  • truck-tank
  • truck-front
  • person

I used the “dummy” values because those are the ones used in the examples.
Would you be so kind as to tell me where I can find documentation about target_class_mapping, please?
I read these links, but I still have some doubts.
https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/object_detection/index.html
https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/object_detection/yolo_v4.html#creating-a-configuration-file

Thank you again for your time.
Would the correct way be something like this?
  target_class_mapping {
      key: "person"
      value: "person"
  }
  target_class_mapping {
      key: "truck-tanker"
      value: "truck-tanker"
  }
  target_class_mapping {
      key: "truck-tank"
      value: "truck-tank"
  }
  target_class_mapping {
      key: "truck-front"
      value: "truck-front"
  }
  target_class_mapping {
      key: "person-helmet"
      value: "person-helmet"
  }

Yes, for your case, the following is correct.
  target_class_mapping {
      key: "person"
      value: "person"
  }
  target_class_mapping {
      key: "truck-tanker"
      value: "truck-tanker"
  }
  target_class_mapping {
      key: "truck-tank"
      value: "truck-tank"
  }
  target_class_mapping {
      key: "truck-front"
      value: "truck-front"
  }

But what about “person-helmet”? Is it a class name in your label files too?

For documentation, please see the DetectNet_v2 page of the Transfer Learning Toolkit 3.0 documentation:

target_class_mapping : This parameter maps the class names in the tfrecords to the target class to be trained in the network. An element is defined for every source class to target class mapping. This field was included with the intention of grouping similar class objects under one umbrella. For example: car, van, heavy_truck etc may be grouped under automobile. The “key” field is the value of the class name in the tfrecords file and the “value” field corresponds to the value that the network is expected to learn.
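To make the grouping idea concrete, the sketch below writes the mapping as a Python dict and renders it as target_class_mapping blocks. It is only an illustration: `render_target_class_mapping` is a hypothetical helper, not a TLT utility, and the class names are the ones from the documentation example. Several keys may share one value (many-to-one), but each key appears once, so a single source class has a single target.

```python
def render_target_class_mapping(mapping: dict, indent: str = "  ") -> str:
    """Render {source_class: target_class} as target_class_mapping blocks."""
    blocks = []
    for key, value in mapping.items():
        blocks.append(
            f"{indent}target_class_mapping {{\n"
            f'{indent}    key: "{key}"\n'
            f'{indent}    value: "{value}"\n'
            f"{indent}}}"
        )
    return "\n".join(blocks)


# Many-to-one grouping, as in the documentation example above.
grouping = {
    "car": "automobile",
    "van": "automobile",
    "heavy_truck": "automobile",
}
print(render_target_class_mapping(grouping))
```

When no grouping is wanted, as in this thread, each class simply maps to itself, e.g. `{"person": "person", "truck-tanker": "truck-tanker"}`.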


Yes, person-helmet is a class, but it is low priority because I don’t have many pictures for it; that’s why I did not include it.
Thank you for your help!

OK, so, please modify your training spec file.
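Before retraining, the edited spec can be checked against the label classes. The sketch below is only an illustration: it pulls the key/value pairs out of the spec text with a simple regular expression (not a full protobuf text parser) and reports keys that are absent from the labels. The helper names are made up for this example, and the class list is the one given earlier in this thread.

```python
import re

# Matches one target_class_mapping { key: "..." value: "..." } block.
MAPPING_RE = re.compile(
    r'target_class_mapping\s*\{\s*key:\s*"([^"]+)"\s*value:\s*"([^"]+)"\s*\}'
)


def mapping_from_spec(spec_text: str) -> dict:
    """Return {key: value} for every target_class_mapping block found."""
    return dict(MAPPING_RE.findall(spec_text))


def missing_keys(spec_text: str, label_classes) -> list:
    """Return mapping keys that do not occur among the label classes."""
    return sorted(set(mapping_from_spec(spec_text)) - set(label_classes))


# Two blocks from the original spec, checked against the real label classes.
original_spec = '''
  target_class_mapping {
      key: "pedestrian"
      value: "person"
  }
  target_class_mapping {
      key: "truck"
      value: "truck-tanker"
  }
'''
label_classes = ["truck-tanker", "truck-tank", "truck-front", "person", "person-helmet"]
print(missing_keys(original_spec, label_classes))  # ['pedestrian', 'truck']
```

An empty list means every key in the spec has a matching class name in the labels.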