Is it possible to adjust class_weight in YOLOv4 like DetectNet v2?

Please provide the following information when requesting support.

• Hardware (V100)
• Network Type (Detectnet_v2/Yolo_v4/)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)

Configuration of the TLT Instance
dockers: ['nvidia/tlt-streamanalytics', 'nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 04/16/2021
docker_tag: v3.0-py3

• Training spec file(If have, please share here)

random_seed: 42
yolov4_config {
  big_anchor_shape: "[(142.00, 468.00), (277.00, 335.00), (515.00, 234.00), (432.00, 540.00), (951.00, 731.00)]"
  mid_anchor_shape: "[(85.00, 161.00), (112.00, 124.00), (133.00, 181.00), (192.00, 135.00), (196.00, 227.00)]"
  small_anchor_shape: "[(42.00, 48.00), (51.00, 65.00), (68.00, 77.00), (76.00, 107.00), (100.00, 86.00)]"
  box_matching_iou: 0.3
  arch: "resnet"
  nlayers: 50
  arch_conv_blocks: 2
  loss_loc_weight: 5.0
  loss_neg_obj_weights: 50.0
  loss_class_weights: 1.0
  label_smoothing: 0.1
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 2
  num_epochs: 10
  enable_qat: true
  checkpoint_interval: 1
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
  pretrain_model_path: "/workspace/tlt-experiments/yolo_v4/pretrained_resnet50/tlt_pretrained_object_detection_vresnet50/resnet_50.hdf5"
}
eval_config {
  average_precision_mode: SAMPLE
  batch_size: 2
  matching_iou_threshold: 0.4
}
nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.4
  top_k: 100
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 416
  output_height: 416
  output_channel: 3
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
  mosaic_min_ratio: 0.2
}
dataset_config {
  data_sources: {
      label_directory_path: "/workspace/tlt-experiments/data/training/label_2"
      image_directory_path: "/workspace/tlt-experiments/data/training/image_2"
  }
  include_difficult_in_training: true
  target_class_mapping {
      key: "안전벨트 착용"
      value: "Belt on"
  }
  target_class_mapping {
      key: "안전벨트 미착용"
      value: "Belt off"
  }
  target_class_mapping {
      key: "안전고리 결착"
      value: "Hook on"
  }
  target_class_mapping {
      key: "안전고리 미결착"
      value: "Hook off"
  }
  target_class_mapping {
      key: "안전화 착용"
      value: "Shoes on"
  }
  target_class_mapping {
      key: "안전화 미착용" 
      value: "Shoes off"
  }
  target_class_mapping {
      key: "안전모 착용"
      value: "Helmet on"
  }
  target_class_mapping {
      key: "안전모 미착용"
      value: "Helmet off"
  }
  target_class_mapping {
      key: "포크레인"
      value: "Fork lane"
  }
  target_class_mapping {
      key: "페이로다"
      value: "Payloader"
  }
  target_class_mapping {
      key: "지게차"
      value: "Forklift"
  }
  target_class_mapping {
      key: "덤프트럭"
      value: "Dump truck"
  }
  target_class_mapping {
      key: "레미콘"
      value: "Remicon"
  }
  target_class_mapping {
      key: "펌프카"
      value: "Pump car"
  }
  target_class_mapping {
      key: "항타기"
      value: "Pile driver"
  }
  target_class_mapping {
      key: "트럭"
      value: "Truck"
  }
  target_class_mapping {
      key: "고소작업대"
      value: "Aerial workbench"
  }
  target_class_mapping {
      key:  "타워크레인"
      value:  "Tower crane"
  }
  target_class_mapping {
      key: "스카이"
      value: "Aerial work platform car"
  }
  target_class_mapping {
      key: "갱폼"
      value: "Gang form"
  }
  target_class_mapping {
      key: "알폼"
      value: "Al form"
  }
  target_class_mapping {
      key: "A형 사다리"
      value: "A-type ladder"
  }
  target_class_mapping {
      key: "우마"
      value: "Uma"
  }
  target_class_mapping {
      key: "분전반"
      value: "ELB"
  }
  target_class_mapping {
      key: "개구부 덮개"
      value: "Opening cover"
  }
  target_class_mapping {
      key: "위험물 보관소"
      value: "Dangerous goods storage"
  }
  target_class_mapping {
      key: "ELEV 추락방지막"
      value: "Elevator fall arrester"
  }
  target_class_mapping {
      key: "호이스트"
      value: "Hoist"
  }
  target_class_mapping {
      key: "잭서포트"
      value: "Jack support"
  }
  target_class_mapping {
      key: "강관비계"
      value: "Steal pipe scaffolding"
  }
  target_class_mapping {
      key: "시스템비계"
      value: "System scaffolding"
  }
  target_class_mapping {
      key: "시멘트벽돌"
      value: "Cement brick"
  }
  target_class_mapping {
      key: "망치"
      value: "Hammer"
  }
  target_class_mapping {
      key: "전동드릴"
      value: "Electric drill"
  }
  target_class_mapping {
      key: "레미탈"
      value: "Remital"
  }
  target_class_mapping {
      key: "치장블럭"
      value: "Stucco block"
  }
  target_class_mapping {
      key: "믹서기"
      value: "Mixer"
  }
  target_class_mapping {
      key: "H빔"
      value: "H beam"
  }
  target_class_mapping {
      key: "고속절단기"
      value: "High speed cutting machine"
  }
  target_class_mapping {
      key: "바이브레이터"
      value: "Vibrator"
  }
  target_class_mapping {
      key: "소화기"
      value: "Fire extinguisher"
  }
  target_class_mapping {
      key: "용접기"
      value: "Welding machine"
  }
  target_class_mapping {
      key: "핸드그라인더"
      value: "Hand grinder"
  }
  target_class_mapping {
      key: "핸드카"
      value: "Hand car"
  }
  target_class_mapping {
      key: "불티방지막"
      value: "Anti-burn"
  }
  validation_data_sources: {
      label_directory_path: "/workspace/tlt-experiments/data/val/label"
      image_directory_path: "/workspace/tlt-experiments/data/val/image"
  }
}

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Hello.
I am a user who is very interested in TLT.
I trained YOLOv4_resnet18 for 100 epochs on a dataset with 45 classes.
The dataset is about 28 GB in size and contains 50,000 image and label files. (More data can be added.)
I have the following questions:

  1. For most small objects, the AP stays at 0, so the final mAP only reaches 0.22. (YOLOv4_resnet18, 100 epochs)
    → I therefore wanted to use resnet101. According to the documentation, is it correct that YOLOv4 does not provide a resnet101 pretrained model? When I changed the configuration file for resnet101 and ran it, I got an error saying that a layer could not be found. So I am currently training with resnet50 (5 epochs so far, mAP: 0.01).

  2. The dataset has a serious class imbalance. Some classes have as many as 32,000 instances, while others have 62 at most.

    → I therefore want to use the class_weight option described for DetectNet v2. Is there an example or a way to apply this in YOLOv4? By what criteria should I set the value of class_weight?
    → If the class_weight option is not available in YOLOv4, does DetectNet v2 perform well enough for object detection? Would it be better to switch models and train with it instead?

If anything needs further explanation, I will check and reply. Thank you, as always, for supporting a great toolkit. I am using a translator, so I would appreciate your understanding if some sentences read strangely.

Yolo_v4 supports resnet101 as the backbone. See the Overview page of the TAO Toolkit 3.22.05 documentation, the NVIDIA TAO Documentation, and https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_object_detection .
Please download the pretrained weights from there. They were trained on a subset of the Google OpenImages dataset.

In Yolo_v4, there is no class_weight option like the one in Detectnet_v2.
Detectnet_v2 would not handle such a serious class imbalance any better.
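
On the question of which criteria to use for class weights: one common heuristic (not something prescribed by TLT) is to weight each class inversely to its instance count relative to the median class, and to cap the result so that very rare classes do not dominate the loss. The following is a minimal Python sketch of that idea. The function name, the cap of 10.0, and the example counts are assumptions for illustration, and the resulting values are only usable where a per-class weight exists, such as the class_weight option mentioned for DetectNet v2.

```python
def inverse_frequency_weights(counts, max_weight=10.0):
    """Return a weight per class: median_count / class_count, capped at max_weight.

    counts: dict mapping class name -> number of labeled instances (all > 0).
    max_weight: hypothetical cap; tune it for your own data.
    """
    if any(count <= 0 for count in counts.values()):
        raise ValueError("every class needs at least one labeled instance")

    # Median of the per-class instance counts, used as the reference frequency.
    ordered = sorted(counts.values())
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

    # Rare classes get weights above 1, frequent classes get weights below 1.
    return {name: min(max_weight, median / count) for name, count in counts.items()}


# Illustrative counts only; the thread reports a range from 62 to 32,000.
example_counts = {"Helmet on": 32000, "Truck": 1000, "Hammer": 62}
weights = inverse_frequency_weights(example_counts)
```

With counts this skewed, the cap matters: without it, the rarest class would receive a weight roughly 500 times larger than the most frequent one.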

In addition, the pretrained model plays an important role. We do not provide pretrained weights trained on the ImageNet dataset. If you have the bandwidth, please train a model on the ImageNet dataset, then use it as the pretrained model for training the Yolo_v4 network. For more information, see https://developer.nvidia.com/blog/preparing-state-of-the-art-models-for-classification-and-object-detection-with-tlt/

I imported a pretrained model from NGC and used it.

# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tlt_pretrained_object_detection:resnet18 \
                    --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet18

Are you saying that this model is not supported by TLT?

Do you mean that, in TLT, I should not download the pretrained model from NGC, but instead pretrain a model on ImageNet myself and then use that as the pretrained model?

I’m worried about the time it takes to train the pretrained model myself because I can only use one GPU.

And thanks for the reply, I’ll try DetectNet.

No, Yolo_v4 supports resnet101 as the backbone. Please download the resnet101 pretrained weights from https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_object_detection/files?version=resnet101 . These pretrained weights were trained on a subset of the Google OpenImages dataset. You can use them to continue the yolo_v4 training.

What I meant is that, in the long term, you can train your own pretrained weights on the ImageNet dataset.

Does NVIDIA have any plans to support oversampling or undersampling?

I will sync with the internal team about your request.

For your case, there is also another possible approach. Train one yolo_v4 model for the 16 majority classes, and train a second yolo_v4 model for the other 29 classes.
After training is done, run inference with both engines.
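
The two-model approach can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the detection dictionary layout (label, score, box), and the 0.3 threshold are hypothetical and not part of TLT. The per-class counts would come from your own label files. Because the two models cover disjoint class sets, the sketch simply concatenates their outputs and does not apply NMS across models.

```python
def split_classes_by_frequency(counts, top_k=16):
    """Split class names into the top_k most frequent classes and the rest.

    counts: dict mapping class name -> number of labeled instances.
    Ties are broken alphabetically so that the split is deterministic.
    """
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return ranked[:top_k], ranked[top_k:]


def merge_detections(majority_dets, minority_dets, confidence_threshold=0.3):
    """Combine detections from both models for one image.

    Each detection is assumed to be a dict with "label", "score", and "box" keys.
    Detections below the threshold are dropped; the rest are sorted by score.
    """
    kept = [d for d in majority_dets + minority_dets
            if d["score"] >= confidence_threshold]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


# Illustrative counts only, with a small top_k for readability.
example_counts = {"Helmet on": 32000, "Truck": 1000, "Hammer": 62, "Mixer": 400}
majority, minority = split_classes_by_frequency(example_counts, top_k=2)
```

Each class list would then drive the target_class_mapping entries of its own spec file, so that every model is trained only on its own group of classes.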