YOLOv4 giving 0.0 AP for classes with fewer images

I trained a YOLOv4 model for 100 epochs on 7,000 images across 10 classes. However, 7 of the classes have very few images compared to the other 3. We are getting an AP of 0 for these 7 classes, while the other 3 classes achieve an AP above 0.80. I also tried the class weighting configuration, which didn’t help. I am using the default hyperparameters, changing only the learning rate. How can I improve the AP for the classes with fewer images? I ran the same training with YOLOv3, which gave us good results for the low-count classes as well.

For the 7 classes, I suggest copying the images several times to enlarge the dataset for each class. For example, if you have only 100 images for a class, you can copy them 9 times to get 1,000 images in total for that class.
Also, please make sure you use k-means to generate correct anchor shapes for the new dataset.
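For reference, here is a rough sketch of that k-means step over the box widths and heights taken from the KITTI labels (the label directory and anchor count are placeholders, and if your training resolution differs from the original image size the resulting anchors should be scaled accordingly before they go into the spec file):

    import glob
    import os

    import numpy as np

    LABEL_DIR = "/workspace/data/labels"   # placeholder: directory of KITTI label .txt files
    NUM_ANCHORS = 9                        # YOLOv4 uses 9 anchors (3 per scale)

    def load_box_sizes(label_dir):
        """Collect (width, height) of every bounding box from KITTI label files."""
        sizes = []
        for path in glob.glob(os.path.join(label_dir, "*.txt")):
            with open(path) as f:
                for line in f:
                    fields = line.split()
                    if len(fields) < 8:
                        continue
                    x1, y1, x2, y2 = map(float, fields[4:8])
                    if x2 > x1 and y2 > y1:
                        sizes.append((x2 - x1, y2 - y1))
        return np.array(sizes)

    def iou_wh(boxes, anchors):
        """IoU between box shapes and anchor shapes, both treated as corner-aligned."""
        w = np.minimum(boxes[:, None, 0], anchors[None, :, 0])
        h = np.minimum(boxes[:, None, 1], anchors[None, :, 1])
        inter = w * h
        union = (boxes[:, 0] * boxes[:, 1])[:, None] + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
        return inter / union

    def kmeans_anchors(boxes, k, iters=100, seed=0):
        """Cluster box shapes with 1 - IoU as the distance, the usual YOLO recipe."""
        rng = np.random.default_rng(seed)
        anchors = boxes[rng.choice(len(boxes), k, replace=False)]
        for _ in range(iters):
            assign = np.argmax(iou_wh(boxes, anchors), axis=1)
            new_anchors = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                                    else anchors[i] for i in range(k)])
            if np.allclose(new_anchors, anchors):
                break
            anchors = new_anchors
        return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

    if __name__ == "__main__":
        boxes = load_box_sizes(LABEL_DIR)
        for w, h in kmeans_anchors(boxes, NUM_ANCHORS):
            print(f"({w:.2f}, {h:.2f})")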

Okay, but if we copy those 100 images from a low-count class to create 1,000 copies, the duplicated images will also contain annotations from the high-count classes. For a single image with one instance of a rare class, there may be 10 annotations of a frequent class, so duplicating the images increases the annotation count for the rare classes and, correspondingly, increases the annotation count for the frequent classes by the same factor of 10.
So the ratio between the rare and frequent classes stays the same.

I am already using k-means to generate correct anchor shapes.

Then you can use the “Class Weighting Config” method mentioned in the YOLOv4 documentation. Refer to YOLOv4 - NVIDIA Docs.

I’ve already tried the class weighting configuration parameters, but I’m still not getting good results. I also tried enabling ‘enable_auto’. I am sharing the class and annotation list. Can you suggest appropriate class weights now?
I need good AP for classes a, b, c, g, and i.

Annotation counts per class:
class_a: 5544
class_b: 9761
class_c: 4369
class_d: 116
class_e: 76
class_f: 40
class_g: 30031
class_h: 362
class_i: 1006
class_j: 187
class_k: 282

Usually you can set the class weights based on the annotation counts.
For example, if

class_a: 5544
class_b: 9761
class_c: 4369
class_d: 116

then

  class_weighting {
      key: "a"
      value: 1.8
  }
  class_weighting {
      key: "b"
      value: 1.0
  }
  class_weighting {
      key: "c"
      value: 2.2
  }
  class_weighting {
      key: "d"
      value: 84.1
  }
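Applied to the full class list you posted, the same rule (assuming class_b stays the 1.0 reference as above, so each weight is roughly 9761 divided by the class's own annotation count) can be sketched like this:

    # Sketch of the weighting rule above: weight ≈ reference count / class count.
    counts = {
        "a": 5544, "b": 9761, "c": 4369, "d": 116, "e": 76, "f": 40,
        "g": 30031, "h": 362, "i": 1006, "j": 187, "k": 282,
    }
    reference = counts["b"]  # assumption: class_b is the 1.0 baseline, as in the example
    for name, n in counts.items():
        # e.g. a -> 1.8, c -> 2.2, d -> 84.1, matching the values shown above
        print(f'class_weighting {{ key: "{name}"  value: {reference / n:.1f} }}')

Note that class_g has more annotations than class_b, so this rule gives it a weight below 1.0; you may want to clamp it to 1.0 if class_g is one of the classes you care about.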

Can you share the latest spec file?
Also, to narrow things down, you can run a training that only includes classes a, b, c, g, and i.

I have attached the config files for your review (exp1 = no class weights, exp2 = class weights):
config_v1_exp_clss_weight.txt (3.1 KB)
config_v1_exp_noclss_weight.txt (2.5 KB)
Please suggest any changes I need to make.

I ran several YOLOv4 training experiments with both default and modified parameters, as well as class weights. However, I didn’t achieve satisfactory results on TAO Toolkit 4.0.1. Interestingly, I obtained good results with YOLOv3 on TAO Toolkit 3.1.24.

I suspect the issue may be related to the TFRecords. When I use TFRecords generated in version 4.0.1 for YOLOv3 and YOLOv4 training, I observe poor results. However, when I generate the TFRecords in version 3.1.24 and then train the models in either 3.1.24 or 4.0.1, I achieve good results for both YOLOv3 and YOLOv4.
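To compare the two sets of TFRecords directly, here is a quick inspection sketch (assuming a TensorFlow 2.x / eager environment; pass the shard paths on the command line) that prints the record count and the feature keys of the first record in each file:

    import sys

    import tensorflow as tf

    def summarize(tfrecord_path):
        """Print how many records a shard holds and which feature keys the first record uses."""
        count = 0
        keys = None
        for raw in tf.data.TFRecordDataset([tfrecord_path]):
            if keys is None:
                example = tf.train.Example.FromString(raw.numpy())
                keys = sorted(example.features.feature.keys())
            count += 1
        print(tfrecord_path)
        print("  records:", count)
        print("  feature keys of first record:", keys)

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            summarize(path)

If the shards from 4.0.1 and 3.1.24 show different record counts or different keys, that would point at the dataset_convert step rather than the training itself.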

Which docker did you use for “Tao Toolkit 3.1.24”?

We are using the ac77f8d117ed docker image for TAO Toolkit 3.1.24.

Refer to Very high loss while training TAO yolov4 - #4 by adithya.ajith; the spec files should not be the same for the 4.0.1 docker and the 3.22.05 docker.

Could you share the full training log from the case where you mentioned “use TFRecords generated in version 4.0.1 for YOLOv4 model training, I observe poor results”?
Also, could you share an example of your label file?

Also, please try to run the YOLOv4 training again in the 4.0.1 docker, setting
include_difficult_in_training: false.
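If it helps, here is a small standard-library sketch (the label directory is a placeholder) that summarizes, per class, how many boxes carry nonzero truncation or occlusion values, which I am assuming are the fields the difficult handling looks at, so you can see whether the rare classes are affected:

    from collections import defaultdict
    import glob
    import os

    LABEL_DIR = "/workspace/data/labels"   # placeholder: directory of KITTI label .txt files

    stats = defaultdict(lambda: {"total": 0, "truncated": 0, "occluded": 0})

    for path in glob.glob(os.path.join(LABEL_DIR, "*.txt")):
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 8:
                    continue
                cls = fields[0]
                truncated = float(fields[1])
                occluded = int(float(fields[2]))
                stats[cls]["total"] += 1
                stats[cls]["truncated"] += int(truncated > 0)
                stats[cls]["occluded"] += int(occluded > 0)

    for cls, s in sorted(stats.items()):
        print(f'{cls}: total={s["total"]}, truncated>0={s["truncated"]}, occluded>0={s["occluded"]}')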


Regarding Very high loss while training TAO yolov4 - #4 by adithya.ajith: he is my teammate, and we are working on the same dataset. Initially we were experiencing high loss with label smoothing set to 0.1; I then ran training with label smoothing set to 0.0, which reduced the loss value. That is what we observed.

We had already achieved excellent results with YOLOv3 on TAO Toolkit 3.1.24, so we decided to move to YOLOv4 on 3.1.24. However, following a moderator's suggestion, we transitioned to version 4.0.1 and changed the spec file accordingly.

Unfortunately, we did not achieve the desired results for YOLOv4 on version 4.0.1. To check, we also trained YOLOv3 on 4.0.1 to see if we could get good results there, and surprisingly we observed poor results for YOLOv3 as well.
I then generated the TFRecords in 3.1.24 and trained the models, and we got good results for both YOLOv3 and YOLOv4.

Training logs for YOLOv4 using TFRecords generated in version 4.0.1:
model_output_labels.txt (105 Bytes)
yolov4_training_log_resnet18.csv (5.8 KB)

Latest training log for the same dataset on 3.1.24:
yolov4_training_log_resnet18.csv (5.8 KB)

Could you double-check the training logs? It seems both are the same.
Also, could you share one example of the label file in your dataset?
For example,
car xx xx xx xx xx …

Training logs for YOLOv4 using TFRecords generated in version 4.0.1:

yolov4_training_log_resnet18.csv (5.8 KB)

Latest training log for the same dataset on 3.1.24:

yolov4_training_log_resnet18_sept11.csv (7.3 KB)

Label file:
model_output_labels.txt (105 Bytes)

Sorry, could you check and share the label txt file itself?
For example,
car 0 0 0 100 200 300 400 0.0 0.0 0.0 0.0 0.0 0.0

KITTI label file:
7-sj_9630.txt (2.1 KB)

According to the above comment, when you use the tfrecord files generated in 3.1.24 and then train the models in the 3.1.24 docker or the 4.0.1 docker, the results are good. Correct?

I generated the tfrecord files on 3.1.24 and then trained on 3.1.24, and we are getting good results.
I still need to test how the results will look if we generate the tfrecords on 3.1.24 and then train on 4.0.1.


I generated the tfrecords on 3.1.24 and then trained on 4.0.1.
I am getting good results.

OK, thanks for the info. So there is no issue when using the tfrecord files generated with the 3.1.24 docker. I suggest you double-check the dataset, command line, spec file, and log when generating the tfrecord files in the 4.0.1 docker.

