Too many false positives.

Yes

I am using the resnet10 model.

Also, the false positives are especially numerous for the person class. For the other classes there are far fewer false positives.

Can you check how many images you have for each class?

So, these are the numbers of instances per class which I get while creating the tfrecords:

Wrote the following numbers of objects:
person: 385446
bicycle: 10484
motorbike: 12940
bus: 23822
car: 602408

Currently, I am just concerned with getting good detections on car and person; hence the other class instance counts are quite low.

These are the class weight parameters with which I have tried tlt training, with no significant success.

car = 1.0, person = 1.0
car = 1.0, person = 2.0
car = 2.0, person = 1.0

Could you please paste your mAP and each class’s AP results during the training?

More suggestions to tune the parameters:

  1. Set all minimum_detection_ground_truth_overlap to 0.5
  2. Set the objectives below to the same values:

    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }

  3. Set minimum_bounding_box_height to 3
  4. Set bs=16, epoch=360
  5. Set class_weight; try the following:
    car : bus : bicycle : motorbike : person = 1 : 16 : 40 : 30 : 1.6
  6. Since your dataset is too unbalanced, please consider training 3 classes or 2 classes.
    2 classes: car and person
  7. You can try to make the dataset more balanced.
    Reduce some images of the “car” class.
  8. Change to the resnet18 backbone.
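The class_weight ratio suggested above roughly tracks inverse class frequency (car, the most frequent class, gets the smallest weight). As a sketch, here is one way to derive such a starting ratio from the instance counts reported earlier; the helper is my own, not part of TLT, and the exact values above look hand-tuned rather than strictly inverse-frequency:

```python
# Derive class-weight ratios inversely proportional to instance counts,
# normalized so that the most frequent class ("car") gets weight 1.0.
# The helper function is hypothetical, not a TLT API.
counts = {
    "person": 385446,
    "bicycle": 10484,
    "motorbike": 12940,
    "bus": 23822,
    "car": 602408,
}

def inverse_frequency_weights(counts):
    base = max(counts.values())  # most frequent class -> weight 1.0
    return {cls: round(base / n, 2) for cls, n in counts.items()}

weights = inverse_frequency_weights(counts)
# car gets 1.0 and person about 1.56, close to the 1 : 1.6 ratio above;
# the rarer classes come out higher than the suggested 16/40/30.
```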

Hi,
Thanks for the reply.

I am getting the following AP and mAP.

Mean average_precision (in %): 22.5203

class name      average precision (in %)
------------  --------------------------
bicycle                          1.00966
bus                              1.86996
car                             78.0495
motorbike                        9.60343
person                          22.0692

  1. Will do that, but I fail to see how it will affect the training.
  2. Should I set this for all the classes?
  3. Currently I am using a batch size of 4. From what I understand, the smaller the batch size, generally the greater the accuracy. Changing to 16 might speed up the training, but what about the accuracy?
  4. Also, I have already trained and retrained this model in runs of 120, 40, 40, and 35 epochs (keeping the previously trained tlt model as initial weights) for a total of 235 epochs.
  5. Currently I am primarily concerned with person and car detections and with reducing the false positives; from what I have seen, increasing the weights led to more false positives.
  6. I don’t wish to change to the resnet18 backbone, as I have to deploy the final model on a Jetson Nano and my fps will drop.

Also, while training I get these messages before the training starts.

target/truncation is not updated to match the crop areaif the dataset contains target/truncation.

But the training seems to work fine.

Do these messages mean something?

Those messages are not harmful.

  1. Will do that, but I fail to see how it will affect the training.
    [Morgan] It sets a lower IoU threshold.

  2. Should I set this for all the classes?
    [Morgan] Yes.

  3. Currently I am using a batch size of 4. From what I understand, the smaller the batch size, generally the greater the accuracy. Changing to 16 might speed up the training, but what about the accuracy?
    [Morgan] How many GPUs are you using?

  4. Also, I have already trained and retrained this model in runs of 120, 40, 40, and 35 epochs (keeping the previously trained tlt model as initial weights) for a total of 235 epochs.
    [Morgan] Please do not use the previously trained tlt model as initial weights, because you got a low mAP previously.
    From my experience, enlarging the number of epochs can have a positive effect on mAP.

  5. Currently I am primarily concerned with person and car detections and with reducing the false positives; from what I have seen, increasing the weights led to more false positives.
    [Morgan] As mentioned above, since your dataset is too unbalanced, please consider training 3 classes or 2 classes
    (2 classes: car and person). Or try to make the dataset more balanced: reduce some images of the “car” class.
    Also, tune the class_weight.

  6. I don’t wish to change to the resnet18 backbone, as I have to deploy the final model on a Jetson Nano and my fps will drop.
    [Morgan] For fps, you can tune different pruning ratios to improve it. For mAP, resnet50 and resnet18 will be better than resnet10.
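As a sketch of the “reduce some images of the car class” idea: keep every frame that contains a person, and randomly drop a fraction of the frames whose labels contain only cars. The label layout and the function below are my own illustration, not TLT tooling:

```python
import random

# Sketch: rebalance a KITTI-style dataset by dropping a fraction of the
# frames that contain cars but no persons. `labels` maps a frame id to
# the list of class names in its label file (layout is hypothetical).
def rebalance(labels, drop_car_only=0.5, seed=0):
    rng = random.Random(seed)
    kept = []
    for frame, classes in labels.items():
        if "person" in classes or "car" not in classes:
            kept.append(frame)                 # always keep these frames
        elif rng.random() >= drop_car_only:    # keep (1 - drop) of car-only frames
            kept.append(frame)
    return kept

# Toy usage
labels = {
    "000001": ["car", "car"],
    "000002": ["person", "car"],
    "000003": ["car"],
    "000004": ["person"],
}
kept = rebalance(labels, drop_car_only=0.5)
```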

  1. I am using a single GPU for training.
  2. I still wish to go for 5-class training, although I will reduce the number of car instances.
  3. I had previously trained a resnet18 model for 140 epochs with a more or less similar dataset, although in that case the number of car instances was comparable to that of person. I was still getting false positives on persons (although many fewer). The mAP was greater than this (as expected).
    Lastly, I will be integrating the model with DeepStream, and the default DeepStream model is based on resnet10. What kind of performance (fps) drop should I expect if I go ahead with resnet18?

For mAP, you can try larger backbones to meet the requirement. For fps, you can prune to different tlt models and then retrain; try more experiments and select the one that meets the requirement.
For tweaking class_weight, see this related topic: https://devtalk.nvidia.com/default/topic/1069397/transfer-learning-toolkit/detectnet-v2-18-layers-for-character-recognition-35-classes-/post/5419070/#5419070

Hi Morganh,
Thanks for the reply.
Are there any other pointers which you can give ?

Please try more experiments as mentioned above. If possible, please make each class’s data more balanced.
And use part of the tfrecords to train the first time, to speed up the training/tuning cycle.
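One way to train on only part of the tfrecords is to point the wildcard in data_sources at a subset of the generated shards. A fragment sketch, with placeholder paths and shard naming (check against your own tfrecord filenames):

```
dataset_config {
  data_sources {
    # The wildcard matches only the first few shards instead of the
    # full set; paths and shard names are placeholders for your layout.
    tfrecords_path: "/workspace/tfrecords/train*-shard-0000*"
    image_directory_path: "/workspace/data/training"
  }
}
```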

Hi Morganh,
I made the required changes, i.e. I reduced the number of car instances to 250k, with person at 300k. I have kept the other class instance counts the same for the time being.
I retrained the resnet10 model for 180 epochs continuously. I am still seeing false positives for person; the issue is that the false positives are quite big in size and have high confidence. I also trained a resnet18 model. As expected, the false positives were reduced (but are still there).

So,
Can you provide me with more suggestions ?

Also,
Even though the other classes’ mAP is low (motorbike, bicycle) and their instance counts are also low, I am seeing good results while inferencing. So my question is whether having low data for the other classes affects the detection of the person class. Also, I have adjusted the class cost weights according to the number of instances.

Hi neophyte1 ,
Could you paste your latest mAP results for resnet10 and resnet18?

And do you still use the COCO dataset?
After some experiments on the COCO dataset, I find that the TLT SSD network can achieve a much better mAP than the detectnet_v2 network.

Yeah, I am primarily using COCO, though I have added some custom augmentation on top of it and some other scraped images to increase the number of cars and make the person/car ratio as equal as possible.

The mAP of resnet18 is as follows.

Mean average_precision (in %): 29.2839

class name      average precision (in %)
------------  --------------------------
bicycle                         17.7977
bus                              3.56778
car                             77.8739
motorbike                       28.874
person                          48.3061

For resnet10 it is:

Mean average_precision (in %): 22.2205

class name      average precision (in %)
------------  --------------------------
bicycle                         0.760452
bus                             6.07634
car                            67.7383
motorbike                      13.8474
person                         22.6802

The datasets used in the two cases are not exactly the same, but they are more or less similar.

As I will be deploying this with DeepStream on the Nano, speed and fps are quite important to me, and that’s why I wish to stick to the resnet10 detectnet_v2, as the DeepStream reference app ships with the same architecture.

I am not sure about the fps with SSD.

For detectnet_v2, could you please attach the latest training spec for resnet10 or resnet18?
I suggest you run (bs16 and 180 epochs) or (bs4 and 180 epochs) separately to see what happens.
I’m afraid your training loss is still not at its lowest.

More ideas are as below.

  1. If you change the class cost weight, I suggest setting “auto_weighting” to False.
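In spec form, that change might look like the fragment below (one class shown; I believe the flag is enable_autoweighting inside cost_function_config, but please verify the field names against your own spec file):

```
cost_function_config {
  enable_autoweighting: false    # disable when setting class_weight by hand
  target_classes {
    name: "person"
    class_weight: 1.6
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
}
```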

  2. For minimum_bounding_box_height and minimum_height/minimum_width, I explain as below.

    minimum_bounding_box_height:
    Used in evaluation. If a detection bbox’s height is smaller than this value, the detection is dropped.

    minimum_height and minimum_width:
    If a detection is still alive after the minimum_bounding_box_height check, but its width or height is smaller than these values, then the detection is dropped.

So there are 2 possibilities if you increase minimum_height and minimum_width:

  1. A bbox is an FP but smaller than the 2 values; then this FP is deleted, so this improves mAP.
  2. A bbox is a TP but smaller than the 2 values; then this TP is deleted, so this hurts mAP.

So the impact of adjusting these values is not determined in advance. It just tells us the mAP for bboxes that are bigger than the given height and width.
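The two-stage filtering described above can be sketched with a toy model (my own code, not TLT’s implementation):

```python
# Toy model of the two-stage size filter. Each detection is a tuple
# (width, height, is_true_positive); thresholds mirror the spec fields.
def filter_detections(dets, min_bbox_height, min_height, min_width):
    # Stage 1: drop detections shorter than minimum_bounding_box_height.
    alive = [d for d in dets if d[1] >= min_bbox_height]
    # Stage 2: drop survivors smaller than minimum_height/minimum_width.
    return [d for d in alive if d[1] >= min_height and d[0] >= min_width]

dets = [
    (60, 120, True),   # large TP: always survives
    (8, 10, False),    # small FP: removed by strict thresholds (helps mAP)
    (9, 12, True),     # small TP: also removed then (hurts mAP)
]
loose = filter_detections(dets, min_bbox_height=3, min_height=4, min_width=4)
strict = filter_detections(dets, min_bbox_height=3, min_height=20, min_width=20)
```

With loose thresholds all three detections survive; with strict ones both the small FP and the small TP are dropped, which is why the net effect on mAP is undetermined.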

Hi Morganh,
The mAP figures I mentioned are at 160/180 epochs, and I ran these trainings separately and continuously.

I have been training at an image size of 512x512 (I have padded the input images and adjusted their aspect ratio accordingly).

Another thing: until now I have been keeping ‘auto_weighting’ as ‘TRUE’. I will now set it to false and let you know the results.
I would like to know more about this auto_weighting parameter and how it affects my training process.

Would you be willing to share the spec file for your experiments on COCO with detectnet?

I’m still training on the COCO dataset (114k training images + 8k val images), so the spec file is still not settled.
FYI, some mAP results on the COCO dataset are listed here:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

You can see mAP only ranges from 16 to 43 for 80 classes.

So, I suggest you train with your own data, which reflects your actual project (or product). That will really make sense. Also, you can check whether the KITTI dataset would help your project or product; it also contains the classes which meet your requirement.