Training Custom Object Detector with 6 Classes

Hi neophyte1,
Glad to know you improved the AP for the car class. With resnet18, right? May I know your latest result for it?

You mentioned a new issue with prune and retrain, right? How did you set pth to 5.2e-6? I recall that is not the default value in the notebook.

Hi Morganh,

With resnet-18, after training I am able to achieve 48% precision for the car class. I still do not understand how the tlt-prune tool works. The “pth” value was already given in the Jupyter notebook, hence I used the same value. Surprisingly, it seems to work for the resnet-10 based model, but for the resnet-18 based model it does not.

To give more information: with the resnet-10 based model, I get the following log:

[INFO] iva.common.magnet_prune: Pruning ratio (pruned model / original model): 0.122844558547

With resnet-18 and the same value of “pth”, I get the following log:

[INFO] iva.common.magnet_prune: Pruning ratio (pruned model / original model): 1

I don’t understand why there is such a large difference in the pruning ratio.

Thanks.

See https://devtalk.nvidia.com/default/topic/1065254/transfer-learning-toolkit/how-to-retrain-the-model-after-pruning-/post/5394735/#5394735 for more details about pth.
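
For intuition about the pruning ratio numbers above, here is a minimal Python sketch. It assumes (for illustration only, this is not the real tlt-prune internals) that pruning simply drops channels whose importance score falls below pth and then reports pruned parameters divided by original parameters; a ratio of 1 means the threshold removed nothing from the resnet-18 model.

# Rough illustration only -- not the actual tlt-prune implementation.
# Assume each channel has an importance score; channels below pth are removed,
# and the logged ratio is pruned parameter count / original parameter count.

def pruning_ratio(channel_scores, pth, params_per_channel=1000):
    kept = [s for s in channel_scores if s >= pth]
    return (len(kept) * params_per_channel) / (len(channel_scores) * params_per_channel)

# If every score is above the threshold (as apparently happens for resnet-18
# at pth = 5.2e-6), nothing is removed and the ratio is exactly 1.
print(pruning_ratio([0.3, 0.7, 1.2], pth=5.2e-6))   # -> 1.0
print(pruning_ratio([0.3, 0.7, 1.2], pth=0.5))      # -> ~0.67, two of three channels kept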

Hi Morganh,

I have a few queries:

  1. Based on your suggestion I am trying to tune the class weights for the different classes and find a pattern I can use to progress further. However, I am unable to understand the concept of class weights. My assumption was that the class weight should be inversely proportional to the number of instances, but this assumption falls flat on a number of occasions. Can you please explain the significance and usage of class weights? The only information I have from you so far is that these weights are relative. Does that mean class weights 10, 8, 6 are the same as 5, 4, 3?

  2. Why does the accuracy drop when I increase the size of the training input image? I ran experiments with the same dataset while maintaining the aspect ratio: in one experiment I scaled to 480x480 resolution and in the other to 640x640. Surprisingly, I noticed a drop in accuracy for the 640x640 case. This is counter-intuitive, since object detectors generally perform better on larger image sizes.

  3. In the model_config section of the training spec file there are a couple of parameters, “scale” and “offset”. What do these parameters mean? How should they be modified? What is their significance?

  4. I am using the pretrained resnet10 model. Can you let me know what input training image size it was trained on?

  5. What is the difference between detectnet_v1 and detectnet_v2?

Please help me out.

Hi neophyte1,

  1. The class weight is assigned to the cost of the corresponding target class. It can be used to balance the training dataset across classes. Weights 10, 8, 6 are the same as 5, 4, 3, since the proportion between the classes is the same in both cases (see the sketch after this list).
  2. As we discussed previously, the training image size should be close to the inference input image size. Our experience suggests a rule of thumb: make the training setup as close as possible to the actual deployment conditions.
  3. According to the TLT documentation (Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation), “objective_set” defines which objectives the network is being trained for. For object detection networks, we set it to learn cov and bbox; the “scale” and “offset” you are asking about are parameters of the bbox objective. These parameters should not be altered for the current training pipeline.
  4. Sorry, I do not have that information. In any case, the pretrained model can be used with any input size; it does not depend on the input dimensions.
  5. The name DetectNet_v2 comes from historical reasons within NVIDIA during the development of the DetectNet algorithm. We just want to distinguish it from our internal, older version of the DetectNet algorithm.
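
On point 1, here is a tiny Python sketch of why only the ratio between class weights matters. It assumes (for illustration only, this is not the exact DetectNet_v2 cost function) that each class’s cost is multiplied by its weight divided by the sum of the weights:

# Illustration of why class weights 10, 8, 6 behave like 5, 4, 3:
# only the relative proportions matter once the weighted costs are combined.

def weighted_cost(class_costs, class_weights):
    total_w = sum(class_weights.values())
    return sum(class_costs[c] * class_weights[c] / total_w for c in class_costs)

costs = {"car": 1.2, "person": 0.8, "bicycle": 0.5}
print(weighted_cost(costs, {"car": 10, "person": 8, "bicycle": 6}))
print(weighted_cost(costs, {"car": 5,  "person": 4, "bicycle": 3}))  # same value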

Hi Morganh,

I am trying to use models trained with TLT in the DeepStream SDK. I would like to know how the resizing of the input image is done. Is the aspect ratio maintained while resizing before inference?

Thanks.

Hi neophyte1,
When integrating a DetectNet_v2 model into DeepStream, regarding the aspect ratio, you need to set input-dims correctly in the DeepStream configuration file.

input-dims=c;h;w;0   # where c = number of channels, h = height of the model input, w = width of the model input, 0 implies CHW format.
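
On the resize question, here is a minimal Python sketch (assuming OpenCV and a hypothetical sample_frame.jpg) of a plain stretch-resize to the configured h and w: the frame is forced to exactly those dimensions, which is why input-dims should match the resolution the model was trained at. Whether the aspect ratio is preserved in your pipeline depends on the DeepStream preprocessing settings, so treat this only as an illustration.

# Minimal sketch (assumes OpenCV): a plain resize to the model's input-dims
# stretches the frame to exactly w x h, so a 1920x1080 source fed to a
# 480x480 model changes the aspect ratio of every object in the image.
import cv2

MODEL_W, MODEL_H = 480, 480  # must match the w and h in input-dims

frame = cv2.imread("sample_frame.jpg")           # e.g. a 1920x1080 source frame
resized = cv2.resize(frame, (MODEL_W, MODEL_H))  # no aspect-ratio preservation here
print(frame.shape, "->", resized.shape)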