Which detection model will give better accuracy for aerial-view image detection?

All the objects in this dataset are small. The larger objects are in other datasets, where I got a good mAP.

While training I'm getting "No positive ROIs" for every epoch. I have already referred to the following link.

https://devtalk.nvidia.com/default/topic/1065592/transfer-learning-toolkit/faster-rcnn-roi-issue/

Hi samjith888,
You set 1024x544 in your training spec file.
size_height_width {
height: 544
width: 1024
}

So I want to confirm: in your 1024x544 dataset, the objects are small, right?
Could you tell me the average size (height and width) of the small objects?
I think you can simply check this via your dataset’s labels.

Thanks.

The actual image resolution is 4096x2160, where an object's location is the following.

Helmet 0.00 0 0.00 316.26 713.94 403.70 767.30 0.00 0.00 0.00 0.00 0.00 0.00 0.00

So the average small-object size will be about 87x54 in a 4096x2160 resolution image.
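
For reference, here is a minimal Python sketch (an illustration only, assuming KITTI-format labels where fields 5-8 are x1, y1, x2, y2 in pixels, and a hypothetical labels/ directory) that computes per-object sizes from label lines like the one above:

import glob

def object_sizes(label_path):
    # Yield (class_name, width, height) for every object in a KITTI label file.
    with open(label_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 8:
                continue
            x1, y1, x2, y2 = map(float, fields[4:8])
            yield fields[0], x2 - x1, y2 - y1

# The Helmet line above gives width = 403.70 - 316.26 = 87.44 and
# height = 767.30 - 713.94 = 53.36, i.e. roughly 87x54 pixels.
for path in glob.glob("labels/*.txt"):
    for name, w, h in object_sizes(path):
        print(name, round(w, 2), round(h, 2))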

Hi samjith888,
Have you modified the bbox (x1, y1, x2, y2) in your label files? You have resized the images from 4096x2160 to 1024x544.

Ignore my comment.

Do you know how to resolve this issue?

Hi samjith888,
Please try more experiments.
1. Make sure the anchor box size is almost the same as the objects' size. In your config file,
your anchors are as below, so they can cover the small objects (87/4, 54/4).
But I suggest you check your small objects' sizes further, to see whether more experiments with different anchor ratios or scales are needed.

array([[[ 8.      ,  8.      ],
        [ 5.656854, 11.313708],
        [11.313708,  5.656854]],

       [[16.      , 16.      ],
        [11.313708, 22.627417],
        [22.627417, 11.313708]],

       [[32.      , 32.      ],
        [22.627417, 45.254833],
        [45.254833, 22.627417]]], dtype=float32)

2. Try larger backbones, such as resnet34 or vgg19.

3. Try other networks in TLT as well to see if there is any improvement.

  • try ssd, with a lower ratio too.
  • try detectnet_v2. In detectnet_v2, set a lower minimum_bounding_box_height (try setting it to 3), lower minimum_height and minimum_width (try setting them to 0), and a lower minimum_detection_ground_truth_overlap (try setting it to 0.3).

I have tried the 2nd and 3rd steps, but I'm still getting a low mAP. Now I'm experimenting with the first step.

Also I tried with
zoom_min 1.0
zoom_max 8.0
This also didn’t work out.

Should I change the ratios in anchor_box_config?

Yes, you can try.
For detectnet_v2, can you attach your training spec and full training log?
Note: For detectnet_v2 and SSD, all of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly. That means if you set 1024x544 in the training spec, you need to resize your images offline and modify the bbox (x1, y1, x2, y2) in your label files, since you have resized the images from 4096x2160 to 1024x544.
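
As a rough illustration of that offline step (a sketch only; the file names are hypothetical and OpenCV is just one way to do the resize), the images can be resized and the KITTI bbox fields scaled by the same factors:

import cv2

SRC_W, SRC_H = 4096, 2160   # original resolution
DST_W, DST_H = 1024, 544    # training size from the spec
sx, sy = DST_W / SRC_W, DST_H / SRC_H   # 0.25 and ~0.252

# Resize one image offline to the final training size.
img = cv2.imread("image.jpg")
cv2.imwrite("image_resized.jpg", cv2.resize(img, (DST_W, DST_H)))

# Scale the bbox (x1, y1, x2, y2) in the matching KITTI label file.
with open("image.txt") as f, open("image_resized.txt", "w") as out:
    for line in f:
        fields = line.split()
        if len(fields) < 8:
            continue
        x1, y1, x2, y2 = map(float, fields[4:8])
        fields[4:8] = ["%.2f" % (x1 * sx), "%.2f" % (y1 * sy),
                       "%.2f" % (x2 * sx), "%.2f" % (y2 * sy)]
        out.write(" ".join(fields) + "\n")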

I want a solution for faster_rcnn, which works for larger object detection.


anchor_box_config {
scale: 5
scale: 11
scale: 21
ratio: 1.0
ratio: 0.5
ratio: 2.0
}

Is this the right idea? I have reduced the scale values again.

87/4 = 21.75
54/4 = 13.5

Hi samjith888,
For anchor_box_config {
scale: 8.0
scale: 16.0
scale: 32.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}

The anchors will be

array([[[ 8.      ,  8.      ],
        [ 5.656854, 11.313708],
        [11.313708,  5.656854]],

       [[16.      , 16.      ],
        [11.313708, 22.627417],
        [22.627417, 11.313708]],

       [[32.      , 32.      ],
        [22.627417, 45.254833],
        [45.254833, 22.627417]]], dtype=float32)

because
8 * sqrt(1)   = 8
8 * sqrt(0.5) = 5.656854
8 * sqrt(2)   = 11.313708
16 * sqrt(1)   = 16
16 * sqrt(0.5) = 11.313708
16 * sqrt(2)   = 22.627417
32 * sqrt(1)   = 32
32 * sqrt(0.5) = 22.627417
32 * sqrt(2)   = 45.254833

If you change anchor_box_config, please calculate in the same way as above, to see if it can cover your small objects.
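
Here is a small Python sketch of that same arithmetic (not TLT code; each ratio turns a scale into a pair scale*sqrt(ratio) and scale/sqrt(ratio), and which of the pair is height versus width is an assumption here):

import numpy as np

def anchor_sizes(scales, ratios):
    # For each scale s and ratio r, the two sides are s*sqrt(r) and s/sqrt(r).
    return np.array([[[s * np.sqrt(r), s / np.sqrt(r)] for r in ratios]
                     for s in scales], dtype=np.float32)

print(anchor_sizes([8.0, 16.0, 32.0], [1.0, 0.5, 2.0]))
# Reproduces the array above, e.g. 8*sqrt(0.5) = 5.656854 and 8/sqrt(0.5) = 11.313708.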

Hi Morganh,

Is it that faster_rcnn can't detect smaller objects? Can you suggest some more changes in the faster_rcnn training spec file?

When I looked at the label files, I found that two object classes have average sizes like the following:
Mssing_P 25x23
Extra_P 26x13

These two objects are the smallest objects in my dataset. (Image resolution: 4096x2160)

Hi samjith888,
As mentioned previously, you need to run experiments. To improve accuracy on small objects, the most common trick is to use a smaller set of anchors. The anchor sizes should be similar to the small objects' sizes. Anchor ratios can be kept unchanged.
You can also train only two classes first instead of 5 classes. Use fewer classes in order to narrow down.
Mssing_P 25x23
Extra_P 26x13

Note that for the above sizes, since you changed from 4096x2160 to 1024x544, you need to make the anchor sizes cover
25/4 * 23/4
26/4 * 13/4

How do I change the anchor sizes to cover the above object sizes?

anchor_box_config {
scale: 8.0
scale: 16.0
scale: 32.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}

Refer to the way in https://devtalk.nvidia.com/default/topic/1069737/transfer-learning-toolkit/which-detection-model-will-give-more-accuracy-for-arial-view-image-detection-/post/5420836/#5420836,
for the two small objects,
Mssing_P 25x23
Extra_P 26x13

since you change from 4096x2160 to 1024x544, the sizes become
6.25x5.75
6.5x3.25

You can try

anchor_box_config {
scale: 4.0
scale: 4.6
scale: 5.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}

It can cover anchor sizes like
4x4,
2.828x5.656
5.656x2.828
4.6x4.6
3.25x6.5
6.5x3.25
5x5
3.535x7.07
7.07x3.535
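
As a quick sanity check (the same scale*sqrt(ratio) / scale/sqrt(ratio) arithmetic as above, not TLT itself), the suggested scales reproduce that coverage list:

import math

for s in [4.0, 4.6, 5.0]:
    for r in [1.0, 0.5, 2.0]:
        print("%.3f x %.3f" % (s * math.sqrt(r), s / math.sqrt(r)))
# One pair per line: 4.000 x 4.000, 2.828 x 5.657, 5.657 x 2.828, 4.600 x 4.600,
# 3.253 x 6.505, 6.505 x 3.253, 5.000 x 5.000, 3.536 x 7.071, 7.071 x 3.536
# (e.g. 6.505 x 3.253 is close to the scaled Extra_P size 6.5x3.25).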

Can I use images of different resolutions for training?

Sure, you can resize your original resolution to other sizes.
Then calculate the new pixel range of the small objects and set a better anchor_box_config.

Hi Morganh,

I meant that I have a dataset which consists of images with different resolutions (e.g. 4120x2240, 800x450, 1080x120, 300x250, etc.). So can I use this dataset for training? Or does TLT only accept images of a single size?

For the detectnet_v2 and SSD networks, all of the images must be resized offline to the final training size.
For faster_rcnn, you don't need to resize the images.

See the TLT user guide's chapter 2 for more details.