I am trying to isolate all the images in the VOC dataset (The PASCAL Visual Object Classes Homepage) that contain certain classes, notably ‘person’ and ‘cat’.
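As a rough sketch of that kind of filtering (not the actual script used here): given per-class cell counts for each image, keep the images where any wanted class covers at least some fraction of the grid. The function name `has_wanted_class`, the `MIN_FRACTION` threshold and the layout of the `runs` dictionary are made up for illustration; the counts are the non-zero rows from the test runs reported in this post.

```python
# Hypothetical filter: keep an image if a wanted class covers enough of the grid.
WANTED = {"person", "cat"}
MIN_FRACTION = 0.05  # assumed threshold, tune to taste


def has_wanted_class(counts, wanted=WANTED, min_fraction=MIN_FRACTION):
    """counts maps class name -> number of grid cells assigned to that class."""
    total = sum(counts.values())
    if total == 0:
        return False
    return any(counts.get(name, 0) / total >= min_fraction for name in wanted)


# Non-zero rows from the --stats runs in this post, keyed by (image, network).
runs = {
    ("2007_002024_bus.jpg", "512x320"): {"background": 89, "cat": 69, "tvmonitor": 2},
    ("2007_002024_bus.jpg", "320x320"): {"background": 64, "tvmonitor": 36},
    ("2007_004856_cat.jpg", "512x320"): {"background": 70, "person": 85, "train": 1, "tvmonitor": 4},
    ("2007_004856_cat.jpg", "320x320"): {"background": 51, "tvmonitor": 49},
}

selected = [key for key, counts in runs.items() if has_wanted_class(counts)]
for image, network in selected:
    print(image, network)
```

With these numbers the filter picks the bus image (as ‘cat’) and the cat image (as ‘person’) under voc-512x320, and neither image under voc-320x320.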
The results were totally wacko, so I pulled out just a few test cases to run through the standard software.
The only change I have made is to swap the colours for ‘aeroplane’ (bright red) and ‘cat’ (some mushroom flavour), because the colours for ‘cat’ and ‘person’ were too close to tell apart visually.
The first test case is one where the mask identified ‘cat’ but the image is clearly some buses.
The colour in the mask is neither that for ‘bus’ nor that for ‘cat’,
and the stats results differ significantly between the voc-512x320 and voc-320x320 networks.
./segnet.py --network=fcn-resnet18-voc-512x320 --stats 2007_002024_bus.jpg 2007_002024_bus_512x320_out.jpg
grid size: 16x10
num classes: 21
-----------------------------------------
ID  class name    count  %
-----------------------------------------
0   background       89  0.556250
1   aeroplane         0  0.000000
2   bicycle           0  0.000000
3   bird              0  0.000000
4   boat              0  0.000000
5   bottle            0  0.000000
6   bus               0  0.000000
7   car               0  0.000000
8   cat              69  0.431250
9   chair             0  0.000000
10  cow               0  0.000000
11  diningtable       0  0.000000
12  dog               0  0.000000
13  horse             0  0.000000
14  motorbike         0  0.000000
15  person            0  0.000000
16  pottedplant       0  0.000000
17  sheep             0  0.000000
18  sofa              0  0.000000
19  train             0  0.000000
20  tvmonitor         2  0.012500
jc@jcjet:~/OpenCV_jet/sample_images/cats_n_busses$
./segnet.py --network=fcn-resnet18-voc-320x320 --stats 2007_002024_bus.jpg 2007_002024_bus_320x320_out.jpg
grid size: 10x10
num classes: 21
-----------------------------------------
ID  class name    count  %
-----------------------------------------
0   background       64  0.640000
1   aeroplane         0  0.000000
2   bicycle           0  0.000000
3   bird              0  0.000000
4   boat              0  0.000000
5   bottle            0  0.000000
6   bus               0  0.000000
7   car               0  0.000000
8   cat               0  0.000000
9   chair             0  0.000000
10  cow               0  0.000000
11  diningtable       0  0.000000
12  dog               0  0.000000
13  horse             0  0.000000
14  motorbike         0  0.000000
15  person            0  0.000000
16  pottedplant       0  0.000000
17  sheep             0  0.000000
18  sofa              0  0.000000
19  train             0  0.000000
20  tvmonitor        36  0.360000
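A quick way to sanity-check these tables is to parse them and confirm that the counts add up to the number of grid cells. The sketch below does that for the two bus runs; the helper names `parse_stats` and `nonzero` are made up, the regular expressions assume the output layout shown above, and the zero-count rows are mostly left out of the embedded text for brevity. It also suggests that the column headed % is really a fraction of the grid, since 69 / 160 = 0.43125.

```python
import re

# Assumed layout of a stats row: "<id> <class name> <count> <fraction>"
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(\d+\.\d+)\s*$")
GRID = re.compile(r"grid size:\s*(\d+)x(\d+)")


def parse_stats(text):
    """Return (grid_cells, {class_name: (count, fraction)}) from --stats output."""
    cells = None
    rows = {}
    for line in text.splitlines():
        grid = GRID.search(line)
        if grid:
            cells = int(grid.group(1)) * int(grid.group(2))
            continue
        row = ROW.match(line)
        if row:
            rows[row.group(2)] = (int(row.group(3)), float(row.group(4)))
    return cells, rows


def nonzero(rows):
    """Keep only the classes that were assigned at least one grid cell."""
    return {name: count for name, (count, _) in rows.items() if count > 0}


# Trimmed copies of the two bus runs above (most zero-count rows omitted).
BUS_512 = """grid size: 16x10
num classes: 21
0 background 89 0.556250
6 bus 0 0.000000
8 cat 69 0.431250
20 tvmonitor 2 0.012500
"""
BUS_320 = """grid size: 10x10
num classes: 21
0 background 64 0.640000
6 bus 0 0.000000
8 cat 0 0.000000
20 tvmonitor 36 0.360000
"""

for label, text in (("512x320", BUS_512), ("320x320", BUS_320)):
    cells, rows = parse_stats(text)
    total = sum(count for count, _ in rows.values())
    print(label, "cells:", cells, "sum of counts:", total, "non-zero:", nonzero(rows))
```

In both runs the counts sum to the grid size (160 and 100 cells), so the tables are at least internally consistent; the disagreement is only in which classes the cells were assigned to.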
The second image is just one cat.
Using voc-512x320, the stats show a high count for ‘person’; using voc-320x320, it's a high count for ‘tvmonitor’.
In both cases the mask output image is coloured red for ‘cat’ (remember I switched colours).
./segnet.py --network=fcn-resnet18-voc-512x320 --stats 2007_004856_cat.jpg 2007_004856_cat.jpg_512x320_out.jpg
-----------------------------------------
ID  class name    count  %
-----------------------------------------
0   background       70  0.437500
1   aeroplane         0  0.000000
2   bicycle           0  0.000000
3   bird              0  0.000000
4   boat              0  0.000000
5   bottle            0  0.000000
6   bus               0  0.000000
7   car               0  0.000000
8   cat               0  0.000000
9   chair             0  0.000000
10  cow               0  0.000000
11  diningtable       0  0.000000
12  dog               0  0.000000
13  horse             0  0.000000
14  motorbike         0  0.000000
15  person           85  0.531250
16  pottedplant       0  0.000000
17  sheep             0  0.000000
18  sofa              0  0.000000
19  train             1  0.006250
20  tvmonitor         4  0.025000
./segnet.py --network=fcn-resnet18-voc-320x320 --stats 2007_004856_cat.jpg 2007_004856_cat_320x320_out.jpg
grid size: 10x10
num classes: 21
-----------------------------------------
ID  class name    count  %
-----------------------------------------
0   background       51  0.510000
1   aeroplane         0  0.000000
2   bicycle           0  0.000000
3   bird              0  0.000000
4   boat              0  0.000000
5   bottle            0  0.000000
6   bus               0  0.000000
7   car               0  0.000000
8   cat               0  0.000000
9   chair             0  0.000000
10  cow               0  0.000000
11  diningtable       0  0.000000
12  dog               0  0.000000
13  horse             0  0.000000
14  motorbike         0  0.000000
15  person            0  0.000000
16  pottedplant       0  0.000000
17  sheep             0  0.000000
18  sofa              0  0.000000
19  train             0  0.000000
20  tvmonitor        49  0.490000
So for the second test there is a pretty clear disconnect between what the stats report and what the mask shows.
It would be great if somebody could verify my results and/or tell me what I am doing wrong.
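One way to put a number on that disconnect would be to count classes straight from the mask image by matching each pixel to the nearest class colour, then compare those counts with the stats table. The sketch below is only an illustration: the colour values in `CLASS_COLOURS` are placeholders rather than the colours actually configured here, the 2x3 `mask` is made-up data, and it assumes the output is a pure colour mask rather than an overlay blended with the photo (blending or JPEG compression would shift the colours).

```python
from collections import Counter

# Placeholder colour table: replace with the RGB values actually in use.
CLASS_COLOURS = {
    "background": (0, 0, 0),
    "cat": (255, 0, 0),
    "person": (192, 128, 128),
    "tvmonitor": (0, 64, 128),
}


def nearest_class(rgb, colours=CLASS_COLOURS):
    """Return the class whose colour is closest to rgb (squared RGB distance)."""
    return min(
        colours,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, colours[name])),
    )


def count_classes(pixels):
    """pixels is a list of rows, each row a list of (r, g, b) tuples."""
    return Counter(nearest_class(p) for row in pixels for p in row)


# Made-up 2x3 mask with slightly noisy colours, as a lossy save might produce.
mask = [
    [(0, 0, 0), (250, 4, 3), (254, 1, 0)],
    [(2, 1, 0), (0, 66, 126), (255, 0, 0)],
]

counts = count_classes(mask)
for name, count in counts.most_common():
    print(name, count)
```

If counts obtained this way from a real mask disagree with the stats table for the same run, that would point at the colour mapping or the stats code rather than at the network itself.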
Thanks
JC






