Hi neophyte1,
In the SSD example, output_image_width/height in the training config file is set to 1248x384 because the KITTI dataset (images are mostly 1248x384) is used in the previous step of the Jupyter notebook.
The setting in the config file should match the width/height of the network input.
Currently, only “resnet10” and “resnet18” are supported as the feature extraction architecture in SSD.
For each of them, there are 6 feature maps.
aspect_ratios_global is a single list of aspect ratios for which anchor boxes are generated. The same list applies to every prediction layer, so with 6 layers it is equivalent to the following.
[[1.0, 2.0, 0.5, 3.0, 0.3333333333333333], [1.0, 2.0, 0.5, 3.0, 0.3333333333333333], [1.0, 2.0, 0.5, 3.0, 0.3333333333333333], [1.0, 2.0, 0.5, 3.0, 0.3333333333333333], [1.0, 2.0, 0.5, 3.0, 0.3333333333333333], [1.0, 2.0, 0.5, 3.0, 0.3333333333333333]]
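As a rough illustration of the expansion above, here is a minimal Python sketch. The helper name expand_global_aspect_ratios and the constant NUM_FEATURE_LAYERS are made up for this example and are not part of the toolkit. It only shows how one global list maps to the same list on each of the 6 layers.

```python
# Hypothetical helper (not a toolkit API): shows how one global list of
# aspect ratios is reused for every prediction layer.
NUM_FEATURE_LAYERS = 6  # 6 feature maps, as stated in this post


def expand_global_aspect_ratios(aspect_ratios_global, num_layers=NUM_FEATURE_LAYERS):
    """Return one independent copy of the global list per prediction layer."""
    return [list(aspect_ratios_global) for _ in range(num_layers)]


per_layer = expand_global_aspect_ratios([1.0, 2.0, 0.5, 3.0, 1.0 / 3.0])

# Print the ratios each layer would use (0-based layer index).
for i, ratios in enumerate(per_layer):
    print(i, ratios)
```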
aspect_ratios should be a list of lists inside quotation marks. The length of the outer list must equal the number of feature layers used for anchor box generation.
For example, if aspect_ratios is
"[[1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0, 2.0, 0.5, 3.0, 0.33]]"
then the i-th layer will have anchor boxes with the aspect ratios defined in aspect_ratios[i], for 6 layers in total.
Here the last layer has anchor boxes with aspect ratios [1.0, 2.0, 0.5, 3.0, 0.33], while the first five layers use [1.0, 2.0, 0.5].
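If you want to sanity-check an aspect_ratios string before training, something like the sketch below may help. It is only an illustration: parse_aspect_ratios is a hypothetical helper, not a toolkit function, and it assumes the value is a Python-style list literal with 6 feature layers as described above.

```python
import ast

NUM_FEATURE_LAYERS = 6  # assumption taken from this post: 6 feature maps


def parse_aspect_ratios(value, num_layers=NUM_FEATURE_LAYERS):
    """Parse the quoted list-of-lists string and check the outer length."""
    ratios = ast.literal_eval(value)  # safely evaluates the list literal
    if not isinstance(ratios, list) or not all(isinstance(r, list) for r in ratios):
        raise ValueError("aspect_ratios must be a list of lists")
    if len(ratios) != num_layers:
        raise ValueError(
            "expected %d inner lists, got %d" % (num_layers, len(ratios))
        )
    return ratios


spec_value = "[[1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0, 2.0, 0.5, 3.0, 0.33]]"
per_layer = parse_aspect_ratios(spec_value)

# Layer i uses per_layer[i]; indices here are 0-based, so the last layer is 5.
for i, ratios in enumerate(per_layer):
    print("layer", i, "->", ratios)
```

A string with fewer or more than 6 inner lists raises ValueError in this sketch, which mirrors the rule that the outer list length must match the number of feature layers.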