Hi all, we have released a new sample plugin for DeepStream 2.0 performing YOLO (You Only Look Once) object detection, accelerated with TensorRT. Supported models include YOLO v2 & v3.
Did you try running yolov2-tiny using this code? When I use the cfg and associated weights file from pjreddie’s website the code runs without any problems (after also changing the kOUTPUT_BLOB_NAME variable to region_16 instead of region_32). For the rest I guess everything remains the same. The downsampling factor of 32 and anchor boxes are the same for both models.
However, it is not detecting anything on the data/dog.jpg example. The original darknet code using this cfg and weightfile does output 3 objects for this image.
After some investigation I noticed that yolov2-tiny contains a maxpool layer with size=2 and stride=1, so there is no downsampling after this layer. However, because the default padding for the MaxPool layer is 0, the output size of this layer becomes 12x12 instead of 13x13.
Any idea how to correctly handle this situation?
Without padding the resulting dimensions are 12x12. With padding we get 14x14. We should be ending up with 13x13 though…
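To make the arithmetic above concrete, here is a minimal sketch of the usual pooling output-size formula. The helper name poolOutputSize is made up for illustration and is not part of the plugin. The case that gives 13x13 assumes one pixel of padding in total (on one side only), which, as far as I understand it, is how darknet treats this layer; symmetric padding of 1 per side gives a total of 2 and therefore 14x14.

```cpp
#include <cassert>

// Hypothetical helper, not part of the plugin: output size of a pooling
// layer along one axis. totalPadding is the padding summed over both sides
// of that axis, so symmetric padding of 1 means totalPadding = 2.
int poolOutputSize(int inputSize, int kernelSize, int stride, int totalPadding)
{
    assert(kernelSize > 0 && stride > 0 && totalPadding >= 0);
    // Integer division floors the result for non-negative operands.
    return (inputSize + totalPadding - kernelSize) / stride + 1;
}
```

With inputSize=13, kernelSize=2 and stride=1 this gives 12 for totalPadding=0, 14 for totalPadding=2, and 13 for totalPadding=1. So the layer needs asymmetric padding, which a single symmetric padding value cannot express.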
I had implemented a workaround using a padding layer, but it was not 100% correct. In the meantime the problem has been fixed in a better way in the original repository, so I now follow that approach.
Here are roughly the changes you should make.
trt_utils.h defines a class to compute Padding sizes for MaxPool layers.
class YoloTinyMaxpoolPaddingFormula : public nvinfer1::IOutputDimensionsFormula
You can then give your network a pointer to a YoloTinyMaxpoolPaddingFormula instance, which will be used to compute the correct padding for any MaxPool layer that is added to the network.
Basically, it uses “valid” padding for all layers except the ones that are explicitly marked by name as requiring “same” padding.
Note that the formulas will only work for square networks. If your network is rectangular you need to make some additional minor changes. Best to run this code in Debug mode because there are some assertions that should warn you when things are going wrong.
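The logic described above can be sketched without any TensorRT dependency as follows. This is not the code from trt_utils.h: SquareDims is a hypothetical stand-in for the TensorRT dimension type, MaxpoolPaddingSketch and its method signatures are my own illustration, and the layer name "maxpool_12" used in the usage note is only an assumed example. It shows the "valid by default, same for named layers" rule and the square-only assertions.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a height/width pair; not the TensorRT type.
struct SquareDims
{
    int h;
    int w;
};

// Illustrative sketch only: picks the output size of a MaxPool layer.
class MaxpoolPaddingSketch
{
public:
    // Mark a layer (by name) as needing "same" padding.
    void addSamePaddingLayer(const std::string& layerName)
    {
        m_SamePaddingLayers.insert(layerName);
    }

    SquareDims compute(SquareDims input, SquareDims kernel, SquareDims stride,
                       const std::string& layerName) const
    {
        // The formulas below assume square inputs, kernels and strides.
        assert(input.h == input.w);
        assert(kernel.h == kernel.w);
        assert(stride.h == stride.w);
        assert(stride.h > 0);

        int out = 0;
        if (m_SamePaddingLayers.count(layerName) > 0)
        {
            // "same" padding: output = ceil(input / stride)
            out = (input.h + stride.h - 1) / stride.h;
        }
        else
        {
            // "valid" padding: no padding at all
            out = (input.h - kernel.h) / stride.h + 1;
        }
        return SquareDims{out, out};
    }

private:
    std::set<std::string> m_SamePaddingLayers;
};
```

For example, after calling addSamePaddingLayer("maxpool_12"), a 13x13 input with size=2 and stride=1 stays 13x13 for that layer, while any unmarked layer with the same parameters gives 12x12. A rectangular network would need the height and width computed separately instead of relying on the assertions.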