As far as I know, in TLT we can train the DetectNet neural network. But DetectNet can only predict bboxes at one scale (stride 16). Are there variations of DetectNet that use not only stride 16 but also stride 8 and stride 32 (all 3 scales at the same time, like in YOLOv3)? Or at least 2 scales instead of 3?
Currently, the TLT detectnet_v2 network only supports stride 16.
See DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation
DetectNet_v2 generates 2 tensors, cov and bbox. The image is divided into a grid of 16x16-pixel cells. The cov tensor (short for "coverage" tensor) defines the grid cells that are covered by an object. The bbox tensor defines the normalized image coordinates of the object's top left (x1, y1) and bottom right (x2, y2) corners with respect to the grid cell. For best results, you can assume the coverage area to be an ellipse within the bbox label, with the maximum confidence assigned to the cells in the center and coverage decreasing outwards. Each class has its own coverage and bbox tensor, so the shapes of the tensors are as follows:
- cov: Batch_size, Num_classes, image_height/16, image_width/16
- bbox: Batch_size, Num_classes * 4, image_height/16, image_width/16 (where 4 is the number of coordinates per cell)
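To make the shapes above concrete, here is a small sketch (not official TLT code; the function name is mine) that computes both output shapes for a given input size, assuming the stride-16 grid described above:

```python
# Sketch: compute DetectNet_v2 output tensor shapes for a given input.
# Assumes stride 16, as stated in the docs quoted above.

def detectnet_v2_output_shapes(batch_size, num_classes,
                               image_height, image_width, stride=16):
    """Return (cov_shape, bbox_shape) for DetectNet_v2's two outputs."""
    grid_h = image_height // stride
    grid_w = image_width // stride
    cov_shape = (batch_size, num_classes, grid_h, grid_w)
    # 4 = (x1, y1, x2, y2) coordinates per grid cell, per class
    bbox_shape = (batch_size, num_classes * 4, grid_h, grid_w)
    return cov_shape, bbox_shape

# Example: batch of 4, 3 classes, 544x960 input (divisible by 16)
print(detectnet_v2_output_shapes(4, 3, 544, 960))
# → ((4, 3, 34, 60), (4, 12, 34, 60))
```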
Ok. I will be using YOLOv3 / YOLOv4 in TLT.
I looked at the “yolo_config” block and I have another question:
Can I change the number of YOLO strides (for example, reduce the standard 3 strides to 2 by sacrificing one of the three prediction scales)? Which parameter in this configuration block is responsible for that?
I am afraid you want to change the number of feature maps, right?
For example, I want to trim the model by removing one of the strides, as shown in the picture (the area I want to remove is highlighted in red). In darknet, I can remove unnecessary layers via the cfg file. Question: how can I do the same operation in TLT?
Currently, TLT YOLOv4 provides only the "freeze_blocks" feature.
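For reference, `freeze_blocks` is set in the `yolov4_config` block of the training spec file. A minimal sketch (field names assumed from the TLT 3.0 spec format; the values here are placeholders): it freezes backbone blocks during training rather than removing prediction heads, so it does not drop a scale.

```
yolov4_config {
  arch: "resnet"
  nlayers: 18
  # Freeze backbone blocks 0 and 1 (repeated field, one entry per block).
  # Note: this only freezes weights; it does not remove a prediction scale.
  freeze_blocks: 0
  freeze_blocks: 1
  freeze_bn: false
}
```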