I am adding more detailed information to James' question.
The story started with using NVIDIA DIGITS to train a single-class DetectNet on a COCO subset (the subset was created by filtering out all classes except person; in other words, it contains only the person class, and all other classes were reassigned to dontcare).
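For reference, a minimal sketch of how such a subset can be produced from COCO-style annotations. The function name `to_single_class` and the numeric id chosen for the dontcare bucket are assumptions for illustration; only the COCO JSON layout and the person category id (1) come from the dataset itself.

```python
# Minimal sketch: relabel a COCO-style annotation dict so that only
# "person" keeps its class and every other class becomes "dontcare".
# The dontcare id below is a hypothetical choice; DIGITS only needs
# the label name to be "dontcare".

PERSON_ID = 1  # COCO category id for "person"

def to_single_class(coco):
    """Return a copy of the annotation dict with non-person
    annotations reassigned to a single 'dontcare' category."""
    DONTCARE_ID = 0  # hypothetical id for the dontcare bucket
    out = {
        "images": coco["images"],
        "categories": [
            {"id": PERSON_ID, "name": "person"},
            {"id": DONTCARE_ID, "name": "dontcare"},
        ],
        "annotations": [],
    }
    for ann in coco["annotations"]:
        ann = dict(ann)  # copy, do not mutate the input
        if ann["category_id"] != PERSON_ID:
            ann["category_id"] = DONTCARE_ID
        out["annotations"].append(ann)
    return out

if __name__ == "__main__":
    demo = {
        "images": [{"id": 1}],
        "annotations": [
            {"id": 10, "image_id": 1, "category_id": 1},   # person, kept
            {"id": 11, "image_id": 1, "category_id": 18},  # dog -> dontcare
        ],
    }
    result = to_single_class(demo)
    print([a["category_id"] for a in result["annotations"]])  # -> [1, 0]
```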
As we know, DetectNet is a modification of GoogLeNet for object detection ( https://devblogs.nvidia.com/detectnet-deep-neural-network-object-detection-digits/ ).
- Training ran for 5 days; the mAP is about 0.26.
- We tested the trained model on the COCO subset inside DIGITS; it works fine, and we see that persons are detected.
- After changing deploy.prototxt (mainly the height and width), we used example.py (DetectNet Python Inference · GitHub) to test our own images; it works fine, and we see that persons are detected by the trained model.
- Next, we decided to run the trained model with TensorRT. In order to do so, the last Python layer in deploy.prototxt must be removed:
layer {
  name: "cluster"
  type: "Python"
  bottom: "coverage"
  bottom: "bboxes"
  top: "bbox-list"
  python_param {
    module: "caffe.layers.detectnet.clustering"
    layer: "ClusterDetections"
    param_str: "1920, 1072, 16, 0.4, 2, 0.02, 22, 1"
  }
}
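In case it helps others, here is a rough text-level sketch of stripping such Python layers from a deploy.prototxt before handing it to TensorRT. This is just brace matching over the prototxt text, not an official Caffe or TensorRT API; the function name is my own.

```python
# Minimal sketch: remove every top-level `layer { ... }` block whose
# type is "Python" from a prototxt, using simple brace matching.
# Editing the file by hand works just as well; this only automates it.

def strip_python_layers(prototxt_text):
    """Return the prototxt text with all Python layer blocks removed."""
    out, i, n = [], 0, len(prototxt_text)
    while i < n:
        start = prototxt_text.find("layer", i)
        brace = prototxt_text.find("{", start) if start != -1 else -1
        if start == -1 or brace == -1:
            out.append(prototxt_text[i:])  # trailing text, keep as-is
            break
        out.append(prototxt_text[i:start])  # text before this layer
        # scan to the brace that closes this layer block
        depth, j = 0, brace
        while j < n:
            if prototxt_text[j] == "{":
                depth += 1
            elif prototxt_text[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        block = prototxt_text[start:j + 1]
        if 'type: "Python"' not in block:
            out.append(block)  # keep non-Python layers
        i = j + 1
    return "".join(out)
```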
We tried parsefunc = googlenet with NVIDIA type 1 or 2; it always reports a segmentation fault. We tried parsefunc = Resnet; there is no segmentation fault, but no bounding boxes are detected. Selecting parsefunc = Resnet does not make sense, since the subnet of DetectNet is GoogLeNet. I do not know what NVIDIA type 1 or 2 are, but I guess they are related to object detection, and that parsefunc = googlenet is about classification.
Interestingly, when we ran an actual ResNet-based model in TensorRT, there was no segmentation fault and bounding boxes were detected.
After removing the above Python layer, DetectNet and the ResNet model are similar; the difference is that the subnet of DetectNet is GoogLeNet, while the subnet of the ResNet model is the feature-extraction part of ResNet. I read both deploy.prototxt files and compared the output layers (coverage and bounding boxes) of ResNet and DetectNet: they are the same. We ran both ResNet and DetectNet (with the last layer removed) through example.py (DetectNet Python Inference · GitHub); the outputs (coverage and bboxes) are grid-based:
- for DetectNet (single class), the shape of coverage is batchsize x 1 x (height/stride) x (width/stride) and the shape of bboxes is batchsize x 4 x (height/stride) x (width/stride)
- for ResNet (3 classes), the shape of coverage is batchsize x 3 x (height/stride) x (width/stride) and the shape of bboxes is batchsize x 12 x (height/stride) x (width/stride)
The results are comparable. In other words, the only difference between the two is the subnet used for feature extraction.
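The shape pattern above can be checked with a few lines of Python. The input size (640x368) and stride (16) below are illustrative assumptions; the per-class factor of 4 in bboxes comes from the four box coordinates:

```python
# Sketch of the grid-based output shapes described above, for an
# assumed 640x368 input with stride 16 (numbers are illustrative).
batch, height, width, stride = 1, 368, 640, 16
gh, gw = height // stride, width // stride  # grid dimensions

def output_shapes(num_classes):
    """Coverage and bboxes shapes for a given number of classes."""
    coverage = (batch, num_classes, gh, gw)
    bboxes = (batch, 4 * num_classes, gh, gw)  # 4 coords per class
    return coverage, bboxes

print(output_shapes(1))  # single-class DetectNet
print(output_shapes(3))  # the 3-class ResNet variant
```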
The last (Python) layer converts the grid-based coverage and bboxes into the list of detected bounding boxes. I guess TensorRT internalizes this step.
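To make that conversion concrete, here is a rough sketch of what the removed ClusterDetections layer does for a single class: threshold the coverage map and turn each covered grid cell's regressed offsets into an absolute box. The real layer also merges overlapping boxes (e.g. via OpenCV's groupRectangles); that step, and the exact offset convention, are simplified assumptions here.

```python
import numpy as np

# Rough single-class sketch of grid -> bounding-box-list conversion.
# Offsets are assumed to be relative to the grid cell origin; the
# stride and coverage threshold mirror the param_str values in spirit.

def grid_to_boxes(coverage, bboxes, stride=16, threshold=0.4):
    """coverage: (1, gh, gw); bboxes: (4, gh, gw).
    Returns a list of [x1, y1, x2, y2, score]."""
    detections = []
    _, gh, gw = coverage.shape
    for gy in range(gh):
        for gx in range(gw):
            score = coverage[0, gy, gx]
            if score < threshold:
                continue  # cell not covered by an object
            cx, cy = gx * stride, gy * stride  # cell origin in pixels
            dx1, dy1, dx2, dy2 = bboxes[:, gy, gx]
            detections.append([cx + dx1, cy + dy1, cx + dx2, cy + dy2, score])
    return detections
```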
This is the whole story.
The weird thing is why there is a segmentation fault with DetectNet but not with ResNet. Your response is appreciated in advance.