I ran inference on a set of images to detect people. For inference, I downloaded the unpruned pretrained PeopleNet model with the following command:
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplenet/versions/unpruned_v2.1/files/resnet34_peoplenet.tlt
My inference config file contains the following:
inferencer_config{
  # defining target class names for the experiment.
  # Note: This must be mentioned in order of the networks classes.
  target_classes: "Person"
  target_classes: "Bag"
  target_classes: "Face"
  # Inference dimensions.
  image_width: 960
  image_height: 544
  # Must match what the model was trained for.
  image_channels: 3
  batch_size: 16
  gpu_index: 2
  # model handler config
  tlt_config{
    model: "PATH TO DOWNLOADED PRETRAINED .TLT MODEL"
  }
}
bbox_handler_config{
  kitti_dump: true
  disable_overlay: false
  overlay_linewidth: 2
  classwise_bbox_handler_config{
    key:"Person"
    value: {
      confidence_model: "mean_cov"
      output_map: "Person"
      bbox_color{
        R: 0
        G: 255
        B: 0
      }
      clustering_config{
        clustering_algorithm: NMS
        coverage_threshold: 0.005
        nms_iou_threshold: 0.5
        nms_confidence_threshold: 0.01
      }
    }
  }
}
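Because kitti_dump is enabled, the detections are written as KITTI-style label files, while pycocotools expects COCO-style result entries. Below is a minimal sketch of how one KITTI line might be converted. It is an illustration, not necessarily the conversion used for the numbers in this post. It assumes the confidence score is the 16th field of each line when present, and that the box coordinates are already in the same pixel space as the ground truth. The function name kitti_line_to_coco and the category_ids mapping are hypothetical.

```python
def kitti_line_to_coco(line, image_id, category_ids):
    """Convert one KITTI label line into a COCO-style detection dict.

    Returns None for malformed lines or classes missing from category_ids.
    category_ids is a hypothetical mapping, e.g. {"person": 1}.
    """
    fields = line.split()
    if len(fields) < 8:
        return None  # not enough fields for a class name and a 2D box

    key = fields[0].lower()
    if key not in category_ids:
        return None  # class is not part of the evaluation

    # KITTI stores corners (x1, y1, x2, y2); COCO wants [x, y, width, height].
    x1, y1, x2, y2 = (float(v) for v in fields[4:8])

    # Assumption: the confidence is the 16th field when the line has one.
    score = float(fields[15]) if len(fields) > 15 else 1.0

    return {
        "image_id": image_id,
        "category_id": category_ids[key],
        "bbox": [x1, y1, x2 - x1, y2 - y1],
        "score": score,
    }


example = kitti_line_to_coco(
    "Person 0.00 0 0.00 100.0 50.0 300.0 450.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87",
    image_id=7,
    category_ids={"person": 1},
)
```

Two details are easy to get wrong in a conversion like this: the corner-to-width/height change, and the category_id, which has to match the id used for the person class in the ground-truth file.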
When I evaluated the generated detections against the ground-truth file using pycocotools, I got an AP@0.5 of 0.295 for the person class.
That is much lower than I expected, so I would like to understand what is causing this low accuracy.
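To cross-check a figure like this independently of the evaluation library, a simplified single-class AP at IoU 0.5 can be sketched in plain Python. This is a rough sanity check under stated assumptions, not a reimplementation of pycocotools: it uses all-point interpolation instead of COCO's 101 recall points, and it ignores crowd annotations, area ranges, and the per-image detection cap. Boxes are assumed to be in COCO [x, y, width, height] form, and the function names are hypothetical.

```python
def iou_xywh(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    iw = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    ih = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def average_precision(detections, ground_truth, iou_thr=0.5):
    """Simplified AP for one class.

    detections: list of (image_id, score, bbox)
    ground_truth: dict mapping image_id -> list of bbox
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0

    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Visit detections from most to least confident.
    for image_id, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(image_id, [])):
            if matched[image_id][idx]:
                continue  # each ground-truth box can be matched only once
            overlap = iou_xywh(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx >= 0 and best_iou >= iou_thr
        if is_tp:
            matched[image_id][best_idx] = True
        tp_flags.append(is_tp)

    # Running precision and recall at each rank.
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / rank)
        recalls.append(tp / total_gt)

    # Make precision non-increasing as recall grows, then integrate.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gt = {1: [[0, 0, 10, 10]], 2: [[0, 0, 10, 10]]}
dets = [(1, 0.9, [0, 0, 10, 10]), (2, 0.8, [100, 100, 10, 10])]
ap_half = average_precision(dets, gt)
```

In the toy data, the second detection sits far from its ground-truth box and counts as a miss, so only half of the ground truth is recovered. Running a check like this on a few images makes it easier to see whether a low score comes from the detections themselves or from how boxes and ids are being compared.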