Peoplenet accuracy based on person size in a given image

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) : V100
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) : Peoplenet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) : v3
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Hi,
What is the minimum person size that PeopleNet can detect in a surveillance image or video when the input video is 2K resolution? I want to check whether detection accuracy drops as the distance to the person increases, and whether illumination or strong highlights in the image also cause a drop in accuracy.

Please see https://docs.nvidia.com/tlt/tlt-user-guide/text/purpose_built_models/peoplenet.html and https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet

The NVIDIA PeopleNet models were trained to detect objects larger than 10x10 pixels. Therefore, they may not be able to detect objects smaller than 10x10 pixels.
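One way to reason about the 10x10 pixel limit for a 2K source is to scale a person's box to the resolution the network actually sees. The sketch below is only an illustration under stated assumptions: it takes "2K" to mean 2560x1440, takes the network input to be 960x544, assumes the whole frame is resized without preserving aspect ratio, and assumes the limit applies at the network input. None of these details are confirmed in this thread, and the helper names are hypothetical.

```python
# Assumed sizes (not stated in this thread): a 2K frame taken as 2560x1440
# and a network input taken as 960x544.
FRAME_W, FRAME_H = 2560, 1440
NET_W, NET_H = 960, 544
MIN_SIDE = 10  # pixel limit quoted from the model card


def size_at_network_input(box_w, box_h,
                          frame_w=FRAME_W, frame_h=FRAME_H,
                          net_w=NET_W, net_h=NET_H):
    """Scale a box measured in the source frame to the network input size."""
    return box_w * net_w / frame_w, box_h * net_h / frame_h


def is_above_limit(box_w, box_h, min_side=MIN_SIDE):
    """Return True if both scaled sides are at least min_side pixels."""
    w, h = size_at_network_input(box_w, box_h)
    return w >= min_side and h >= min_side


if __name__ == "__main__":
    # Hypothetical person sizes measured in the 2K frame.
    for w, h in [(128, 128), (40, 80), (20, 40)]:
        sw, sh = size_at_network_input(w, h)
        print(f"{w}x{h} px in frame -> {sw:.1f}x{sh:.1f} px at network input, "
              f"above limit: {is_above_limit(w, h)}")
```

Under these assumptions a 128x128 person is still about 48 pixels wide after resizing, while a 20x40 person falls below 10 pixels in width.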

The model card also lists some limitations.

Occluded Objects

When objects are occluded or truncated such that less than 20% of the object is visible, they may not be detected by the PeopleNet model. For people class objects, the model will detect occluded people as long as head and shoulders are visible. However, if the person’s head and/or shoulders are not visible, the object might not be detected unless more than 60% of the person is visible.

Dark-lighting, Monochrome or Infrared Camera Images

The PeopleNet models were trained on RGB images in good lighting conditions. Therefore, images captured in dark lighting conditions, monochrome images, or IR camera images may not provide good detection results.

Warped and Blurry Images

The PeopleNet models were not trained on fish-eye lens cameras or moving cameras. Therefore, the models may not perform well for warped images and images that have motion-induced or other blur.

Face and Bag class

Although the bag and face classes are included in the model, the accuracy of these classes will be much lower than that of the people class. Some re-training will be required on these classes to improve accuracy.

Thanks for the response. For what pixel size do we get the best accuracy?
I tested the model under different lighting conditions. In dark lighting, people are still detected, but the bounding boxes fluctuate. How can I get consistent detections?
What is the optimum confidence threshold to use? With 0.1 I get good results, but with 0.5 I get fewer detections.

As mentioned in the model card, the NVIDIA PeopleNet models were trained to detect objects larger than 10x10 pixels. Therefore, they may not be able to detect objects smaller than 10x10 pixels.
There is no conclusion about which pixel size gives the best accuracy. What is the average resolution of your test images, and how large are the bboxes? If they are too small, please try resizing the images/labels and run inference again.
For the lighting conditions, the “limitations” section of the PeopleNet model card already gives the details. If necessary, retraining is needed.
For the confidence threshold, setting it to 0.1 will produce more bboxes.
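The trade-off behind the confidence threshold can be illustrated with a small sketch. The scores, the true/false labels, and the ground-truth count below are all hypothetical values made up for illustration, not PeopleNet measurements; in practice they would come from your own annotated test images.

```python
# Hypothetical detections from one test set: (confidence, is_true_positive).
DETECTIONS = [
    (0.92, True), (0.81, True), (0.47, True), (0.33, False),
    (0.18, True), (0.12, False), (0.07, False),
]
TOTAL_GROUND_TRUTH = 4  # hypothetical number of annotated people


def precision_recall(detections, threshold, total_gt):
    """Compute precision and recall for detections kept at a threshold."""
    kept = [is_tp for score, is_tp in detections if score >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_gt if total_gt else 0.0
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.1, 0.3, 0.5):
        p, r = precision_recall(DETECTIONS, threshold, TOTAL_GROUND_TRUTH)
        print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

With these made-up numbers, a low threshold keeps every true detection but also admits false ones, while a high threshold removes the false ones at the cost of missed people. A suitable value therefore depends on which error matters more for the application.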

  1. I am working with 2K image resolution. If the person size is 128x128 pixels, will I get the best detection accuracy? Do you have any analysis like this?
  2. So for different lighting conditions we need to retrain the model, then?
  3. Yes, I am getting more bounding boxes, but I want to understand which confidence threshold you suggest.

Please run the PeopleNet etlt model directly and check the result. I think for your case (2K images, 128x128 bboxes) it should detect people well.
You can download the pruned ResNet34 version from https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet/files?version=pruned_quantized_v2.1
Then run it in DeepStream:
$ cd /opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models
$ deepstream-app -c deepstream_app_source1_peoplenet.txt

For the confidence threshold, do you mean pre-cluster-threshold? It is just a threshold used to filter the bboxes.
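To experiment with different pre-cluster-threshold values, the setting can be changed in the inference config used by the DeepStream app. The sketch below edits an INI-style config text with Python's standard configparser. The sample text, the [class-attrs-all] section name, and the helper function are assumptions made for illustration; check the actual config file shipped with your DeepStream sample for the real layout.

```python
import configparser
import io

# Hypothetical excerpt of an inference config; the real file may differ.
SAMPLE_CONFIG = """\
[property]
num-detected-classes=3

[class-attrs-all]
pre-cluster-threshold=0.4
"""


def set_pre_cluster_threshold(config_text, value, section="class-attrs-all"):
    """Return config_text with pre-cluster-threshold set to value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "pre-cluster-threshold", str(value))
    out = io.StringIO()
    # Write key=value without spaces, matching the style of the input.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(set_pre_cluster_threshold(SAMPLE_CONFIG, 0.2))
```

Note that configparser drops comment lines when it rewrites a file, so for a real config it may be simpler to edit the value by hand and rerun deepstream-app.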
