Please see https://docs.nvidia.com/tlt/tlt-user-guide/text/purpose_built_models/peoplenet.html and https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet
The NVIDIA PeopleNet model was trained to detect objects larger than 10x10 pixels, so it may not detect objects smaller than that.
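As a rough way to check this for your own footage, the sketch below scales a bounding box from the camera frame down to the resolution the network sees and tests it against the 10x10 limit. It assumes the limit applies at the model input resolution. The function name and the 960x544 input size used in the example are illustrative assumptions; take the actual input size from your own configuration.

```python
MIN_SIDE_PX = 10  # minimum object size stated for PeopleNet


def is_large_enough(box, frame_size, model_input_size):
    """Return True if the box is at least 10x10 px after resizing.

    box:              (x1, y1, x2, y2) in source-frame pixels
    frame_size:       (width, height) of the source frame
    model_input_size: (width, height) the frame is resized to (assumed)
    """
    x1, y1, x2, y2 = box
    frame_w, frame_h = frame_size
    input_w, input_h = model_input_size

    # Scale the box dimensions by the same factor the frame is resized by.
    scaled_w = (x2 - x1) * input_w / frame_w
    scaled_h = (y2 - y1) * input_h / frame_h

    return scaled_w >= MIN_SIDE_PX and scaled_h >= MIN_SIDE_PX


# Example: a 1080p frame resized to an assumed 960x544 input.
print(is_large_enough((100, 100, 140, 140), (1920, 1080), (960, 544)))
print(is_large_enough((100, 100, 116, 116), (1920, 1080), (960, 544)))
```

In this example a 40x40 box in the 1080p frame shrinks to roughly 20x20 and passes, while a 16x16 box shrinks to roughly 8x8 and falls under the limit.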
The model card also lists the following limitations.
Occluded Objects
When objects are occluded or truncated such that less than 20% of the object is visible, they may not be detected by the PeopleNet model. For people-class objects, the model will detect occluded people as long as the head and shoulders are visible. However, if the person’s head and/or shoulders are not visible, the person might not be detected unless more than 60% of the body is visible.
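The occlusion thresholds above can be written as a small rule of thumb, for example to flag annotated test objects that the model is likely to miss. This is only a sketch of the stated limits, not something the model exposes. The function name, the "person" label string, and the idea of having a visible fraction available (for instance from your own annotations) are assumptions.

```python
def may_be_missed(label, visible_fraction, head_and_shoulders_visible=False):
    """Return True if the stated occlusion limits suggest a possible miss.

    label:            class name, "person" is treated specially (assumed label)
    visible_fraction: portion of the object that is visible, 0.0 to 1.0
    """
    if label == "person":
        # People are expected to be detected when head and shoulders show.
        if head_and_shoulders_visible:
            return False
        # Otherwise more than 60% of the person needs to be visible.
        return visible_fraction <= 0.60
    # Other objects may be missed when less than 20% is visible.
    return visible_fraction < 0.20


print(may_be_missed("person", 0.30, head_and_shoulders_visible=True))
print(may_be_missed("person", 0.50))
print(may_be_missed("bag", 0.10))
```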
Dark-lighting, Monochrome or Infrared Camera Images
The PeopleNet model was trained on RGB images in good lighting conditions. Therefore, images captured in dark lighting conditions, monochrome images, and IR camera images may not give good detection results.
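If you are unsure whether your input falls into this category, a quick check on a few frames can help. The sketch below uses plain Python on a list of (R, G, B) tuples with 0-255 values. It computes the mean luma with the common Rec. 601 weights and checks whether every pixel has near-equal channels. The function name and both thresholds are illustrative assumptions, not values from the model card.

```python
def describe_frame(pixels, dark_threshold=50.0, channel_tolerance=2):
    """Summarise brightness and colour content of an RGB frame.

    pixels:            iterable of (r, g, b) tuples, values 0-255
    dark_threshold:    mean luma below this counts as dark (assumed value)
    channel_tolerance: max channel spread still treated as gray (assumed value)
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("frame has no pixels")

    luma_total = 0.0
    gray_count = 0
    for r, g, b in pixels:
        # Rec. 601 luma approximation of perceived brightness.
        luma_total += 0.299 * r + 0.587 * g + 0.114 * b
        # A pixel with near-equal channels carries no colour information.
        if max(r, g, b) - min(r, g, b) <= channel_tolerance:
            gray_count += 1

    mean_luma = luma_total / len(pixels)
    return {
        "mean_luma": mean_luma,
        "is_dark": mean_luma < dark_threshold,
        "is_monochrome": gray_count == len(pixels),
    }


# Example: a uniformly dark gray frame, as an IR camera at night might produce.
night_frame = [(10, 10, 10)] * 4
print(describe_frame(night_frame))
```

Frames flagged as dark or monochrome are the ones where weaker detection results should be expected.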
Warped and Blurry Images
The PeopleNet models were not trained on images from fish-eye lens cameras or moving cameras. Therefore, the models may not perform well on warped images or on images with motion-induced or other blur.
Face and Bag class
Although the bag and face classes are included in the model, their accuracy will be much lower than that of the people class. Some re-training on these classes will be required to improve accuracy.