PeopleNet accuracy based on person size in a given image

Please run the PeopleNet .etlt model directly and check the result. For your case (2K images with 128x128 bboxes), I think it should detect people well.
You can download the pruned ResNet34 version from https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet/files?version=pruned_quantized_v2.1
Then run it in DeepStream:
$ cd /opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models
$ deepstream-app -c deepstream_app_source1_peoplenet.txt

For the confidence threshold, do you mean pre-cluster-threshold? It is just a threshold for filtering the bboxes: detections whose confidence is below it are dropped before clustering.
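
To illustrate what that filtering amounts to, here is a minimal Python sketch. It is not DeepStream code: the `Detection` type, the `filter_by_threshold` helper, the sample boxes, and the 0.4 threshold are all made up for illustration, and whether a detection exactly at the threshold is kept is an assumption here.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical container for one raw detection (not a DeepStream type)."""
    left: float
    top: float
    width: float
    height: float
    confidence: float


def filter_by_threshold(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose confidence reaches the threshold.

    Assumption: a detection exactly at the threshold is kept.
    """
    return [d for d in detections if d.confidence >= threshold]


# Made-up raw detections for a 128x128 person in a 2K frame.
raw = [
    Detection(900.0, 500.0, 128.0, 128.0, 0.85),  # confident detection
    Detection(905.0, 498.0, 126.0, 130.0, 0.55),  # overlapping, less confident
    Detection(300.0, 200.0, 40.0, 90.0, 0.10),    # low-confidence candidate
]

# Illustrative threshold value, not a recommended setting.
kept = filter_by_threshold(raw, threshold=0.4)
for d in kept:
    print(d)
```

In this sketch the two overlapping boxes both pass the threshold; merging them into a single box is the job of the clustering step that runs afterwards. Lowering the threshold lets more low-confidence candidates through to clustering, and raising it removes them earlier.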