PeopleNet v2.1 (ResNet34) pretrained model gives very poor results on custom dataset

I ran inference on a set of images to detect people. To run inference, I downloaded the unpruned PeopleNet pretrained model with the following command:

wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplenet/versions/unpruned_v2.1/files/resnet34_peoplenet.tlt

My inference config file contains the following:

inferencer_config{
  # Target class names for the experiment.
  # Note: These must be listed in the order of the network's classes.
  target_classes: "Person"
  target_classes: "Bag"
  target_classes: "Face"
  # Inference dimensions.
  image_width: 960
  image_height: 544
  # Must match what the model was trained for.
  image_channels: 3
  batch_size: 16
  gpu_index: 2
  # model handler config
  tlt_config{
    model: "PATH TO DOWNLOADED PRETRAINED .TLT MODEL"
  }
}

bbox_handler_config{
  kitti_dump: true
  disable_overlay: false
  overlay_linewidth: 2
  classwise_bbox_handler_config{
    key:"Person"
    value: {
      confidence_model: "mean_cov"
      output_map: "Person"
      bbox_color{
        R: 0
        G: 255
        B: 0
      }
      clustering_config{
        clustering_algorithm: NMS
        coverage_threshold: 0.005
        nms_iou_threshold: 0.5
        nms_confidence_threshold: 0.01
      }
    }
  }
}
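
To make the clustering settings above concrete, here is a minimal sketch of greedy NMS. It assumes the usual semantics: boxes scoring below the confidence threshold are dropped, and a box is suppressed when its IoU with an already kept, higher-scoring box exceeds the IoU threshold. This is only an illustration and not the TLT implementation; the function names `iou` and `greedy_nms` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,          # mirrors nms_iou_threshold (assumed meaning)
    confidence_threshold: float = 0.01,  # mirrors nms_confidence_threshold (assumed meaning)
) -> List[int]:
    """Return indices of the boxes that survive, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= confidence_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

With thresholds this low (0.01 confidence), almost every candidate box reaches the overlap test, so the number of boxes written to the KITTI dump can be large.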

When I evaluated the generated detections against the ground-truth file using pycocotools, I got an AP@0.5 of 0.295 for the Person class.

That is much lower than I expected, so I would like to understand the reason for this low accuracy.
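
For reference, this is a sketch of the kind of conversion needed to get from the KITTI dump to COCO-style results. It assumes each dumped line holds the 15 standard KITTI fields followed by a confidence score, and that the box coordinates are already in the same pixel space as the ground-truth annotations. The helper name `kitti_line_to_coco` and the `category_ids` mapping are hypothetical; the mapping has to match the category IDs in the ground-truth file.

```python
from typing import Dict


def kitti_line_to_coco(line: str, image_id: int, category_ids: Dict[str, int]) -> dict:
    """Convert one KITTI detection line into a COCO results entry."""
    fields = line.split()
    if len(fields) < 16:
        # Assumption: 15 KITTI fields plus a trailing confidence score.
        raise ValueError("expected 16 fields (15 KITTI fields plus a score)")
    name = fields[0]
    # KITTI stores corners (x1, y1, x2, y2) in fields 4..7.
    x1, y1, x2, y2 = (float(v) for v in fields[4:8])
    return {
        "image_id": image_id,
        "category_id": category_ids[name],
        # COCO expects [x, y, width, height], not corner coordinates.
        "bbox": [x1, y1, x2 - x1, y2 - y1],
        "score": float(fields[15]),
    }
```

A list of such entries can be written to a JSON file, loaded with pycocotools' `COCO.loadRes`, and scored with `COCOeval` using `iouType="bbox"`. Passing corner coordinates in place of width and height, or omitting the score, would lower the reported AP even when the detections themselves are reasonable.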

May I know what your test images look like? If possible, could you share a few of them for testing?