See https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet
Dark-lighting, Monochrome or Infrared Camera Images
The PeopleNet model was trained on RGB images in good lighting conditions. Therefore, images captured in dark lighting conditions or a monochrome image or IR camera image may not provide good detection results.
Further references:
To train on grayscale images only, consider setting
output_image_channel: 1
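For context, here is a rough sketch of where that setting would sit in a training spec. It assumes the DetectNet_v2 spec layout that PeopleNet is built on, where the field lives under augmentation_config > preprocessing. The width and height values are only illustrative (taken from the PeopleNet input resolution), so adjust them to your own dataset and double-check the field placement against the spec file you are actually using.

```
# Sketch only: assumes a DetectNet_v2-style training spec.
augmentation_config {
  preprocessing {
    output_image_width: 960    # illustrative value, match your data
    output_image_height: 544   # illustrative value, match your data
    output_image_channel: 1    # 1 = grayscale, 3 = RGB
  }
}
```

Note that the pretrained PeopleNet weights were trained on 3-channel RGB input, so switching to a single channel may affect how well the pretrained weights transfer.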
For guidance on how many images to use, refer to Dataset Practices - #3 by Morganh