PeopleNet -

I am looking at the recently uploaded pretrained PeopleNet model.

Q) Would it be possible to add another class to detect, such as helmets? When retraining, I want to keep the original people-detection class but add another class such as helmets.

Q) One of the use cases mentions social distancing; are there any pointers to understanding how that can be done with this model?

  1. It is possible to train on your own data with the unpruned PeopleNet model as the pretrained weights. You need to prepare the helmet images/labels, resize them to 960x544, and set the training spec accordingly. That is for one class. If you want to train a four-class detector, you also need to add some images/labels for person/bag/face.
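The "set the training spec accordingly" step for a four-class detector mainly means mapping each class in the detectnet_v2 spec's dataset_config section. A sketch of what that could look like (the paths and the "helmet" class name are placeholders for your own dataset; the other spec sections are omitted):

```
dataset_config {
  data_sources {
    # placeholders: point these at your own converted TFRecords and images
    tfrecords_path: "/workspace/tfrecords/*"
    image_directory_path: "/workspace/images"
  }
  image_extension: "jpg"
  # one mapping per class; the key must match the class name in your labels
  target_class_mapping { key: "person" value: "person" }
  target_class_mapping { key: "bag" value: "bag" }
  target_class_mapping { key: "face" value: "face" }
  target_class_mapping { key: "helmet" value: "helmet" }
}
```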

  2. PeopleNet can be used to accurately count people in a crowded environment for security.


So, is my understanding correct that to train a 4-class detector, hypothetically, I could add 5 training images for each of those classes, but the accuracy would still be high because of the original weights?

My end goal is to add a class to PeopleNet and retrain with the 4 classes, but I won't have all the data you used when training the original PeopleNet; I will have a much smaller dataset. Will the accuracy of the retrained model on the person/bag/face classes be essentially the same as the original model?

Is my understanding correct? I am assuming that I have to name the classes exactly the same as "person", "bag", and "face" so the model knows?

Hi ishan,
Firstly, please prepare your own dataset for the 3 classes: person, bag, face. The quantity of data is up to you; a smaller dataset is OK.
And set the correct class name in the training spec accordingly.
The class names are as below.

nvidia@nvidia:/opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models$ cat labels_peoplenet.txt
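For reference, on a DeepStream 5.0 install that file lists the three class names (shown here from memory; verify the spelling and capitalization against your own copy of the file):

```
person
bag
face
```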

Then you can use "tlt-evaluate" to check whether the PeopleNet pretrained model "resnet34_peoplenet.tlt" gets a good mAP.

$ tlt-evaluate detectnet_v2 -e spec_3class.txt -m resnet34_peoplenet.tlt -k tlt_encode

Normally, the mAP will be high. That means the PeopleNet weights transfer to your own 3-class data.

Then you can prepare the data for the 4th class, set the training spec accordingly, and trigger the training.
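A sketch of how triggering the training could look with the TLT CLI, following the same command style as the tlt-evaluate call above (the spec filename, results directory, and model name are placeholders for your own values):

```
# -e: training spec, -r: output directory, -k: encryption key, -n: model name
tlt-train detectnet_v2 -e spec_4class.txt \
                       -r /workspace/results \
                       -k tlt_encode \
                       -n resnet34_peoplenet_4class
```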

Can I retrain PeopleNet on my 4th class without adding new data for the existing bag and face classes?

So, you just want to train on only one class? You can try, but I'm afraid it will not work, because after training your tlt model would be a detection model that detects only that one class.

I want to keep PeopleNet's ability to track people, but I want to add another class. I am not interested in bags or faces, but if it means sacrificing the person class I am OK with training on 4 classes (the 3 original classes + 1 new class).

If I want to add a 4th class to PeopleNet, is that possible?

4) Cardboard boxes (my new class)

After retraining, will my new model have the same accuracy on people as the original PeopleNet?

For your case, you would run only two classes: one is person, the other is your new class.
You need to prepare the data for both classes.

I will do that. Just to confirm: by using 2 classes (person and cardboard boxes), will this newly trained model have the same performance for the person class as the original PeopleNet model?

For your case, it is actually a new training. The PeopleNet model contains pretrained weights that can be used as a better starting point for the person class.
I also ran a similar experiment on my side: I trained the "person" class and a new class, "cart".
Prepare some data for both classes; all the data are 960x544.
Then set the training spec and also tune the class_weight.
The unpruned PeopleNet model works well as the pretrained weights.
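Tuning class_weight happens in the cost_function_config section of the detectnet_v2 spec. A sketch for the person + cart case (the weight values are illustrative starting points, not the values used in the experiment above, and the per-class objective fields are omitted for brevity):

```
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    # objectives { ... } omitted
  }
  target_classes {
    name: "cart"
    # weight the smaller class higher so it is not drowned out in the loss
    class_weight: 4.0
    coverage_foreground_weight: 0.05
    # objectives { ... } omitted
  }
}
```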

Is your new model with the people and cart classes as good as the original PeopleNet model when it comes to detecting people?

Also, thank you for doing this experiment.

Actually it is a new training because a new class is added.
I prepared 14k person images and 3.7k cart images and ran 10 epochs in total, about 40 minutes. The AP for person is about 60%.
I did not fine-tune the hyper-parameters much, so after fine-tuning or running longer, I believe the mAP can still improve further.

Would it be possible to share your person dataset?

Sorry, this data is from Nvidia internal only.

I understand, thanks.

Good Morning,

In relation to transfer learning, would it be possible to leverage the already good PeopleNet to detect people in grayscale images (infrared cameras)?

Retraining completely is not possible, as "PeopleNet v1.0 model was trained on a proprietary dataset with more than 5 million objects for person class."

Can we use transfer learning to retain the network's ability to detect people in colour images while also detecting people in grayscale images, with the same level of accuracy?

If so, how many new "gray" images would we need to use? How high is the risk of overfitting the network to the new gray images?


Dark-lighting, Monochrome or Infrared Camera Images

The PeopleNet model was trained on RGB images in good lighting conditions. Therefore, images captured in dark lighting conditions or a monochrome image or IR camera image may not provide good detection results.

More reference:

For training on grayscale images only, please consider setting

output_image_channel: 1
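In a detectnet_v2 spec, this parameter sits under augmentation_config → preprocessing. A sketch of the relevant section, using PeopleNet's 960x544 input resolution (the min_bbox values are illustrative):

```
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    # 1 channel = grayscale input; 3 = RGB
    output_image_channel: 1
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
}
```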

For guidance on how many images to use, refer to Dataset Practices


I was aware PeopleNet was not trained on grayscale images; that is why I want to be able to detect people both in the daytime and at night.

As for retraining, I thought the purpose of transfer learning was to reduce the need for huge amounts of new data?

The link you gave just states the amount of data used to train the network on RGB images at different distances, half indoors and half outdoors?

Does this mean we would need a similar number of images from IR cameras? And will this not reduce detection on colour images, since we can't add Nvidia's training images to the dataset?

No, you do not need a similar number of images. That is why the unpruned PeopleNet model is provided in NGC: users can set it as the pretrained model and train on their own data. If your data are colour images, transfer learning should run smoothly. But as the link says, "monochrome image or IR camera image may not provide good detection results"; that is a known limitation.

Thank you for the response.

I need to detect both daytime and night time camera images.

So should I disregard PeopleNet?

Does this mean the unpruned PeopleNet cannot be trained to recognize IR camera images?

If I acquire a large number of IR camera images, will training on the unpruned model completely ruin the colour-image detection afterwards?