I am looking at the recently uploaded pretrained PeopleNet model.
Q) Would it be possible to add another class to detect, such as helmets? When retraining it I want to keep the original people-detection class, but I want to add another class such as helmets.
Q) One of the use cases mentions social distancing; are there any pointers to understanding how that can be done with this model?
It is possible for you to train on your own data with the unpruned PeopleNet model as the pretrained weights. You need to prepare the helmet images/labels, resize them to 960x544, and set the training spec accordingly. That is for one class. If you want to train a 4-class detector, you also need to add some images/labels for person/bag/face.
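For example, each image gets a matching KITTI-format label file (e.g. a hypothetical 000001.txt next to 000001.jpg) with one object per line: class name, truncation, occlusion, alpha, the four bbox pixel coordinates, then unused 3D fields. The coordinates and the "helmet" class below are only placeholders for your own data:

    helmet 0.00 0 0.00 613.00 180.00 661.00 229.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    person 0.00 0 0.00 590.00 170.00 700.00 540.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00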
PeopleNet can be used to accurately count people in a crowded environment for security.
So, is my understanding correct that to train a 4-class detector I could, hypothetically, add just 5 training images for each of those classes, and the accuracy would still be high because of the original weights?
My end goal is to add a class to PeopleNet and retrain with the 4 classes, but I won't have all the data you used when training the original PeopleNet; I will have a much smaller set. Will the accuracy of the retrained model on the person/bag/face classes be essentially the same as the original model?
Is my understanding correct? I am assuming that I have to explicitly name the classes exactly the same, i.e. "person", "bags", "faces", so the model knows?
Hi ishan,
Firstly, please prepare your own dataset for the 3 classes: Person, Bag, Face. The amount of data is up to you; a smaller dataset is OK.
Then set the correct class names in the training spec accordingly.
The class names are as below.
nvidia@nvidia:/opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models$ cat labels_peoplenet.txt
Person
Bag
Face
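In the detectnet_v2 training spec, dataset_config is where these names are mapped; the keys have to match the class names used in your own label files/tfrecords. A rough sketch (the paths below are placeholders, not real locations):

    dataset_config {
      data_sources {
        tfrecords_path: "/workspace/tfrecords/trainval*"
        image_directory_path: "/workspace/data/training"
      }
      image_extension: "jpg"
      target_class_mapping { key: "person" value: "person" }
      target_class_mapping { key: "bag" value: "bag" }
      target_class_mapping { key: "face" value: "face" }
      validation_fold: 0
    }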
Then you can use "tlt-evaluate" to check whether the PeopleNet pretrained model "resnet34_peoplenet.tlt" has a good mAP.
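For reference, the evaluate command looks roughly like the line below; the spec path, model path, and $KEY (the model's encryption key) are placeholders:

    tlt-evaluate detectnet_v2 -e /workspace/specs/peoplenet_retrain.txt -m /workspace/models/resnet34_peoplenet.tlt -k $KEY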
So, you just want to train only one class? You can try, but I'm afraid it will not work, because after training your tlt model will be a detection model that detects only that one class.
I want to keep the ability to track people from PeopleNet, but I want to add another class. I am not interested in bags or faces, but if it means sacrificing the people class I am OK with training on 4 classes (the 3 original classes + 1 new class of mine).
If I want to add a 4th class to PeopleNet, is that possible?
I will do that. Just to confirm: by using 2 classes (person and cardboard boxes), will this newly trained model have the same performance for the person class as the original PeopleNet model?
For your case, it is actually a new training. The PeopleNet model contains pretrained weights that may be used as a better starting point for the person class.
I also ran a similar experiment on my side. I trained the "People" class and a new class, "cart".
I prepared some data for both classes; all of the data are 960x544.
Then I set the training spec and also tuned the class_weight.
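As a rough sketch of the two places I mean (the numbers here are only illustrative placeholders, not the exact values from my run): the preprocessing block fixes the 960x544 input size, and cost_function_config carries a class_weight per class:

    augmentation_config {
      preprocessing {
        output_image_width: 960
        output_image_height: 544
        min_bbox_width: 1.0
        min_bbox_height: 1.0
      }
    }
    cost_function_config {
      target_classes {
        name: "person"
        class_weight: 1.0
        coverage_foreground_weight: 0.05
        objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 }
        objectives { name: "bbox" initial_weight: 10.0 weight_target: 10.0 }
      }
      target_classes {
        name: "cart"
        class_weight: 4.0    # example only: weight the smaller/new class higher
        coverage_foreground_weight: 0.05
        objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 }
        objectives { name: "bbox" initial_weight: 10.0 weight_target: 10.0 }
      }
      enable_autoweighting: true
      max_objective_weight: 0.9999
      min_objective_weight: 0.0001
    }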
The unpruned PeopleNet pretrained model works well as the pretrained weights.
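Concretely, the training spec points at the unpruned .tlt file in model_config; a sketch with a placeholder path:

    model_config {
      pretrained_model_file: "/workspace/models/peoplenet/resnet34_peoplenet.tlt"
      arch: "resnet"
      num_layers: 34
      use_batch_norm: true
      all_projections: true
      # load_graph stays at its default (false) since this is an unpruned model
      objective_set {
        cov { }
        bbox { scale: 35.0 offset: 0.5 }
      }
    }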
Actually it is a new training because a new class is added.
I prepared 14k person images and 3.7k cart images and ran 10 epochs in total, about 40 minutes. The AP for Person is about 60%.
I did not fine-tune the hyper-parameters much.
So after fine-tuning or running longer, I believe the mAP can still improve further.
In relation to transfer learning, would it be possible to leverage the already good PeopleNet to detect people in grayscale images (from infrared cameras)?
Retraining completely is not possible, as "PeopleNet v1.0 model was trained on a proprietary dataset with more than 5 million objects for person class."
Can we use transfer learning to retain the network's ability to detect people in colour images while also detecting them in grayscale images with the same level of accuracy?
If so, how many new "gray" images would we need to use? How high is the risk of over-training the network with the new gray images?
Dark-lighting, Monochrome or Infrared Camera Images
The PeopleNet model was trained on RGB images in good lighting conditions. Therefore, images captured in dark lighting conditions or a monochrome image or IR camera image may not provide good detection results.
More reference:
For training on grayscale images only, please consider setting
I was aware PeopleNet was not trained on grayscale images; I want to be able to detect people both in the daytime and at night.
As for retraining, I thought the purpose of transfer learning was to reduce the need for huge amounts of new data?
The link you gave just states the amount of data used to train the network on RGB images at different distances, half indoors and half outdoors?
Does this mean we would need a similar number of images from IR cameras? And will this not reduce the detection of colour images, since we can't add Nvidia's training images to the dataset?
@Andrew_Smith
No, you do not need a similar number of images. That is why the unpruned PeopleNet model is provided on NGC: users can set it as the pretrained model and train their own data. If your data are colour images, the transfer learning should run smoothly. But as the link says, a "monochrome image or IR camera image may not provide good detection results"; that is a known limitation.