I am using TLT to train a detection model (DetectNetV2 with a ResNet18 backbone) on a custom object class, which is then deployed within a Deepstream app based on the Python examples.
I noticed that when I train the TLT model and deploy it on the nano in the Deepstream app, I see a considerable drop in detection accuracy. Below is what I see and the options I have explored:
TLT model converted to TRT engine file, and then run on the nano in the DS app:
The TLT model shows a mAP of 90% after training, but when it is deployed as a TRT engine file on the nano, the accuracy drops. Samples that were inferred correctly by the TLT model are no longer inferred correctly when I use the converted TRT engine file.
What could be causing this? Do I have to change any specific configurations in the Deepstream app / pgie files to ensure the accuracy is maintained from TLT to TRT?
I’m currently using a pre-cluster-threshold of 0.2 in my pgie configuration file
TLT model run as ETLT model on the nano:
When the TLT model is run directly as an ETLT model on the nano, it does not make any detections at all with the same threshold configuration as above. Is there a specific way to implement this?
If neither of these options is the right approach, is there something else I should try to ensure accuracy is maintained when running a TLT model in a DS app?
Sorry, what I meant to say is that when I take the model trained with TLT and deploy it on the Jetson Nano (without conversion to a .trt file), almost no objects are detected in the test video I am using. If I first convert the model to a .trt file, there are detections, but significantly fewer than when the same test video is inferred with the model in TLT itself.
I can now get detections using the .etlt model file. The detection accuracy is the same as with the .trt file, though (which is still worse than the inference results I get from TLT itself). I also noticed that changing the “threshold” configuration above seems to have no effect. Is there another threshold configuration option I’m missing?
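For context, my understanding of how the two deployment paths are selected in the pgie [property] section is roughly the following (everything below is a placeholder sketch, not my actual file):

[property]
# .etlt path: nvinfer builds a TensorRT engine from the encoded model on first run
tlt-encoded-model=/path/to/model.etlt
tlt-model-key=<key used when exporting from TLT>
# pre-built engine path: if this file exists and is compatible, it is loaded directly
model-engine-file=/path/to/model.etlt_b1_gpu0_fp16.engine
# precision used when building the engine: 0=FP32, 1=INT8 (needs int8-calib-file), 2=FP16
network-mode=0
# detectnet_v2 preprocessing and output blobs; wrong values here can also hurt accuracy
net-scale-factor=0.0039215697906911373
model-color-format=0
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

If model-engine-file points at an existing engine, my understanding is that nvinfer loads it directly and only falls back to building from the .etlt when no matching engine is found, so the precision that engine was built with (network-mode) seems worth checking when the accuracy differs from the TLT evaluation.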
I am currently in the process of retraining the model using TLT 3.0.
Sorry for the late reply. For reference, I suggest you run a detectnet_v2 sample, which is provided at /opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models.
Refer to the README file. Run with $ deepstream-app -c <deepstream_app_config>.
For example, to run peoplenet (which is based on the detectnet_v2 network): $ deepstream-app -c deepstream_app_source1_peoplenet.txt
Inside deepstream_app_source1_peoplenet.txt, the inference configuration it points to is config_infer_primary_peoplenet.txt.
BTW, you can set pre-cluster-threshold to a very low value to check if there are more bboxes.
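For example, in config_infer_primary_peoplenet.txt this threshold sits under [class-attrs-all] together with the clustering parameters; a rough sketch (the values below are only for this check, not recommendations):

[class-attrs-all]
# score threshold applied before clustering; lower it to surface more raw bboxes
pre-cluster-threshold=0.05
# clustering parameters (here for cluster-mode=1, DBSCAN) also decide which boxes survive
eps=0.7
minBoxes=1

If you still see no boxes with a very low threshold, the issue is more likely in the model or preprocessing settings under [property] than in the threshold itself.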
After training on the same dataset (with the same detectnet_v2/resnet18 network and training configuration) in TLT 3.0, I have found that the training accuracy (AP) plateaus at 33%. This is much lower than what I was getting with TLT 1.0 (~90% AP); however, when I deployed this new model on the Jetson Nano and tested with the same test video I have been using, the detection performance looks unchanged.
Does this mean the reported AP value from TLT1.0 was not correct?
Are there different methods used to calculate AP between the two versions that could result in this difference in reported AP?
Accuracy in TLT 3.0 training reaches a plateau after only 10-20 epochs. I have a dataset that contains at least 50,000 examples of the single object class I’m training for. Do you have any suggestions for things I could try in order to improve the training accuracy?
The train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
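As a rough sketch of what that offline step can look like (the paths, target size, and helper name below are made up; it assumes KITTI-format labels, where the 5th to 8th whitespace-separated fields are the left/top/right/bottom box coordinates):

from PIL import Image

# Must match output_image_width / output_image_height in the training spec (example values).
TARGET_W, TARGET_H = 1248, 384

def resize_sample(img_in, lbl_in, img_out, lbl_out):
    # Resize the image to the training resolution and remember the scale factors.
    img = Image.open(img_in)
    sx, sy = TARGET_W / img.width, TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(img_out)

    # Scale the KITTI bounding-box fields by the same factors.
    with open(lbl_in) as f, open(lbl_out, "w") as out:
        for line in f:
            p = line.split()
            if len(p) < 8:
                continue
            # KITTI bbox fields (0-indexed): 4=left, 5=top, 6=right, 7=bottom
            p[4] = "{:.2f}".format(float(p[4]) * sx)
            p[5] = "{:.2f}".format(float(p[5]) * sy)
            p[6] = "{:.2f}".format(float(p[6]) * sx)
            p[7] = "{:.2f}".format(float(p[7]) * sy)
            out.write(" ".join(p) + "\n")

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    resize_sample("images/000001.png", "labels/000001.txt",
                  "resized/images/000001.png", "resized/labels/000001.txt")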