Nvidia TLT

rdev20 · October 15, 2020, 3:17pm

Hi,
I am planning to retrain a model for DeepStream, and I followed the instructions in Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation.
I used the default KITTI training dataset with DetectNet_v2-ResNet18 for pedestrian, car and cyclist.
I integrated it on a Jetson Xavier NX, but the result is not good at all. The speed is very low, about 1 frame per second, and the bounding boxes are completely off. I attached a picture of it.
Can you please let me know what I am missing and what I should do to make it better?
Another question is about the resnet10 model in primary_detector: is there any way to improve its accuracy and add some sample data to one of the classes (for example, adding industrial cars to the car dataset)?

I would appreciate your help with this issue. I am new to this topic and do not know where to start.
Thanks

Morganh · October 16, 2020, 2:04am

Please share more details about how you trained your model.
Did you use the default Jupyter detectnet_v2 notebook to train?

rdev20 · October 16, 2020, 2:08am

Thanks,
Yes, I used the default Jupyter notebook,
the detectnet_v2 example.

Morganh · October 16, 2020, 2:11am

If possible, can you save the notebook as an HTML file and attach it here?
Also, what is the mAP in the last epoch?

rdev20 · October 16, 2020, 6:32pm

Hi Morganh,

I uploaded it here: Gofile - Free file sharing and storage platform
Please let me know if you need more details.
Thanks

Morganh · October 19, 2020, 8:49am

From your detailed HTML file, I can see the following:

Your pruned model reaches 77% mAP in the end. Its inference result (see section 8) is good.
In section 9, when you run tlt-export on a GeForce GTX 1650, it hits an OOM error.
Also, in section 9-B, the OOM error occurs when generating the TRT engine.
In section 10, tlt-infer gives the same wrong inference result (the bboxes are not correct).

So please make sure you can run sections 9 and 10 successfully.

Suggestion:

When you run tlt-infer, try examples/specs/detectnet_v2_inference_kitti_tlt.txt to confirm that your tlt model gives a good inference result (section 10).
For the etlt model, please use the command below when you run tlt-export (section 9). It does not seem to hit the OOM error. I suggest you copy the resulting etlt model (the -o output path in the command below) to the NX for deployment.

!tlt-export detectnet_v2 \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
--cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
--data_type int8 \
--batches 10 \
--batch_size 4 \
--max_batch_size 4 \
--engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
--cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
--verbose

rdev20 · October 20, 2020, 2:56pm

Thank you very much, Morganh.
I will test your suggestions and get back to you soon.
Could you please let me know how I can add some pictures (a dataset of 10000 pictures of industrial cars) to this model? Is the resnet10 in the primary detector folder more accurate than this resnet10? How can I add more data to that one?
Also, I would appreciate it if you could let me know how I can use this model with DeepStream Python.

Many thanks

rdev20 · October 20, 2020, 3:48pm

Morganh,

I have a question about part 3, Run TLT training.
As you can see in my records (detectnet_v2.html), I think something is not correct:
0 successful operations.
0 derived errors ignored.
I am a little confused about what I should do. Can you please let me know the correct solution for my issue?

Morganh · October 21, 2020, 2:22am

If you have your own dataset, please resize the images/labels to the resolution you want to train at, and then generate tfrecords via tlt-dataset-convert. (Reference: section 1-B, Prepare tf records from kitti format dataset)
What do you mean by “resnet10 in primary detector folder” and “this resnet10”?
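To illustrate the "resize the images/labels" step above, here is a minimal Python sketch that rescales the bounding box of one KITTI-format label line when the matching image is resized. It assumes the standard KITTI label layout, where fields 5 to 8 are the box's left, top, right and bottom pixel coordinates. The function name and the example resolutions are hypothetical, and the images themselves would still need to be resized separately with an image library.

```python
def scale_kitti_label_line(line, src_size, dst_size):
    """Rescale the 2D bbox of one KITTI label line.

    src_size and dst_size are (width, height) tuples in pixels.
    Only the bbox fields (indices 4..7) are changed; all other
    fields are passed through untouched.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx = dst_w / src_w  # horizontal scale factor
    sy = dst_h / src_h  # vertical scale factor

    fields = line.split()
    left, top, right, bottom = (float(v) for v in fields[4:8])
    fields[4:8] = [
        f"{left * sx:.2f}",
        f"{top * sy:.2f}",
        f"{right * sx:.2f}",
        f"{bottom * sy:.2f}",
    ]
    return " ".join(fields)


# Hypothetical example: an image shrunk from 1000x500 to 500x250.
label = "Car 0.00 0 0.00 100.00 50.00 300.00 150.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(scale_kitti_label_line(label, (1000, 500), (500, 250)))
```

Once both images and labels are at the training resolution, they can be passed to tlt-dataset-convert as described in section 1-B of the notebook.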

Morganh · October 21, 2020, 2:40am

For your part 3 tlt-train, I think you could run the training successfully before, but it hits an error now.
It is the same error as in Tlt detectnet training focusing on a particular class? - #16 by beefshepherd. Please create a new result folder.
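One way to follow the "new result folder" advice without deleting earlier runs is to pick a folder name that does not exist yet. The sketch below is only an illustration: the helper name and the folder name experiment_dir_unpruned are assumptions, and the training command would need to be pointed at whatever path the helper returns.

```python
from pathlib import Path


def new_results_dir(parent, name):
    """Create and return a results folder that did not exist before.

    Tries parent/name first, then parent/name_1, parent/name_2, ...
    so that an earlier run's folder is never reused.
    """
    parent = Path(parent)
    candidate = parent / name
    n = 1
    while candidate.exists():
        candidate = parent / f"{name}_{n}"
        n += 1
    candidate.mkdir(parents=True)
    return candidate
```

For example, calling new_results_dir(experiment_root, "experiment_dir_unpruned") twice returns two different folders, the second one with a _1 suffix.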

rdev20 · October 22, 2020, 5:31pm

Hi Morgan,
I tried it again and the OOM error seems to be solved.
However, the output in section 10 is still not correct. I uploaded it here: Gofile - Free file sharing and storage platform
I have not tested it on the Jetson yet, but I think this problem needs to be solved first.
I would appreciate your help.

Morganh · October 23, 2020, 6:32am

According to the latest result (see section 9-A), you have already generated the etlt model successfully.
You can copy it to the Jetson device and run inference. This is the 1st option for deployment.

But in section 9-B, tlt-converter still hits OOM on your host PC, so you still get the wrong result in section 10.
If you want to run inference on the Jetson device, you can ignore this OOM on your host PC. You need to download tlt-converter (Jetson version) and run it on your Jetson device.
Then deploy the generated TRT engine. This is the 2nd option for deployment.

rdev20 · October 23, 2020, 3:33pm

Thank you very much for your help Morgan.

I tested it on the Jetson and it works; however, it is not as good as the default resnet10 on the Jetson. Maybe I need to change the pth for more accuracy or use a larger dataset for better training. Is that correct?

About my other question, I am not sure I understood you correctly. I want to add more pictures to the vehicle images for the resnet10 in the …/samples/models/Primary_Detector folder (it is currently not trained for industrial cars and I want to add this feature).
Do I need to use the KITTI dataset with 7500 images and add my dataset to it, and then follow the same process?
Wouldn’t that make the accuracy worse because of the small amount of data (only 7500 for training and 7500 for test)? Is there any way to get the same accuracy as the default resnet10 with my customized dataset?

rdev20 · October 23, 2020, 6:55pm

Morgan,

I tried to create an engine file using tlt-converter as you described here:

but when I run the Python file (I edited the config file to use the new engine file), I receive this error:
ERROR: failed to build network since there is no model file matched.
Please kindly advise me on this matter.
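An error like this may indicate that a path in the inference config does not point to an existing file. As a rough check, the Python sketch below reads a config file and reports whether the files named in its [property] section exist. It is only a sketch under assumptions: the key names listed are the ones commonly used in DeepStream nvinfer configs for TLT models, the actual config used in this thread is not shown, and relative paths are assumed to be resolved against the config file's own folder.

```python
import configparser
from pathlib import Path

# Keys assumed to hold file paths in the [property] section.
MODEL_KEYS = (
    "model-engine-file",
    "tlt-encoded-model",
    "int8-calib-file",
    "labelfile-path",
)


def check_model_paths(config_path):
    """Return {key: True/False} for each path key found in the config.

    True means the referenced file exists on disk.
    """
    config_path = Path(config_path)
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read(config_path)
    if not parser.has_section("property"):
        return {}

    result = {}
    for key in MODEL_KEYS:
        if parser.has_option("property", key):
            path = Path(parser.get("property", key))
            if not path.is_absolute():
                # Assumption: relative paths are relative to the config file.
                path = config_path.parent / path
            result[key] = path.exists()
    return result
```

Any key reported as False is worth checking first, for example an engine file name that does not match the file produced by tlt-converter.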

Morganh · October 26, 2020, 6:33am

Hi @rdev20
For how to use tlt-converter to create a TRT engine, please refer to Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation,
then deploy the TRT engine according to Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation.
I do not know which “python file” you mean.

Also, the resnet10 inside …/samples/models/Primary_Detector is not related to TLT.
See more in:
Transfer Learning toolkit models vs Deepstream models on the Nano
Dataset used for training sample models
Training of resnet-10 using DIGITS for object detection - #3 by imbatraman
