Nvidia TLT

Hi,
I am planning to retrain a model for DeepStream. I followed these instructions: Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation
I used the default KITTI training dataset and DetectNet_v2 with a ResNet18 backbone for pedestrian, car and cyclist.
I deployed it on a Jetson Xavier NX, but the results are not good at all: the speed is very low, about 1 frame per second, and the bounding boxes are completely off. I attached a picture of it.
Can you please let me know what I am missing and what I should do to improve it?
Another question is about the resnet10 model in primary_detector: is there any way to improve its accuracy and add sample data to one of its classes (for example, adding industrial cars to the car class)?

I would appreciate your help with this issue. I am new to this topic and have no idea where to start.
Thanks

Please share more details about how you trained your model.
Did you use the default Jupyter detectnet_v2 notebook to train?

Thanks,
Yes, I used the default Jupyter notebook, the DetectNet_v2 example.

If possible, can you save the notebook as an HTML file and attach it here?
Also, what is the mAP in the last epoch?

Hi Morganh,

I uploaded it here: Gofile - Free file sharing and storage platform
Please let me know if you need more details.
Thanks

From your detailed HTML file, I found the following:

  1. Your pruned model reaches 77% mAP at the end. Its inference result (section 8) is good.
  2. In section 9, when you run tlt-export on a GeForce GTX 1650, it hits an OOM error.
    Also, in section 9-B, the OOM error occurs when generating the TRT engine.
  3. In section 10, you get the same wrong inference result (the bboxes are not correct) when you run tlt-infer.

Thus, please make sure sections 9 and 10 run successfully.

Suggestion:

  1. When you run tlt-infer, try examples/specs/detectnet_v2_inference_kitti_tlt.txt to confirm that your .tlt model gets a good inference result (section 10).
  2. For the etlt model, please use the tlt-export command below when you run section 9. It does not seem to hit the OOM error. I suggest you copy the resulting etlt model (the -o output, resnet18_detector.etlt) into the NX to deploy.

!tlt-export detectnet_v2 \
    -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
    -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
    -k $KEY \
    --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
    --data_type int8 \
    --batches 10 \
    --batch_size 4 \
    --max_batch_size 4 \
    --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
    --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
    --verbose
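The same export step, expressed as a Python argument list, may help when it is launched from a script rather than a notebook cell. This is a minimal sketch that assumes the notebook's directory layout (`experiment_dir_retrain`, `experiment_dir_final`); the helper names `build_export_cmd` and `calibration_image_count` and the `/workspace/tlt-experiments` path are hypothetical. It also makes explicit that with `--batches 10` and `--batch_size 4`, INT8 calibration draws on 10 × 4 = 40 images.

```python
import os
import shlex


def build_export_cmd(exp_dir, key, batches=10, batch_size=4, max_batch_size=4):
    """Return the tlt-export invocation above as an argument list."""
    final = os.path.join(exp_dir, "experiment_dir_final")
    model = os.path.join(exp_dir, "experiment_dir_retrain", "weights",
                         "resnet18_detector_pruned.tlt")
    return [
        "tlt-export", "detectnet_v2",
        "-m", model,
        "-o", os.path.join(final, "resnet18_detector.etlt"),
        "-k", key,
        "--cal_data_file", os.path.join(final, "calibration.tensor"),
        "--data_type", "int8",
        "--batches", str(batches),
        "--batch_size", str(batch_size),
        "--max_batch_size", str(max_batch_size),
        "--engine_file", os.path.join(final, "resnet18_detector.trt.int8"),
        "--cal_cache_file", os.path.join(final, "calibration.bin"),
        "--verbose",
    ]


def calibration_image_count(batches, batch_size):
    """INT8 calibration reads `batches` batches of `batch_size` images each."""
    return batches * batch_size


if __name__ == "__main__":
    # Hypothetical experiment directory and key; replace with your own.
    cmd = build_export_cmd("/workspace/tlt-experiments", "YOUR_KEY")
    print(shlex.join(cmd))
    print(calibration_image_count(10, 4))  # 40
    # Inside the TLT container, subprocess.run(cmd, check=True) would run it.
```

Passing a list to `subprocess.run` (rather than a shell string) sidesteps quoting issues, and writing the flags in code avoids the en dashes (–) that forum formatting tends to substitute for `--`.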

Thank you very much, Morganh.
I will test your suggestions and get back to you soon.
Could you please let me know how I can add some pictures (a dataset of 10,000 pictures of industrial cars) to this model? Is the resnet10 in primary detector folder more accurate than this resnet10? How can I add more data to that one?
Also, I would appreciate it if you could let me know how to use this model with DeepStream Python.

Many thanks

Morganh,

I have a question about part 3, Run TLT training.
As you can see in my records (detectnet_v2.html), I think something is not correct:
0 successful operations.
0 derived errors ignored.
I am a little confused about what I should do. Can you please let me know how to fix this?

If you have your own dataset, please resize the images/labels to the resolution you want to train at, and then generate tfrecords via tlt-dataset-convert. (Reference: section 1-B, Prepare tf records from kitti format dataset)
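As a sketch of the label half of that resizing step: in a KITTI label line only the 2D box (the four values after `alpha`: x1, y1, x2, y2) depends on image size, so when an image is resized from (w0, h0) to (w1, h1) those values scale by w1/w0 and h1/h0. This assumes the images themselves are resized separately (for example with Pillow) to the same target size; `scale_kitti_line` and `scale_kitti_file` are hypothetical helper names, and the remaining fields are left untouched.

```python
def scale_kitti_line(line, sx, sy):
    """Scale the 2D box of one KITTI label line by (sx, sy).

    KITTI fields: type, truncated, occluded, alpha, x1, y1, x2, y2, then the
    3D fields. Only x1, y1, x2, y2 depend on image size, so only they change.
    """
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"not a KITTI label line: {line!r}")
    x1, y1, x2, y2 = (float(v) for v in fields[4:8])
    fields[4:8] = [f"{x1 * sx:.2f}", f"{y1 * sy:.2f}",
                   f"{x2 * sx:.2f}", f"{y2 * sy:.2f}"]
    return " ".join(fields)


def scale_kitti_file(text, src_size, dst_size):
    """Scale every non-empty line of a label file from src_size to dst_size (w, h)."""
    (w0, h0), (w1, h1) = src_size, dst_size
    sx, sy = w1 / w0, h1 / h0
    return "\n".join(scale_kitti_line(line, sx, sy)
                     for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    # A 2496x768 image downscaled to 1248x384 halves every box coordinate.
    label = "car 0.00 0 0.00 100.00 50.00 200.00 150.00 0 0 0 0 0 0 0"
    print(scale_kitti_file(label, (2496, 768), (1248, 384)))
    # car 0.00 0 0.00 50.00 25.00 100.00 75.00 0 0 0 0 0 0 0
```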
What do you mean “resnet10 in primary detector folder” and “this resnet10”?

For your part 3 tlt-train, I think you could run the training successfully before, but you hit an error now.
It is the same error as in Tlt detectnet training focusing on a particular class? - #16 by beefshepherd. Please create a new result folder.

Hi Morgan,
I tried it again and the OOM error seems to be solved.
However, the output in section 10 is still not correct. I uploaded it here: Gofile - Free file sharing and storage platform
I have not tested it on Jetson yet, but I think the problem needs to be solved first.
I would appreciate your help.

According to the latest result (section 9-A), you have already generated the etlt model successfully.
You can copy it into the Jetson device and run inference. This is the first option for deployment.

But in section 9-B, it still hits OOM when you run tlt-converter on your host PC. That is why you still get the wrong result in section 10.
If you want to run inference on the Jetson device, you can ignore this OOM on your host PC. Download the Jetson version of tlt-converter and run it on your Jetson device.
Then deploy the generated TRT engine. This is the second option for deployment.
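For the second option, the Jetson-side conversion can be sketched as an argument list. The flags used here (`-k` key, `-d` input dims, `-o` output nodes, `-e` engine path, `-m` max batch size, `-t` data type, `-c` calibration cache) follow the TLT tlt-converter documentation, and `output_cov/Sigmoid,output_bbox/BiasAdd` are the DetectNet_v2 output node names; please confirm both with `tlt-converter -h` for your version. The default dims 3,384,1248 assume the notebook's 1248x384 training resolution, and `build_converter_cmd` and the `./tlt-converter` location are hypothetical.

```python
def build_converter_cmd(etlt_path, key, engine_path, cal_cache=None,
                        dims=(3, 384, 1248), max_batch=4, dtype="int8",
                        converter="./tlt-converter"):
    """Return a tlt-converter argument list for a DetectNet_v2 .etlt model."""
    cmd = [
        converter,
        "-k", key,
        "-d", ",".join(str(d) for d in dims),  # C,H,W of the training input
        "-o", "output_cov/Sigmoid,output_bbox/BiasAdd",
        "-e", engine_path,
        "-m", str(max_batch),
        "-t", dtype,
    ]
    if dtype == "int8":
        if cal_cache is None:
            raise ValueError("an INT8 engine needs the calibration cache (-c)")
        cmd += ["-c", cal_cache]
    cmd.append(etlt_path)  # the input model goes last
    return cmd


if __name__ == "__main__":
    print(" ".join(build_converter_cmd(
        "resnet18_detector.etlt", "YOUR_KEY",
        "resnet18_detector.trt.int8", cal_cache="calibration.bin")))
```

The generated engine file is what the DeepStream config's `model-engine-file` should point to. It has to be built on the Jetson itself, because a TensorRT engine is tied to the GPU and TensorRT version it was built with.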

Thank you very much for your help Morgan.

I tested it on Jetson and it works; however, it is not as good as the default resnet10 on Jetson. Maybe I need to change the pth (pruning threshold) for more accuracy, or use a larger dataset for better training. Is that correct?

About my other question, I am not sure I understood you correctly. I want to add more vehicle images to the resnet10 in …/samples/models/Primary_Detector folder (it is currently not trained for industrial cars, and I want to add this).
Do I need to take the KITTI dataset with 7500 images, add my dataset to it, and then follow the same process?
Won't that hurt accuracy because the dataset is small (only 7500 for training and 7500 for test)? Is there any way to get the same accuracy as the default resnet10 with my customized dataset?
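One way to combine the two sets, sketched under assumptions: both are in KITTI layout (`image_2/` and `label_2/` with matching file stems), new files get a prefix so they cannot collide with KITTI's numeric names, and the class name in your own labels (here a hypothetical `industrial_car`) is either remapped to `car` or kept as its own class. The DetectNet_v2 training spec's `target_class_mapping` can also merge class names at training time, so rewriting the label files is optional. `remap_kitti_classes` and `add_dataset` are hypothetical helpers; afterwards the tfrecords would be regenerated with tlt-dataset-convert as before.

```python
import shutil
import tempfile
from pathlib import Path


def remap_kitti_classes(text, mapping):
    """Rename the class (first field) of each KITTI label line via `mapping`."""
    out = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # drop blank lines
        parts[0] = mapping.get(parts[0], parts[0])
        out.append(" ".join(parts))
    return "\n".join(out) + "\n" if out else ""


def add_dataset(src_root, dst_root, prefix, mapping, image_ext=".png"):
    """Copy src_root/{image_2,label_2} into dst_root, prefixing file names."""
    src, dst = Path(src_root), Path(dst_root)
    (dst / "image_2").mkdir(parents=True, exist_ok=True)
    (dst / "label_2").mkdir(parents=True, exist_ok=True)
    copied = 0
    for label in sorted((src / "label_2").glob("*.txt")):
        image = src / "image_2" / (label.stem + image_ext)
        if not image.exists():
            continue  # skip labels without a matching image
        stem = f"{prefix}_{label.stem}"
        shutil.copy2(image, dst / "image_2" / (stem + image_ext))
        remapped = remap_kitti_classes(label.read_text(), mapping)
        (dst / "label_2" / (stem + ".txt")).write_text(remapped)
        copied += 1
    return copied


if __name__ == "__main__":
    # Throwaway demo tree standing in for a real industrial-car dataset.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "industrial")
        (src / "image_2").mkdir(parents=True)
        (src / "label_2").mkdir(parents=True)
        (src / "image_2" / "0001.png").write_bytes(b"")  # placeholder image
        (src / "label_2" / "0001.txt").write_text(
            "industrial_car 0.00 0 0.00 10 20 110 220 0 0 0 0 0 0 0\n")
        merged = Path(tmp, "merged", "training")
        print(add_dataset(src, merged, "ind", {"industrial_car": "car"}))  # 1
        print((merged / "label_2" / "ind_0001.txt").read_text(), end="")
```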

Morgan,

I tried to create an engine file using tlt-converter as you described here:

but when I run the Python file (I edited the config file to point to the new engine file), I receive this error:
ERROR: failed to build network since there is no model file matched.
Please kindly advise me on this matter.

Hi @rdev20
For how to use tlt-converter to create a TRT engine, please refer to Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation,
then deploy the trt engine according to Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation
I do not know what “python file” you mean.
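A quick sanity check for the “no model file matched” error, as a hedged sketch: nvinfer typically tries to load `model-engine-file` first; if that fails (wrong path, or an engine built on another GPU or TensorRT version), it tries to build an engine from a configured model source such as `tlt-encoded-model`, and reports this error when none is set. The function below only catches a missing engine file and missing keys; it cannot detect an incompatible engine. The key names are nvinfer config properties, relative paths are assumed to resolve against the config file's directory, the source-key list is not exhaustive, and `check_nvinfer_config` is a hypothetical helper.

```python
import configparser
import sys
from pathlib import Path

# Not exhaustive: nvinfer also accepts other model sources.
MODEL_SOURCE_KEYS = ("tlt-encoded-model", "onnx-file", "uff-file", "model-file")


def check_nvinfer_config(text, config_dir):
    """Return a list of problems found in the text of an nvinfer config file."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    if not cfg.has_section("property"):
        return ["no [property] section"]
    prop = cfg["property"]
    problems = []
    engine = prop.get("model-engine-file")
    # Relative paths are resolved against the config file's directory.
    engine_ok = engine is not None and (Path(config_dir) / engine).is_file()
    if engine is not None and not engine_ok:
        problems.append(f"model-engine-file not found: {Path(config_dir) / engine}")
    if not engine_ok and not any(k in prop for k in MODEL_SOURCE_KEYS):
        problems.append("no usable engine and no model file to build one from")
    if "tlt-encoded-model" in prop and "tlt-model-key" not in prop:
        problems.append("tlt-encoded-model is set but tlt-model-key is missing")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_config.py path/to/your_nvinfer_config.txt
    path = Path(sys.argv[1])
    for problem in check_nvinfer_config(path.read_text(), path.parent):
        print(problem)
```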

Also, the resnet10 inside …/samples/models/Primary_Detector is not related to TLT.
See more in Transfer Learning toolkit models vs Deepstream models on the Nano
Dataset used for training sample models
Training of resnet-10 using DIGITS for object detection - #3 by imbatraman