Python App Custom Model on the Jetson Nano

Hi,

I’m using TLT in order to train a custom detection model (based on detectNet) and deploy it on the Jetson Nano. So far, I managed to train the model using the notebook (created the etlt file) and converted it to an engine file on the Jetson using tlt-converter.

Looking at the Python examples (feeding the engine files directly into DeepStream), I see that I need to provide nvinfer with several files, such as a .caffemodel and .prototxt, in addition to the engine file. How do I generate these files?

For some cases, I would like to use TRT directly, so I converted the model (the .etlt file) to a .trt file. How can I use this file outside of DeepStream in Python? (The TensorRT Python API documentation doesn’t specify what to do with a .trt file.)

While using tlt-converter I’m getting the “some tactics do not have …” warning. I know it can be solved by using the -w flag, but I’m not sure what a good value for it is. Any advice will help!

In addition, I would like to know whether a generated TRT model depends on the exact GPU model or just on the architecture. For example, will a TRT model generated using a 1070 GPU work on a 1080 GPU? (It will obviously not work on a 2080 GPU.)

Thanks
Yuval

  1. See https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#intg_detectnetv2_model; only the .etlt model, the NGC key, and the label file are needed.

tlt-encoded-model=xxx.etlt
tlt-model-key=yourkey

Note: if you have already generated the TRT engine, the above two lines are not needed. Just set a new line as below.

model-engine-file=xxx.engine

  2. For how to use the TRT engine file outside of DeepStream in Python, please refer to How to use tlt trained model on Jetson Nano - #3 by Morganh (see also the minimal sketch after this list).

  3. For the -w flag on the Nano board, refer to Accelerating Peoplnet with tlt for jetson nano - #13 by Morganh

  4. It depends on the TRT version and the GPU architecture. If the TRT version is the same, a TRT model generated using a 1070 GPU is expected to work on a 1080 GPU.
    More info in https://developer.nvidia.com/cuda-gpus#compute and Support Matrix :: NVIDIA Deep Learning TensorRT Documentation
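
For reference, here is a minimal sketch of loading a serialized engine with the TensorRT Python API plus pycuda, outside of DeepStream. The engine file name, the assumption that the first binding is the input, and the dummy input data are illustrative only; real preprocessing and DetectNet_v2 post-processing are omitted.

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

ENGINE_PATH = "resnet18_detector.engine"  # hypothetical file name

# Deserialize the engine file produced by tlt-converter.
logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate pinned host buffers and device buffers for every binding.
bindings, host_bufs, dev_bufs = [], [], []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Assuming the first binding is the input: fill it with (dummy) preprocessed data.
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)

stream = cuda.Stream()
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
    cuda.memcpy_dtoh_async(host, dev, stream)
stream.synchronize()
# host_bufs[1:] now hold the raw output tensors, which still need DetectNet_v2
# post-processing (coverage thresholding and bbox decoding).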

Thanks for the detailed response!

I’m experimenting with my trained model and trying to improve the runtime performance, which is roughly 0.5 FPS and very jittery.

My setup and configuration are as follows:

  1. DetectNet with ResNet18 backbone.
  2. I pruned the model with pth=0.01; the ratio between the pruned and unpruned model size is 0.05, without compromising accuracy.
  3. I exported the model using the following command (not using INT8, as it is not supported on the Jetson Nano):

!tlt-export detectnet_v2 \
    -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
    -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
    -k $KEY \
    --max_workspace_size 3073741824 \
    --verbose

On the Jetson, I couldn’t resolve the “some tactics do not have …” warning even after increasing the workspace size using -w and decreasing the -m flag. What else can I try? How significant is this? As far as I can tell, adding “-t fp16” is the cause (although I would like to keep the network at fp16).

I modified the python-app example #1 to match my current network but couldn’t test the performance of different batch sizes. So far I have managed to run only a batch of 1. Any advice on how to make it work? (I changed the batch variable both in the config file on the Jetson and in the Python script.)
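
Roughly, these are the places where I believe the batch size has to agree, sketched with placeholder names and values (not my exact app code): the streammux element, the nvinfer element/config, and the maximum batch size the engine was built with (the -m flag of tlt-converter).

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

BATCH_SIZE = 2  # hypothetical; must not exceed the -m value given to tlt-converter

# The muxer forms the batches that nvinfer consumes.
streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
streammux.set_property("batch-size", BATCH_SIZE)
streammux.set_property("width", 1280)
streammux.set_property("height", 720)
streammux.set_property("batched-push-timeout", 4000000)

# nvinfer reads batch-size from its config file; the property below overrides it.
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
pgie.set_property("config-file-path", "detectnet_pgie_config.txt")  # hypothetical path
pgie.set_property("batch-size", BATCH_SIZE)

If any of these values disagree, or the engine was converted with a smaller maximum batch size, the pipeline typically fails at runtime, which may be what I am hitting with batch sizes larger than 1.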

In addition, I’m testing the performance of DetectNet with a ResNet10 backbone and will update with the performance I’m getting.

Thanks
Yuval

Please do not worry about “some tactics do not have …”; it is not a harmful log.
As long as you can get an .etlt model file after you run “tlt-export”, it is OK.

What is “python-app example #1”?

The Python example is found at the following link:

In addition, I’m having trouble deploying DetectNet with a ResNet10 backbone to the Jetson. I’m following the same procedure as with ResNet18, but changed 18 to 10 in all the necessary places (in the code and in the spec files). I managed to get the .etlt file but am getting an error while converting it on the Nano: UFFParser: Unsupported number of graph 0. I’ve read that it is related to the key, but honestly I can’t find anything wrong with it (it is identical on both platforms). How can I debug this?

Thanks
Yuval

Please refer to TLT Converter UffParser: Unsupported number of graph 0 - #4 by Morganh

OK, I did have a small typo in the file naming (the conversion works now).

I’m left with the following issues:

  1. Should I expect a performance improvement from using fp16 on the Nano? Should this option be configured only when using tlt-converter, or in earlier stages as well (the export stage)?
  2. How do I correctly apply batch mode to the Python example in the following link? So far I have managed to use it with batch=1, but it crashes with larger batch sizes.
    deepstream_python_apps/apps/deepstream-test1 at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
  3. Are there any more steps I can take in order to maximize the performance?

Thanks
Yuval

  1. I do not understand your comment about “performance improvement”. The tool tlt-converter is just used to generate the TRT engine.
  2. The TLT user guide only verifies GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream. I am not sure about the status of your mentioned link https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-test1
  3. For performance on the Nano, please make sure:

$ nvpmodel -m 0

$ jetson_clocks

  1. I meant run-time execution; I would expect that a model converted to fp16 will be faster than an fp32 one. As far as I understand, the -t flag in the converter should affect the run-time execution of the model. Is that right?

  2. I’ll try to run the model with the original DS app. Aren’t the Python examples a valid example of using DS?

  3. I believe I already tried setting jetson_clocks, but I’ll double-check.

As I’m following the KITTI tutorial at this stage, is there any benchmark for run-time performance on the Nano? What FPS should I expect?

Thanks
Yuval

  1. The “-t” flag just means “engine datatype”. If you set “-t fp16”, then an fp16 TRT engine is generated. For inference time, please use trtexec to test. Reference: Measurement model speed. (A rough Python timing sketch is also shown after this list.)
  2. The Python examples should be a valid example of using DS. But for a TLT model (an .etlt model or its output TRT engine), I am not sure of its status inside the DS Python examples.
  3. For the KITTI tutorial, there is no published FPS benchmark. But you can find FPS numbers in https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet, https://ngc.nvidia.com/catalog/models/nvidia:tlt_facedetectir, etc. See Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation
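
If you prefer to stay in Python, a rough timing sketch like the one below can serve as a cross-check of the trtexec numbers. It is an illustration only, not the recommended tool, and it assumes the context, bindings, and stream objects from the engine-loading sketch earlier in the thread.

import time

def time_engine(context, bindings, stream, batch_size=1, warmup=10, iters=100):
    # Warm-up iterations so clocks and caches settle before measuring.
    for _ in range(warmup):
        context.execute_async(batch_size=batch_size, bindings=bindings,
                              stream_handle=stream.handle)
    stream.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        context.execute_async(batch_size=batch_size, bindings=bindings,
                              stream_handle=stream.handle)
    stream.synchronize()
    elapsed = time.perf_counter() - start

    print("mean latency: %.2f ms, approx FPS: %.1f"
          % (1000.0 * elapsed / iters, iters * batch_size / elapsed))

On the Nano, remember to run this after nvpmodel -m 0 and jetson_clocks so the numbers are comparable.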