I’m using TLT to train a custom detection model (based on DetectNet) and deploy it on a Jetson Nano. So far I have managed to train the model using the notebook (which produced the .etlt file) and converted it to an engine file on the Jetson with tlt-converter.
Looking at the Python examples (feeding the engine files directly into DeepStream), I see that I need to provide nvinfer with several files, such as a .caffemodel and a .prototxt, in addition to the engine file. How do I generate these files?
In some cases I would like to use TRT directly, so I converted the model (the .etlt file) to a .trt file. How can I use this file outside of DeepStream in Python? (The TRT Python API documentation doesn’t specify what to do with a .trt file.)
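For reference, this is roughly what I assume I need to do with the .trt file, treating it as a serialized TensorRT engine. The file name, the dummy input data, and the use of PyCUDA for buffer management are my own assumptions, so corrections are welcome:

```python
# Minimal sketch: load a serialized TensorRT engine ("model.trt" is a placeholder)
# and run one inference with dummy input data.
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("model.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
stream = cuda.Stream()

# Allocate page-locked host buffers and device buffers for every binding
bindings, buffers = [], []
for name in engine:
    size = trt.volume(engine.get_binding_shape(name)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(name))
    host_mem = cuda.pagelocked_empty(size, dtype)
    dev_mem = cuda.mem_alloc(host_mem.nbytes)
    bindings.append(int(dev_mem))
    buffers.append((host_mem, dev_mem, engine.binding_is_input(name)))

# Copy (dummy) input data to the device, execute, and copy the outputs back
for host_mem, dev_mem, is_input in buffers:
    if is_input:
        host_mem[:] = np.random.rand(host_mem.size).astype(host_mem.dtype)
        cuda.memcpy_htod_async(dev_mem, host_mem, stream)

context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)

for host_mem, dev_mem, is_input in buffers:
    if not is_input:
        cuda.memcpy_dtoh_async(host_mem, dev_mem, stream)
stream.synchronize()
# The host buffers of the output bindings now hold the raw network outputs.
```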
While using tlt-converter I’m getting the “some tactics do not have …” warning. I know it can be addressed with the -w flag, but I’m not sure what a good value for it is. Any advice would help!
In addition, I would like to know whether a generated TRT engine depends on the exact GPU model or only on the GPU architecture. For example, will a TRT engine generated on a 1070 GPU work on a 1080 GPU? (It will obviously not work on a 2080 GPU.)
On the Jetson, I couldn’t resolve the “some tactics do not have …” warning even after increasing the workspace size with -w and decreasing the -m flag. What else can I try, and how significant is this? As far as I can tell, adding “-t fp16” is the cause (although I would like to keep the network at fp16).
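For reference, this is the kind of command I’m running. The key, output node names, input dimensions, and workspace size below are placeholders taken from the DetectNet_v2 KITTI example, not necessarily the right values:

```
# 1073741824 bytes = 1 GB workspace; a larger -w is supposed to silence the tactic warning
tlt-converter resnet10_detector.etlt \
  -k $KEY \
  -o output_cov/Sigmoid,output_bbox/BiasAdd \
  -d 3,384,1248 \
  -m 4 \
  -t fp16 \
  -w 1073741824 \
  -e resnet10_detector.fp16.engine
```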
I modified Python app example #1 to match my current network, but I couldn’t test the performance of different batch sizes; so far I have only managed to run a batch size of 1. Any advice on how to make it work? (I changed the batch variable in the config file on the Jetson as well as in the Python script.)
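For completeness, these are the places where I understand the batch size has to agree; a minimal sketch, with the value 4 as a placeholder and dstest1_pgie_config.txt standing in for my actual config file:

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

BATCH_SIZE = 4  # placeholder; must not exceed the -m value used with tlt-converter

# In deepstream_test_1.py the muxer batch size has to match the nvinfer config
streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
streammux.set_property("batch-size", BATCH_SIZE)
streammux.set_property("batched-push-timeout", 40000)

pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
pgie.set_property("config-file-path", "dstest1_pgie_config.txt")
# dstest1_pgie_config.txt should then contain, under [property]:
#   batch-size=4
#   model-engine-file=<engine built with tlt-converter -m 4 (or larger)>
```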
In addition, I’m testing the performance of DetectNet with a ResNet-10 backbone and will update with the performance I’m getting.
Please don’t worry about “some tactics do not have …”; it is not a harmful log.
As long as you can get an .etlt model file after running “tlt-export”, it is OK.
The Python example can be found at the following link:
In addition, I’m having trouble deploying DetectNet with a ResNet-10 backbone on the Jetson. I’m following the same procedure as with ResNet-18, but changed 18 to 10 in all the necessary places (in the code and in the spec files). I managed to get the .etlt file, but I get an error while converting it on the Nano: “UFFParser: Unsupported number of graph 0”. I’ve read that this is related to the key, but honestly I can’t find anything wrong with it (it is identical on both platforms). How can I debug this?
Ok, I did have a small typo in the file naming (now the conversion works).
I’m left with the following issues:
Should I expect a performance improvement from using fp16 on the Nano? Should this option be configured only when using tlt-converter, or at earlier stages as well (the export stage)?
I meant run-time execution; I would expect a model converted to fp16 to be faster than an fp32 one. As far as I understand, the -t flag in the converter should affect the run-time execution of the model. Is that right?
I’ll try to run the model with the original DS app. Aren’t the Python examples a valid example of using DS?
I believe I already tried running jetson_clocks, but I’ll double-check.
Since I’m following the KITTI tutorial at this stage, is there any benchmark for run-time performance on the Nano? How many FPS should I expect?
The “-t” flag just means “engine data type”. If you set “-t fp16”, then an fp16 TRT engine is generated. For inference time, please use trtexec to test. Reference: Measurement model speed
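A typical invocation would be along these lines (the engine path and batch size below are just placeholders; on the Nano, trtexec is usually found under /usr/src/tensorrt/bin):

```
# Benchmark an existing engine; reports average latency and throughput
/usr/src/tensorrt/bin/trtexec --loadEngine=resnet10_detector.fp16.engine --batch=4
```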
The Python examples should be valid examples of using DS. But for a TLT model (an .etlt model or the TRT engine generated from it), I’m not sure about its status inside the DS Python examples.