How to use tlt trained model on Jetson Nano

Hi, I trained an object-detection model in TLT following the examples/detectnet_v2 notebook. Training, pruning, inference, and export all work well, and I got the .tlt/.etlt files. How can I use these files on a Jetson Nano without DeepStream? Is there a Python solution like the one in jetson-inference? Thanks.


For preprocessing and postprocessing, refer to Run PeopleNet with tensorrt

Hi Morganh, thanks for the reference. I migrated the .trt engine to the SSD example, and after running inference I get two outputs with the correct dims. But I don’t know how to translate these outputs into bboxes, labels, or anything else. Maybe I missed a document about detectnet_v2, or I should look into the source code of tlt-infer. Any suggestions?

Is there any documentation on the network structure of detectnet_v2? I need to extract bboxes/labels from the TensorRT outputs.

I believe GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream can help you.

2. Detectnet_v2

The model has the following two outputs:

  • output_cov/Sigmoid : a [batchSize, Class_Num, gridcell_h, gridcell_w] tensor containing the coverage values that indicate which grid cells are covered by an object
  • output_bbox/BiasAdd : a [batchSize, Class_Num, 4] tensor containing the normalized image coordinates of the object, (x1, y1) top left and (x2, y2) bottom right, with respect to the grid cell
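
Based on those output shapes, the decoding step can be sketched roughly as below. This is a minimal NumPy sketch, not tlt-infer's actual implementation: the stride of 16 and the bbox normalization of 35.0 are assumptions taken from typical detectnet_v2 training configs (check yours), and I assume the bbox output for one batch item is laid out as (num_classes * 4, grid_h, grid_w).

```python
import numpy as np

def postprocess(cov, bbox, stride=16, box_norm=35.0, threshold=0.4):
    """Decode raw DetectNet_v2 outputs into per-class boxes.

    cov  : (num_classes, grid_h, grid_w) coverage/confidence map
    bbox : (num_classes * 4, grid_h, grid_w) box regression map
    Returns a list of (class_id, confidence, x1, y1, x2, y2) in
    input-image pixel coordinates.
    """
    num_classes, grid_h, grid_w = cov.shape
    # Grid-cell centers in input-image pixels.
    cx = np.arange(grid_w) * stride + 0.5 * stride
    cy = np.arange(grid_h) * stride + 0.5 * stride
    dets = []
    for c in range(num_classes):
        ys, xs = np.where(cov[c] > threshold)       # cells above threshold
        for y, x in zip(ys, xs):
            # Box offsets are regressed relative to the cell center,
            # scaled by the training-time normalization constant.
            o = bbox[c * 4:(c + 1) * 4, y, x] * box_norm
            dets.append((c, float(cov[c, y, x]),
                         cx[x] - o[0], cy[y] - o[1],
                         cx[x] + o[2], cy[y] + o[3]))
    return dets
```

Neighboring cells covering the same object will each emit a box, so you would still cluster the results (tlt-infer typically uses DBSCAN clustering, but plain NMS also works) before drawing them.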


I am a bit confused as to how to use this.

Like the original poster, I followed the steps and was able to create an FP16 TensorRT engine from the .etlt file on a TX2 (thanks to you).

As a next step, instead of using that engine file, I ran the example you shared above directly - object-detection-tensorrt-example/ at master · NVIDIA/object-detection-tensorrt-example · GitHub

With this example, I am getting about 1.63 FPS (I used the imutils package (via pip) to measure FPS), but the example’s output said that inference took only about 22 ms per image.

As 1.63 FPS seems extremely slow, I wanted to check whether this is normal or whether I implemented something incorrectly.

Could it be because the example uses an SSD model rather than DetectNet? I wanted to test on DetectNet, but I couldn’t figure out what needs to be changed. Should I just update the model path in the file so it points to the FP16 engine I created following the Transfer Learning Toolkit documentation?

Also, what’s odd is that when I run the jetson-inference examples (jetson-inference/ at master · dusty-nv/jetson-inference · GitHub), the TX2 is capable of 50-70 FPS. Not sure if this is the right place to direct my question, but I would appreciate any pointers.

Thank you.

For measuring the inference time, you can run trtexec.
Reference: Measurement model speed
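
If you want a quick check from Python as well, a small timing harness can separate inference-only latency from end-to-end throughput, which usually explains a gap like 22 ms per inference vs 1.63 FPS overall: capture, CPU-side preprocessing, host/device copies, and drawing all count against the FPS number. The stage functions below are hypothetical sleep-based stand-ins; replace them with your real pipeline stages.

```python
import time

def avg_seconds(fn, warmup=3, iters=10):
    """Average wall-clock seconds per call of fn(), after a warmup."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical stand-ins -- replace with your actual pipeline stages.
def infer_only():
    time.sleep(0.022)       # simulates the ~22 ms engine execution

def full_frame():
    time.sleep(0.050)       # simulates capture + CPU preprocessing
    infer_only()
    time.sleep(0.030)       # simulates postprocessing + drawing

t_inf = avg_seconds(infer_only)
t_all = avg_seconds(full_frame)
print(f"inference-only: {1 / t_inf:.1f} FPS, end-to-end: {1 / t_all:.1f} FPS")
```

Timing each stage this way tells you whether the bottleneck is the engine itself (then trtexec numbers will confirm it) or everything around it.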