Cannot run model exported from TLT on Jetson's DLA

Description

I am using TLT 2.0 (docker image nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3) to do transfer learning with detectnet_v2 resnet18. More precisely, I am following this tutorial: https://github.com/NVIDIA-AI-IOT/face-mask-detection. After training, I export the model with FP16 precision as an .etlt file.

Now I want to run the model with TensorRT on a Jetson AGX Xavier, on the DLA. For that I am using tlt-converter to generate the .engine/.trt file. Because I have TensorRT 6.0, I am using this converter: https://developer.nvidia.com/tlt-converter-trt60. After that, I use trtexec to try to run inference on the DLA. Sadly, the model appears to run only on the GPU.

Environment

TensorRT Version: 6.0
GPU Type: Xavier AGX
Operating System + Version: JetPack 4.3

Steps To Reproduce

  • Exported the trained model with:
tlt-export detectnet_v2 \
            -o resnet18_detector.etlt \
            -m resnet18_detector.tlt \
            -k key \
            --data_type fp16
  • Then on the Jetson, converted the .etlt model to a TensorRT engine with:
tlt-converter -k key \
              -d "3,544,960" \
              -o "output_cov/Sigmoid,output_bbox/BiasAdd" \
              -e resnet18_detector.trt \
              -m 16 \
              -t fp16 \
              resnet18_detector.etlt

But the build log showed that all layers run on the GPU:

[INFO] 
[INFO] --------------- Layers running on DLA: 
[INFO] 
[INFO] --------------- Layers running on GPU: 
[INFO] conv1/convolution + activation_1/Relu, block_1a_conv_1/convolution + block_1a_relu_1/Relu, block_1a_conv_shortcut/convolution, block_1a_conv_2/convolution + add_1/add + block_1a_relu/Relu, block_1b_conv_1/convolution + block_1b_relu_1/Relu, block_1b_conv_2/convolution + add_2/add + block_1b_relu/Relu, block_2a_conv_1/convolution + block_2a_relu_1/Relu, block_2a_conv_shortcut/convolution, block_2a_conv_2/convolution + add_3/add + block_2a_relu/Relu, block_2b_conv_1/convolution + block_2b_relu_1/Relu, block_2b_conv_2/convolution + add_4/add + block_2b_relu/Relu, block_3a_conv_1/convolution + block_3a_relu_1/Relu, block_3a_conv_shortcut/convolution, block_3a_conv_2/convolution + add_5/add + block_3a_relu/Relu, block_3b_conv_1/convolution + block_3b_relu_1/Relu, block_3b_conv_2/convolution + add_6/add + block_3b_relu/Relu, block_4a_conv_1/convolution + block_4a_relu_1/Relu, block_4a_conv_shortcut/convolution, block_4a_conv_2/convolution + add_7/add + block_4a_relu/Relu, block_4b_conv_1/convolution + block_4b_relu_1/Relu, block_4b_conv_2/convolution + add_8/add + block_4b_relu/Relu, output_bbox/convolution, output_cov/convolution, output_cov/Sigmoid, 
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
  • Finally I tried to run it on the DLA:
trtexec --loadEngine=resnet18_detector.trt --batch=1 --useDLACore=0 --fp16 --verbose

But it appears to be using the GPU (checked with jtop GPU utilization). Also, running without --useDLACore gives exactly the same inference time.

The tutorial mentioned above showed that it is possible to run this model on the DLA. Where am I going wrong, and how can I make it run on the DLA?

Moving this topic into the TLT forum.

I am afraid your tlt-converter does not support DLA. You can run “tlt-converter -h” to check. Note also that DLA placement is decided when the engine is built and baked into the serialized engine, so passing --useDLACore to trtexec when merely loading a GPU-built engine cannot move layers onto the DLA.
Please download the DLA version, https://developer.nvidia.com/assets/TLT/Secure/tlt-converter-7.1-dla.zip, and retry.
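A sketch of what the conversion could look like with that DLA-enabled build (assumption: the 7.1-dla converter exposes a `-u` flag for the DLA core index — verify against its own `tlt-converter -h` output; file names follow the earlier commands):

```shell
# Hypothetical invocation of the DLA-enabled tlt-converter build; -u selects
# the DLA core at engine-build time. Guarded so the sketch degrades
# gracefully on a machine where the converter is not installed.
if command -v tlt-converter >/dev/null 2>&1; then
  tlt-converter -k key \
    -d "3,544,960" \
    -o "output_cov/Sigmoid,output_bbox/BiasAdd" \
    -u 0 \
    -t fp16 \
    -m 16 \
    -e resnet18_detector_dla.trt \
    resnet18_detector.etlt
else
  echo "tlt-converter not found on PATH; run this step on the Jetson"
  CONVERTER_MISSING=1
fi
```

The build log should then list layers under "Layers running on DLA"; anything left on the GPU falls back only if GPU fallback is permitted.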

Or you can deploy your .etlt model with DeepStream directly and make sure DLA is enabled in the config file.

If you want the DLA to run inference, you need to set the following in the DeepStream config file:
enable-dla=1
use-dla-core=1
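For context, a sketch of how those keys might sit in the Gst-nvinfer config section (the model file, key, and dimensions are placeholders taken from the commands earlier in this thread; check the nvinfer config reference for your DeepStream version):

```
[property]
tlt-encoded-model=resnet18_detector.etlt
tlt-model-key=key
network-mode=2          # 0=FP32, 1=INT8, 2=FP16
enable-dla=1
use-dla-core=1
```

On first run, DeepStream builds and serializes the engine from the .etlt file with these settings, so the DLA placement is baked into the generated engine.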


I struggle to find links to the different versions of that tool. Is there a page listing the available tlt-converter versions? Moreover, is there a DLA-enabled tlt-converter for TensorRT 6? Since my machine has TensorRT 6, my guess is that the version you provided is not compatible.

Instead, I suggest you deploy your .etlt model with DeepStream directly and make sure DLA is enabled in the config file.

DeepStream is not necessary; I already have deployment code that uses TensorRT directly, integrated with the rest of the software stack. I just need to generate a valid TensorRT engine with DLA support. I would appreciate it a lot if you could answer my previous questions about tlt-converter links and versions :D

Hey, after running with DeepStream and DLA enabled, the TRT engine will be generated. That is what you expect.
As for a tlt-converter DLA version for TRT 6, I will check further, but I am afraid it is not available.