DLA trtexec questions

Hi,

I read about how to use the DLA on this page:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#dla_topic

I am running a trtexec command line similar to what is written on that page, but with my own ONNX file,
and I get the output below. Can anyone explain what it means, and what to expect when running this network on the DLA?

Thanks,
Gabi

./bin/trtexec --onnx=/mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx --output=prob --useDLACore=1 --fp16 --allowGPUFallback
&&&& RUNNING TensorRT.trtexec # ./bin/trtexec --onnx=/mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx --output=prob --useDLACore=1 --fp16 --allowGPUFallback
[I] onnx: /mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx
[I] output: prob
[I] useDLACore: 1
[I] fp16
[I] allowGPUFallback

Input filename: /mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.1
Domain:
Model version: 0
Doc string:

WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 174) [PluginV2] is not running on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 177) [PluginV2] is not running on DLA, falling back to GPU.
[W] [TRT] DLA LAYER: CBUF size requirement for layer (Unnamed Layer* 179) [Convolution] is 9banks, which exceeds the limit (8).
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 179) [Convolution] is not running on DLA, falling back to GPU.
[W] [TRT] DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer (Unnamed Layer* 180) [Activation]
[W] [TRT] DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer (Unnamed Layer* 276) [Activation]
[W] [TRT] DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer (Unnamed Layer* 278) [Activation]
[W] [TRT] DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer (Unnamed Layer* 279) [Activation]
[I] Average over 10 runs is 73.9307 ms (host walltime is 74.6987 ms, 99% percentile time is 76.5665).
[I] Average over 10 runs is 72.0191 ms (host walltime is 72.4281 ms, 99% percentile time is 76.1508).
[I] Average over 10 runs is 71.4415 ms (host walltime is 71.886 ms, 99% percentile time is 73.8089).
[I] Average over 10 runs is 70.9269 ms (host walltime is 71.1898 ms, 99% percentile time is 74.4588).
[I] Average over 10 runs is 71.2038 ms (host walltime is 71.4089 ms, 99% percentile time is 73.0836).
[I] Average over 10 runs is 69.9025 ms (host walltime is 70.118 ms, 99% percentile time is 72.2458).
[I] Average over 10 runs is 70.6782 ms (host walltime is 70.914 ms, 99% percentile time is 72.2416).
[I] Average over 10 runs is 69.9151 ms (host walltime is 70.18 ms, 99% percentile time is 71.1107).
[I] Average over 10 runs is 69.9766 ms (host walltime is 70.2225 ms, 99% percentile time is 70.7799).
[I] Average over 10 runs is 70.4809 ms (host walltime is 70.6907 ms, 99% percentile time is 72.1121).
&&&& PASSED TensorRT.trtexec # ./bin/trtexec --onnx=/mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx --output=prob --useDLACore=1 --fp16 --allowGPUFallback

Hi,

You can add --verbose to see in detail how the model is deployed.
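
For example, reusing the command from your post:

./bin/trtexec --onnx=/mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx --output=prob --useDLACore=1 --fp16 --allowGPUFallback --verbose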

Based on your log, these are the unsupported layers, which fall back to the GPU:

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 174) [PluginV2] is not running on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 177) [PluginV2] is not running on DLA, falling back to GPU.
[W] [TRT] DLA LAYER: CBUF size requirement for layer (Unnamed Layer* 179) [Convolution] is 9banks, which exceeds the limit (8).
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 179) [Convolution] is not running on DLA, falling back to GPU.

Some other layers also fall back to the GPU because the DLA is already fully loaded (DLA supports only 8 subgraphs per core).
Thanks.

Hi,

I have some more questions:

  1. When there is a fallback to the GPU, is there a way to measure the time spent on the DLA and the time spent on the GPU?

  2. When also using the option --saveEngine, as in:

./trtexec --onnx=/media/foresight/nvmedisk/Xavier/vis/retinanet_rn50fpn.onnx --useDLACore=0 --fp16 --allowGPUFallback --saveEngine=engine_file

In what format is the engine file created?

Is it the same format as the plan file created by retinanet-examples when running the command below?

./export model.onnx engine.plan

see: https://github.com/NVIDIA/retinanet-examples/tree/master/extras/deepstream

Thanks,
Gabi

Hi,

1. We don’t have a dedicated tool for this.
However, the TensorRT profiler does support layer-level execution time profiling,
so you can still check the performance of the fallback layers directly.
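
As a rough illustration, below is a minimal sketch of attaching a profiler with the TensorRT Python API. It assumes the serialized engine saved as engine_file from your trtexec --saveEngine command and uses pycuda for the device buffers; treat it as a sketch rather than a drop-in script:

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

class LayerTimer(trt.IProfiler):
    # Collects the per-layer times TensorRT reports after a synchronous run.
    def __init__(self):
        trt.IProfiler.__init__(self)
        self.times = {}

    def report_layer_time(self, layer_name, ms):
        # Called once per layer per inference; accumulate across runs.
        self.times[layer_name] = self.times.get(layer_name, 0.0) + ms

logger = trt.Logger(trt.Logger.WARNING)
with open("engine_file", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
profiler = LayerTimer()
context.profiler = profiler  # callbacks fire on the synchronous execute()

# One device buffer per binding; buffer contents don't matter for timing.
bindings = []
for i in range(engine.num_bindings):
    count = trt.volume(engine.get_binding_shape(i)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(i))
    bindings.append(int(cuda.mem_alloc(count * np.dtype(dtype).itemsize)))

context.execute(batch_size=1, bindings=bindings)

# Layers that fell back to the GPU show up here next to the DLA subgraphs.
for name, ms in sorted(profiler.times.items(), key=lambda kv: -kv[1]):
    print("%8.3f ms  %s" % (ms, name))

Depending on your TensorRT version, trtexec may also accept a --dumpProfile flag that prints per-layer timings directly.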

2. It is serialized kernel data from TensorRT.
And yes, it is the same format as the one in the link you shared.

But please note that this engine file is very sensitive to the TensorRT version and platform,
so it cannot be used across different TensorRT versions or platforms.
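
As a small illustration, both files from this thread (engine_file from trtexec --saveEngine, engine.plan from retinanet-examples) are loaded with the same runtime call, and deserialization simply fails when the engine was built with a different TensorRT version or on a different platform:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

for path in ("engine_file", "engine.plan"):
    with open(path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    # deserialize_cuda_engine returns None (and logs an error) if the file
    # was built with another TensorRT version or on another platform.
    print(path, "loaded:", engine is not None)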

Thanks.