Convert model to TensorRT with DLA | DLA Node compilation Failed

Description

I was trying to enable DLA for a TensorRT model when I ran into the error “DLA Node compilation Failed.”

The model converts to TensorRT successfully with the trtexec tool when DLA is not used.
Here is the command:

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --fp16 --explicitBatch --workspace=318

However, if I enable DLA, the build fails with the error message shown further down. I searched online, and most reports of this error were solved by increasing the workspace size, but that does not work in my case.

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --useDLACore=1 --fp16 --allowGPUFallback --explicitBatch --workspace=2048

I have tried many different workspace values, such as 300, 2048, 4096, etc.
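
For anyone who wants to repeat the workspace sweep, here is a minimal sketch of how the command lines could be generated, assuming Python 3 is available. The helper name build_trtexec_cmd is hypothetical, and it only uses flags that already appear in the command above. It builds the command lines but does not run them.

```python
# Path and flags are copied from the command in this post.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def build_trtexec_cmd(workspace_mb, dla_core=1):
    """Return the trtexec argument list for one workspace size (in MB).

    build_trtexec_cmd is a hypothetical helper, not part of TensorRT.
    """
    return [
        TRTEXEC,
        "--onnx=model.onnx",
        "--saveEngine=model.trt",
        f"--useDLACore={dla_core}",
        "--fp16",
        "--allowGPUFallback",
        "--explicitBatch",
        f"--workspace={workspace_mb}",
    ]


if __name__ == "__main__":
    # The workspace sizes mentioned in this post.
    for mb in (300, 2048, 4096):
        print(" ".join(build_trtexec_cmd(mb)))
```

Each list can be passed to subprocess.run to actually start a build, with the output captured to a separate log file per workspace size.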

Here is the error message:

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --useDLACore=1 --fp16 --allowGPUFallback --explicitBatch --workspace=2048
[08/11/2021-09:10:06] [I] === Model Options ===
[08/11/2021-09:10:06] [I] Format: ONNX
[08/11/2021-09:10:06] [I] Model: model.onnx
[08/11/2021-09:10:06] [I] Output:
[08/11/2021-09:10:06] [I] === Build Options ===
[08/11/2021-09:10:06] [I] Max batch: explicit
[08/11/2021-09:10:06] [I] Workspace: 2048 MB
[08/11/2021-09:10:06] [I] minTiming: 1
[08/11/2021-09:10:06] [I] avgTiming: 8
[08/11/2021-09:10:06] [I] Precision: FP32+FP16
[08/11/2021-09:10:06] [I] Calibration: 
[08/11/2021-09:10:06] [I] Safe mode: Disabled
[08/11/2021-09:10:06] [I] Save engine: model.trt
[08/11/2021-09:10:06] [I] Load engine: 
[08/11/2021-09:10:06] [I] Builder Cache: Enabled
[08/11/2021-09:10:06] [I] NVTX verbosity: 0
[08/11/2021-09:10:06] [I] Inputs format: fp32:CHW
[08/11/2021-09:10:06] [I] Outputs format: fp32:CHW
[08/11/2021-09:10:06] [I] Input build shapes: model
[08/11/2021-09:10:06] [I] Input calibration shapes: model
[08/11/2021-09:10:06] [I] === System Options ===
[08/11/2021-09:10:06] [I] Device: 0
[08/11/2021-09:10:06] [I] DLACore: 1(With GPU fallback)
[08/11/2021-09:10:06] [I] Plugins:
[08/11/2021-09:10:06] [I] === Inference Options ===
[08/11/2021-09:10:06] [I] Batch: Explicit
[08/11/2021-09:10:06] [I] Input inference shapes: model
[08/11/2021-09:10:06] [I] Iterations: 10
[08/11/2021-09:10:06] [I] Duration: 3s (+ 200ms warm up)
[08/11/2021-09:10:06] [I] Sleep time: 0ms
[08/11/2021-09:10:06] [I] Streams: 1
[08/11/2021-09:10:06] [I] ExposeDMA: Disabled
[08/11/2021-09:10:06] [I] Spin-wait: Disabled
[08/11/2021-09:10:06] [I] Multithreading: Disabled
[08/11/2021-09:10:06] [I] CUDA Graph: Disabled
[08/11/2021-09:10:06] [I] Skip inference: Disabled
[08/11/2021-09:10:06] [I] Inputs:
[08/11/2021-09:10:06] [I] === Reporting Options ===
[08/11/2021-09:10:06] [I] Verbose: Disabled
[08/11/2021-09:10:06] [I] Averages: 10 inferences
[08/11/2021-09:10:06] [I] Percentile: 99
[08/11/2021-09:10:06] [I] Dump output: Disabled
[08/11/2021-09:10:06] [I] Profile: Disabled
[08/11/2021-09:10:06] [I] Export timing to JSON file: 
[08/11/2021-09:10:06] [I] Export output to JSON file: 
[08/11/2021-09:10:06] [I] Export profile to JSON file: 
[08/11/2021-09:10:06] [I] 
----------------------------------------------------------------
Input filename:   model.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.5
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_134: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_134 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_142: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_142 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_146: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_146 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_158: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_158 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_162: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_162 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_166: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_166 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:09] [I] [TRT] 
[08/11/2021-09:10:09] [I] [TRT] --------------- Layers running on DLA: 
[08/11/2021-09:10:09] [I] [TRT] {Conv_0,BatchNormalization_1,Relu_2,Conv_3,BatchNormalization_4,Relu_5,Conv_6,BatchNormalization_7,Relu_8,MaxPool_9,Conv_10,BatchNormalization_11,Conv_12,BatchNormalization_13,Relu_14,Conv_15,BatchNormalization_16,Add_17,Relu_18,Conv_19,BatchNormalization_20,Relu_21,Conv_22,BatchNormalization_23,Add_24,Relu_25,Concat_26,Conv_27,BatchNormalization_28,Relu_29,MaxPool_30,MaxPool_31,Conv_32,BatchNormalization_33,Conv_34,BatchNormalization_35,Relu_36,Conv_37,BatchNormalization_38,Add_39,Relu_40,Conv_41,BatchNormalization_42,Relu_43,Conv_44,BatchNormalization_45,Add_46,Relu_47,Concat_48,Conv_49,BatchNormalization_50,Relu_51,Conv_52,BatchNormalization_53,Relu_54,Conv_55,BatchNormalization_56,Add_57,Relu_58,Conv_59,BatchNormalization_60,Relu_61,Conv_62,BatchNormalization_63,Add_64, (skip it){Conv_136,BatchNormalization_137,Relu_138,Conv_143,BatchNormalization_144,Relu_145}, {Conv_148,BatchNormalization_149,Relu_150,Conv_159,BatchNormalization_160,Relu_161}, {Conv_168,BatchNormalization_169,Relu_170}, {Conv_152,BatchNormalization_153,Relu_154,Conv_163,BatchNormalization_164,Relu_165}, {Conv_172,BatchNormalization_173,Relu_174}, {Conv_176,BatchNormalization_177,Relu_178,Conv_179,Relu_180,Conv_181,Conv_182,Relu_183,Conv_184,Conv_185,Relu_186,Conv_187,Conv_188,Relu_189,Conv_190,Conv_191,Relu_192,Conv_193,Conv_194,Relu_195,Conv_196}, 
[08/11/2021-09:10:09] [I] [TRT] --------------- Layers running on GPU: 
[08/11/2021-09:10:09] [I] [TRT] ConvTranspose_134, ConvTranspose_142, ConvTranspose_158, 448 copy, 473 copy, 408 copy, 481 copy, 368 copy, 497 copy, ConvTranspose_146, ConvTranspose_162, 489 copy, 485 copy, 509 copy, 501 copy, ConvTranspose_166, 513 copy, 505 copy, 
[08/11/2021-09:10:11] [W] [TRT] DLA Node compilation Failed.
[08/11/2021-09:10:11] [W] [TRT] DLA Node compilation Failed.
[08/11/2021-09:10:11] [E] [TRT] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[08/11/2021-09:10:11] [E] [TRT] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node {Conv_0,BatchNormalization_1,Relu_2,Conv_3,BatchNormalization_4,Relu_5,Conv_6,BatchNormalization_7,Relu_8,MaxPool_9,Conv_10,BatchNormalization_11,Conv_12,BatchNormalization_13,Relu_14,Conv_15,BatchNormalization_16,Add_17,Relu_18,Conv_19,BatchNormalization_20,Relu_21,Conv_22, (skip it),Relu_119,Conv_120,BatchNormalization_121,Relu_122,Conv_123,BatchNormalization_124,Add_125,Relu_126,Concat_127,Conv_128,BatchNormalization_129,Relu_130,Conv_131,BatchNormalization_132,Relu_133,Conv_139,BatchNormalization_140,Relu_141,Conv_155,BatchNormalization_156,Relu_157}.)
[08/11/2021-09:10:11] [E] [TRT] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 ()
[08/11/2021-09:10:11] [E] Engine creation failed
[08/11/2021-09:10:11] [E] Engine set up failed
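
Since the log above is long, here is a rough sketch of a script that condenses it: it lists the layers that fell back to the GPU together with the logged reason, and counts the “DLA Node compilation Failed” warnings. The function name summarize_dla_log is hypothetical, and the regular expressions assume the exact message wording seen in this log (TensorRT 7.1.3.4); other versions may word these messages differently.

```python
import re
from collections import OrderedDict

# Matches warnings such as:
#   [W] [TRT] ConvTranspose_134: DLA cores do not support more than 1 groups.
REASON_RE = re.compile(r"\[W\] \[TRT\] (\S+): (DLA .+)$")

# Matches warnings such as:
#   ... layer ConvTranspose_134 is not supported on DLA, falling back to GPU.
FALLBACK_RE = re.compile(
    r"layer (\S+) is not supported on DLA, falling back to GPU"
)


def summarize_dla_log(lines):
    """Return (fallbacks, failures) for an iterable of trtexec log lines.

    fallbacks maps each layer that fell back to the GPU to the reason
    logged for it; failures counts 'DLA Node compilation Failed' lines.
    summarize_dla_log is a hypothetical helper, not a TensorRT API.
    """
    fallbacks = OrderedDict()
    failures = 0
    for line in lines:
        line = line.rstrip()
        match = REASON_RE.search(line)
        if match:
            fallbacks[match.group(1)] = match.group(2)
            continue
        match = FALLBACK_RE.search(line)
        if match:
            # Keep the reason if one was already recorded for this layer.
            fallbacks.setdefault(match.group(1), "no reason logged")
            continue
        if "DLA Node compilation Failed" in line:
            failures += 1
    return fallbacks, failures
```

To use it, save the trtexec output to a file (the name build.log is just an example) and call summarize_dla_log(open("build.log")). For the log above, it would report the six ConvTranspose layers with the "more than 1 groups" reason, plus two compilation failures.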

The ONNX model size is ~74 MB.

Environment

TensorRT Version: 7.1.3.4
Device: AGX
JetPack installed on host: v4.4.1
Docker image: nvcr.io/nvidia/l4t-ml:r32.6.1-py3 (from NGC)

BTW, I was running all of this inside the Docker container.

What is causing this issue? Even if the error cannot be fixed, I would like to understand why it happens. Is it caused by GPU memory, or by something else?

Thank you.

Hi,
Please check the below links, as they might answer your concerns.
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_topic
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_layers
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#restrictions-with-dla
Thanks!

Thank you!