Convert model to TensorRT with DLA | DLA Node compilation Failed

Chieh · August 11, 2021, 3:55am

Description

I was trying to enable DLA when building a TensorRT engine and encountered the error “DLA Node compilation Failed.”

The model converts to TensorRT successfully with the trtexec tool when DLA is not used.
Here is my command:

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --fp16 --explicitBatch --workspace=318

However, when I enable DLA, the build fails with the error message below. Most solutions I found online were to increase the workspace size, but that does not work in my case.

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --useDLACore=1 --fp16 --allowGPUFallback --explicitBatch --workspace=2048

I have tried many different workspace values, such as 300, 2048, and 4096.

Here is the error message:

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --useDLACore=1 --fp16 --allowGPUFallback --explicitBatch --workspace=2048
[08/11/2021-09:10:06] [I] === Model Options ===
[08/11/2021-09:10:06] [I] Format: ONNX
[08/11/2021-09:10:06] [I] Model: model.onnx
[08/11/2021-09:10:06] [I] Output:
[08/11/2021-09:10:06] [I] === Build Options ===
[08/11/2021-09:10:06] [I] Max batch: explicit
[08/11/2021-09:10:06] [I] Workspace: 2048 MB
[08/11/2021-09:10:06] [I] minTiming: 1
[08/11/2021-09:10:06] [I] avgTiming: 8
[08/11/2021-09:10:06] [I] Precision: FP32+FP16
[08/11/2021-09:10:06] [I] Calibration: 
[08/11/2021-09:10:06] [I] Safe mode: Disabled
[08/11/2021-09:10:06] [I] Save engine: model.trt
[08/11/2021-09:10:06] [I] Load engine: 
[08/11/2021-09:10:06] [I] Builder Cache: Enabled
[08/11/2021-09:10:06] [I] NVTX verbosity: 0
[08/11/2021-09:10:06] [I] Inputs format: fp32:CHW
[08/11/2021-09:10:06] [I] Outputs format: fp32:CHW
[08/11/2021-09:10:06] [I] Input build shapes: model
[08/11/2021-09:10:06] [I] Input calibration shapes: model
[08/11/2021-09:10:06] [I] === System Options ===
[08/11/2021-09:10:06] [I] Device: 0
[08/11/2021-09:10:06] [I] DLACore: 1(With GPU fallback)
[08/11/2021-09:10:06] [I] Plugins:
[08/11/2021-09:10:06] [I] === Inference Options ===
[08/11/2021-09:10:06] [I] Batch: Explicit
[08/11/2021-09:10:06] [I] Input inference shapes: model
[08/11/2021-09:10:06] [I] Iterations: 10
[08/11/2021-09:10:06] [I] Duration: 3s (+ 200ms warm up)
[08/11/2021-09:10:06] [I] Sleep time: 0ms
[08/11/2021-09:10:06] [I] Streams: 1
[08/11/2021-09:10:06] [I] ExposeDMA: Disabled
[08/11/2021-09:10:06] [I] Spin-wait: Disabled
[08/11/2021-09:10:06] [I] Multithreading: Disabled
[08/11/2021-09:10:06] [I] CUDA Graph: Disabled
[08/11/2021-09:10:06] [I] Skip inference: Disabled
[08/11/2021-09:10:06] [I] Inputs:
[08/11/2021-09:10:06] [I] === Reporting Options ===
[08/11/2021-09:10:06] [I] Verbose: Disabled
[08/11/2021-09:10:06] [I] Averages: 10 inferences
[08/11/2021-09:10:06] [I] Percentile: 99
[08/11/2021-09:10:06] [I] Dump output: Disabled
[08/11/2021-09:10:06] [I] Profile: Disabled
[08/11/2021-09:10:06] [I] Export timing to JSON file: 
[08/11/2021-09:10:06] [I] Export output to JSON file: 
[08/11/2021-09:10:06] [I] Export profile to JSON file: 
[08/11/2021-09:10:06] [I] 
----------------------------------------------------------------
Input filename:   model.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.5
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_134: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_134 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_142: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_142 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_146: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_146 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_158: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_158 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_162: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_162 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:08] [W] [TRT] ConvTranspose_166: DLA cores do not support more than 1 groups.
[08/11/2021-09:10:08] [W] [TRT] Default DLA is enabled but layer ConvTranspose_166 is not supported on DLA, falling back to GPU.
[08/11/2021-09:10:09] [I] [TRT] 
[08/11/2021-09:10:09] [I] [TRT] --------------- Layers running on DLA: 
[08/11/2021-09:10:09] [I] [TRT] {Conv_0,BatchNormalization_1,Relu_2,Conv_3,BatchNormalization_4,Relu_5,Conv_6,BatchNormalization_7,Relu_8,MaxPool_9,Conv_10,BatchNormalization_11,Conv_12,BatchNormalization_13,Relu_14,Conv_15,BatchNormalization_16,Add_17,Relu_18,Conv_19,BatchNormalization_20,Relu_21,Conv_22,BatchNormalization_23,Add_24,Relu_25,Concat_26,Conv_27,BatchNormalization_28,Relu_29,MaxPool_30,MaxPool_31,Conv_32,BatchNormalization_33,Conv_34,BatchNormalization_35,Relu_36,Conv_37,BatchNormalization_38,Add_39,Relu_40,Conv_41,BatchNormalization_42,Relu_43,Conv_44,BatchNormalization_45,Add_46,Relu_47,Concat_48,Conv_49,BatchNormalization_50,Relu_51,Conv_52,BatchNormalization_53,Relu_54,Conv_55,BatchNormalization_56,Add_57,Relu_58,Conv_59,BatchNormalization_60,Relu_61,Conv_62,BatchNormalization_63,Add_64, (skip it){Conv_136,BatchNormalization_137,Relu_138,Conv_143,BatchNormalization_144,Relu_145}, {Conv_148,BatchNormalization_149,Relu_150,Conv_159,BatchNormalization_160,Relu_161}, {Conv_168,BatchNormalization_169,Relu_170}, {Conv_152,BatchNormalization_153,Relu_154,Conv_163,BatchNormalization_164,Relu_165}, {Conv_172,BatchNormalization_173,Relu_174}, {Conv_176,BatchNormalization_177,Relu_178,Conv_179,Relu_180,Conv_181,Conv_182,Relu_183,Conv_184,Conv_185,Relu_186,Conv_187,Conv_188,Relu_189,Conv_190,Conv_191,Relu_192,Conv_193,Conv_194,Relu_195,Conv_196}, 
[08/11/2021-09:10:09] [I] [TRT] --------------- Layers running on GPU: 
[08/11/2021-09:10:09] [I] [TRT] ConvTranspose_134, ConvTranspose_142, ConvTranspose_158, 448 copy, 473 copy, 408 copy, 481 copy, 368 copy, 497 copy, ConvTranspose_146, ConvTranspose_162, 489 copy, 485 copy, 509 copy, 501 copy, ConvTranspose_166, 513 copy, 505 copy, 
[08/11/2021-09:10:11] [W] [TRT] DLA Node compilation Failed.
[08/11/2021-09:10:11] [W] [TRT] DLA Node compilation Failed.
[08/11/2021-09:10:11] [E] [TRT] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[08/11/2021-09:10:11] [E] [TRT] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node {Conv_0,BatchNormalization_1,Relu_2,Conv_3,BatchNormalization_4,Relu_5,Conv_6,BatchNormalization_7,Relu_8,MaxPool_9,Conv_10,BatchNormalization_11,Conv_12,BatchNormalization_13,Relu_14,Conv_15,BatchNormalization_16,Add_17,Relu_18,Conv_19,BatchNormalization_20,Relu_21,Conv_22, (skip it),Relu_119,Conv_120,BatchNormalization_121,Relu_122,Conv_123,BatchNormalization_124,Add_125,Relu_126,Concat_127,Conv_128,BatchNormalization_129,Relu_130,Conv_131,BatchNormalization_132,Relu_133,Conv_139,BatchNormalization_140,Relu_141,Conv_155,BatchNormalization_156,Relu_157}.)
[08/11/2021-09:10:11] [E] [TRT] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 ()
[08/11/2021-09:10:11] [E] Engine creation failed
[08/11/2021-09:10:11] [E] Engine set up failed
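A log this long is easier to read with a small summary of which layers fell back to the GPU, why, and how many DLA nodes failed to compile. The following is a minimal Python sketch of such a helper. `summarize_trtexec_log` is a hypothetical function, not part of trtexec or TensorRT, and its regular expressions assume the exact TensorRT 7.1.3 warning wording shown in this log; other TensorRT versions may phrase these messages differently.

```python
import re
from typing import Dict, List

# Assumption: these patterns match the TensorRT 7.1.3 warning wording in this
# log; other TensorRT versions may word the messages differently.
FALLBACK_RE = re.compile(r"layer (\S+) is not supported on DLA, falling back to GPU")
REASON_RE = re.compile(r"\[W\] \[TRT\] (\S+): (DLA cores do not support .*)$")
DLA_FAIL = "DLA Node compilation Failed"


def summarize_trtexec_log(log_text: str) -> Dict[str, object]:
    """Collect GPU-fallback layers, their DLA rejection reasons, and the
    number of 'DLA Node compilation Failed' warnings in trtexec output."""
    fallbacks: List[str] = []
    reasons: Dict[str, str] = {}
    dla_failures = 0
    for line in log_text.splitlines():
        # "... but layer X is not supported on DLA, falling back to GPU."
        m = FALLBACK_RE.search(line)
        if m:
            fallbacks.append(m.group(1))
        # "X: DLA cores do not support ..." gives the reason for layer X
        m = REASON_RE.search(line)
        if m:
            reasons[m.group(1)] = m.group(2)
        # Count each DLA node that failed to compile
        if DLA_FAIL in line:
            dla_failures += 1
    return {
        "gpu_fallback_layers": fallbacks,
        "fallback_reasons": reasons,
        "dla_node_failures": dla_failures,
    }


if __name__ == "__main__":
    import sys
    # Hypothetical usage: trtexec ... 2>&1 | tee trtexec.log
    #                     python summarize.py trtexec.log
    with open(sys.argv[1]) as f:
        print(summarize_trtexec_log(f.read()))
```

Applied to the log above, this should list the six ConvTranspose layers (134, 142, 146, 158, 162, 166), each rejected because it uses more than one group, and report two DLA node compilation failures. The summary only describes what TensorRT reported. It does not by itself identify the root cause of the failure.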

The ONNX model is about 74 MB.

Environment

TensorRT Version: 7.1.3.4
Device: Jetson AGX
Host JetPack version: 4.4.1
Docker image: nvcr.io/nvidia/l4t-ml:r32.6.1-py3 (from NGC)

BTW, I ran all of this inside the Docker container.

What causes this issue? Even if the error cannot be fixed, I would like to understand why it happens. Is it caused by GPU memory, or something else?

Thank you.

NVES · August 12, 2021, 5:35am

Hi,
Please check the links below, as they might answer your concerns.

Thanks!

Chieh · August 12, 2021, 9:13am

Thank you!
