Why can't some convolutional layers be compiled on DLA under int8?

Hi!
I am trying to convert an ONNX model to a TensorRT (TRT) file with DLA (Deep Learning Accelerator) enabled. I used the following command:
"trtexec --onnx=output.onnx --useDLACore=1 --int8 --dumpLayerInfo --exportLayerInfo=build_layer_info.log --allowGPUFallback"
However, I encountered some issues. Specifically, I received messages indicating that certain layers could not be compiled by DLA and were falling back to the GPU. The message I saw was:
"{ForeignNode[/neck/fpn_convs.0/fpn_convs.0.0/conv/Conv…/seg_result/resize]} cannot be compiled by DLA, falling back to GPU."
The convolutional layers in question are shown in the picture. They meet the DLA layer restrictions, so I am unsure why they cannot be compiled on DLA.

I also tried adding the --memPoolSize=dlaSRAM:0.5 option, but it did not help. These convolutional layers should satisfy the DLA layer restrictions, yet they are not being compiled on DLA. What could be the reason for this?
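
For reference, a fuller variant of the command that raises all three DLA memory pools and enables verbose logging might look like this (the pool sizes here are illustrative, not recommendations; dlaSRAM, dlaLocalDRAM, dlaGlobalDRAM, and --verbose are standard trtexec options):
"trtexec --onnx=output.onnx --useDLACore=1 --int8 --allowGPUFallback --memPoolSize=dlaSRAM:1,dlaLocalDRAM:1024,dlaGlobalDRAM:512 --verbose"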

Hi,

The error indicates that it is the resize layer that cannot run on DLA, not the convolutions.
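
If it is only the resize layer that DLA rejects, one possible workaround is to pin that layer to the GPU so the neighbouring convolutions remain DLA candidates. Below is a minimal sketch using the TensorRT Python API; the ONNX path is an assumption, the layer name is taken from the log in this thread, and both may need adjusting:

import tensorrt as trt

# Parse the ONNX model into a TensorRT network
# (TensorRT 8.x-style explicit-batch network creation).
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("output.onnx", "rb") as f:  # hypothetical path
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

# Target DLA core 1 with GPU fallback; DLA requires fp16 or int8.
config = builder.create_builder_config()
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 1
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
config.set_flag(trt.BuilderFlag.FP16)

# Pin the unsupported resize layer to the GPU so the surrounding
# convolutions stay eligible for DLA offload.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name == "/seg_result/resize":
        config.set_device_type(layer, trt.DeviceType.GPU)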

Thanks.

Hi!
I think this error means that everything from /neck/fpn_convs.0/fpn_convs.0.0/conv/Conv to /seg_result/resize cannot run on DLA.
The details are shown below:
[05/31/2024-03:09:20] [TRT] [W] {ForeignNode[/neck/fpn_convs.0/fpn_convs.0.0/conv/Conv…/seg_result/resize]} cannot be compiled by DLA, falling back to GPU.

[05/31/2024-03:09:20] [TRT] [V] DLA Memory Consumption Summary:

[05/31/2024-03:09:20] [TRT] [V] Number of DLA node candidates offloaded : 0 out of 1

[05/31/2024-03:09:20] [TRT] [V] Total memory required by accepted candidates : Managed SRAM = 0 MiB, Local DRAM = 0 MiB, Global DRAM = 0 MiB

[05/31/2024-03:09:20] [TRT] [V] After DLA optimization: 8 layers

[05/31/2024-03:09:20] [TRT] [V] Applying ScaleNodes fusions.

[05/31/2024-03:09:20] [TRT] [V] After scale fusion: 8 layers

[05/31/2024-03:09:20] [TRT] [V] Running: ConvReluFusion on /convs/convs.0/conv/Conv

[05/31/2024-03:09:20] [TRT] [V] ConvReluFusion: Fusing /convs/convs.0/conv/Conv with /convs/convs.0/activate/Relu

[05/31/2024-03:09:20] [TRT] [V] After dupe layer removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After final dead-layer removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After tensor merging: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After vertical fusions: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After dupe layer removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After final dead-layer removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After tensor merging: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After slice removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After concat removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] Trying to split Reshape and strided tensor

[05/31/2024-03:09:20] [TRT] [V] Graph construction and optimization completed in 0.1482 seconds.

[05/31/2024-03:09:20] [TRT] [I] ---------- Layers Running on DLA ----------

[05/31/2024-03:09:20] [TRT] [I] ---------- Layers Running on GPU ----------

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.0/conv/Conv

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.1/conv/Conv

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] CONVOLUTION: /convs/convs.0/conv/Conv + /convs/convs.0/activate/Relu

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] CONVOLUTION: /conv_seg/Conv

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] RESIZE: /seg_result/resize

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] TOPK: /seg_result/argmax

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] SHUFFLE: (Unnamed Layer* 7) [Shuffle]

Furthermore, I tried removing the resize layer from the model.

I found that these convolutional layers can work on DLA in fp16 mode:
[06/05/2024-03:41:38] [TRT] [V] {ForeignNode[/neck/fpn_convs.0/fpn_convs.0.1/conv/Conv…/conv_seg/Conv]} successfully offloaded to DLA.

[06/05/2024-03:41:38] [TRT] [V] Memory consumption details:

[06/05/2024-03:41:38] [TRT] [V] Pool Sizes: Managed SRAM = 0.5 MiB, Local DRAM = 1024 MiB, Global DRAM = 512 MiB

[06/05/2024-03:41:38] [TRT] [V] Required: Managed SRAM = 0.5 MiB, Local DRAM = 128 MiB, Global DRAM = 4 MiB

[06/05/2024-03:41:38] [TRT] [V] DLA Memory Consumption Summary:

[06/05/2024-03:41:38] [TRT] [V] Number of DLA node candidates offloaded : 1 out of 1

[06/05/2024-03:41:38] [TRT] [V] Total memory required by accepted candidates : Managed SRAM = 0.5 MiB, Local DRAM = 128 MiB, Global DRAM = 4 MiB

[06/05/2024-03:41:38] [TRT] [V] After DLA optimization: 2 layers

[06/05/2024-03:41:38] [TRT] [V] Applying ScaleNodes fusions.

[06/05/2024-03:41:38] [TRT] [V] After scale fusion: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After dupe layer removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After final dead-layer removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After tensor merging: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After vertical fusions: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After dupe layer removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After final dead-layer removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After tensor merging: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After slice removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After concat removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] Trying to split Reshape and strided tensor

[06/05/2024-03:41:38] [TRT] [V] Graph construction and optimization completed in 0.899575 seconds.

[06/05/2024-03:41:38] [TRT] [I] ---------- Layers Running on DLA ----------

[06/05/2024-03:41:38] [TRT] [I] [DlaLayer] {ForeignNode[/neck/fpn_convs.0/fpn_convs.0.1/conv/Conv…/conv_seg/Conv]}

[06/05/2024-03:41:38] [TRT] [I] ---------- Layers Running on GPU ----------

[06/05/2024-03:41:38] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.0/conv/Conv

However, they cannot work on DLA in int8 mode:
[06/05/2024-03:42:30] [TRT] [W] {ForeignNode[/neck/fpn_convs.0/fpn_convs.0.0/conv/Conv…/conv_seg/Conv]} cannot be compiled by DLA, falling back to GPU.

[06/05/2024-03:42:30] [TRT] [V] DLA Memory Consumption Summary:

[06/05/2024-03:42:30] [TRT] [V] Number of DLA node candidates offloaded : 0 out of 1

[06/05/2024-03:42:30] [TRT] [V] Total memory required by accepted candidates : Managed SRAM = 0 MiB, Local DRAM = 0 MiB, Global DRAM = 0 MiB

[06/05/2024-03:42:30] [TRT] [V] After DLA optimization: 5 layers

[06/05/2024-03:42:30] [TRT] [V] Applying ScaleNodes fusions.

[06/05/2024-03:42:30] [TRT] [V] After scale fusion: 5 layers

[06/05/2024-03:42:30] [TRT] [V] Running: ConvReluFusion on /convs/convs.0/conv/Conv

[06/05/2024-03:42:30] [TRT] [V] ConvReluFusion: Fusing /convs/convs.0/conv/Conv with /convs/convs.0/activate/Relu

[06/05/2024-03:42:30] [TRT] [V] After dupe layer removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After final dead-layer removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After tensor merging: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After vertical fusions: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After dupe layer removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After final dead-layer removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After tensor merging: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After slice removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After concat removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] Trying to split Reshape and strided tensor

[06/05/2024-03:42:30] [TRT] [V] Graph construction and optimization completed in 0.129133 seconds.

[06/05/2024-03:42:30] [TRT] [I] ---------- Layers Running on DLA ----------

[06/05/2024-03:42:30] [TRT] [I] ---------- Layers Running on GPU ----------

[06/05/2024-03:42:30] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.0/conv/Conv

[06/05/2024-03:42:30] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.1/conv/Conv

[06/05/2024-03:42:30] [TRT] [I] [GpuLayer] CONVOLUTION: /convs/convs.0/conv/Conv + /convs/convs.0/activate/Relu

[06/05/2024-03:42:30] [TRT] [I] [GpuLayer] CONVOLUTION: /conv_seg/Conv

I want to know why this is happening.
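
In case it helps narrow this down, here is a minimal sketch (TensorRT Python API; the file name and the placeholder scale values are assumptions, not from this thread) that queries per-layer DLA eligibility under int8. DLA int8 needs a scale for every tensor, so the sketch sets placeholder dynamic ranges, similar to what trtexec does when no calibration cache is supplied; results without real calibration are indicative only:

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("output.onnx", "rb") as f:  # hypothetical path
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 1
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
config.set_flag(trt.BuilderFlag.INT8)

# Placeholder int8 scales for every tensor (DLA int8 rejects tensors
# without a dynamic range). Replace with real calibration in practice.
for i in range(network.num_inputs):
    network.get_input(i).set_dynamic_range(-127.0, 127.0)
for i in range(network.num_layers):
    layer = network.get_layer(i)
    for j in range(layer.num_outputs):
        layer.get_output(j).set_dynamic_range(-127.0, 127.0)

# Report which layers TensorRT considers DLA-eligible under this config.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    print(layer.name, config.can_run_on_DLA(layer))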

Hi,

Are you able to share the complete TensorRT conversion log with us?

Thanks.

log_no_ocrhead_no_resize_int8_debug.log (467.8 KB)
This is the log of int8 mode
log_no_ocrhead_no_resize_fp16_debug.log (64.0 KB)
This is the log of fp16 mode

Hi,

The two logs look truncated; they end without a completion message or an error.

Are these all you get when converting the TensorRT engine?
Could you double-check it?

Thanks.

I am sure these are the complete logs; they show "finish" at the end.

Hi,

Sorry for missing that.

Based on your log, we do see a similar issue as you reported.
Could you share the ONNX model with us as well?

Thanks.

Here is the ONNX model.

Hi,

Could you share the ONNX file with us instead?

Thanks.

iter_240000_1224x1024_simplified.surgeon.naive.debug.debug_output.zip (9.8 KB)
This is the ONNX file.

Thanks for the file.
We will give it a check and update.

Hi,

Could you double-check if you share the same ONNX file as the log provided above?

The file contains only one Conv layer, and it cannot run on DLA in either fp16 or int8 precision.
This is different from the log you shared.

Thanks.
