Why can't some convolutional layers be compiled on DLA under int8?

Hi!
I am trying to convert an ONNX model to a TensorRT (TRT) file with DLA (Deep Learning Accelerator) enabled. I used the following command:
"trtexec --onnx=output.onnx --useDLACore=1 --int8 --dumpLayerInfo --exportLayerInfo=build_layer_info.log --allowGPUFallback"
However, I encountered some issues. Specifically, I received messages indicating that certain layers could not be compiled by DLA and were falling back to the GPU. The message I saw was:
"{ForeignNode[/neck/fpn_convs.0/fpn_convs.0.0/conv/Conv…/seg_result/resize]} cannot be compiled by DLA, falling back to GPU."
The convolutional layers in question are shown in the picture. They meet the DLA layer restrictions, so I am unsure why they cannot be compiled on DLA.

I also tried adding the --memPoolSize=dlaSRAM:0.5 option, but it did not help. These convolutional layers should satisfy the DLA layer restrictions, yet they are not being compiled on DLA. What could be the reason for this?
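
For reference, a fuller variant of the command that raises all three DLA memory pools and enables verbose logging might look like this (the pool sizes here are illustrative, not recommendations; dlaSRAM, dlaLocalDRAM, dlaGlobalDRAM, and --verbose are standard trtexec options):
"trtexec --onnx=output.onnx --useDLACore=1 --int8 --allowGPUFallback --memPoolSize=dlaSRAM:1,dlaLocalDRAM:1024,dlaGlobalDRAM:512 --verbose"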

Hi,

The error indicates that it is the resize layer that cannot run on DLA, not the convolutions.
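
If it is only the resize layer that DLA rejects, one possible workaround is to pin that layer to the GPU so the neighbouring convolutions remain DLA candidates. Below is a minimal sketch using the TensorRT Python API; the ONNX path is an assumption, the layer name is taken from the log in this thread, and both may need adjusting:

import tensorrt as trt

# Parse the ONNX model into a TensorRT network
# (TensorRT 8.x-style explicit-batch network creation).
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("output.onnx", "rb") as f:  # hypothetical path
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

# Target DLA core 1 with GPU fallback; DLA requires fp16 or int8.
config = builder.create_builder_config()
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 1
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
config.set_flag(trt.BuilderFlag.FP16)

# Pin the unsupported resize layer to the GPU so the surrounding
# convolutions stay eligible for DLA offload.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name == "/seg_result/resize":
        config.set_device_type(layer, trt.DeviceType.GPU)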

Thanks.

Hi!
I think this error means that everything from /neck/fpn_convs.0/fpn_convs.0.0/conv/Conv to /seg_result/resize cannot run on DLA.
The details are shown below:
[05/31/2024-03:09:20] [TRT] [W] {ForeignNode[/neck/fpn_convs.0/fpn_convs.0.0/conv/Conv…/seg_result/resize]} cannot be compiled by DLA, falling back to GPU.

[05/31/2024-03:09:20] [TRT] [V] DLA Memory Consumption Summary:

[05/31/2024-03:09:20] [TRT] [V] Number of DLA node candidates offloaded : 0 out of 1

[05/31/2024-03:09:20] [TRT] [V] Total memory required by accepted candidates : Managed SRAM = 0 MiB, Local DRAM = 0 MiB, Global DRAM = 0 MiB

[05/31/2024-03:09:20] [TRT] [V] After DLA optimization: 8 layers

[05/31/2024-03:09:20] [TRT] [V] Applying ScaleNodes fusions.

[05/31/2024-03:09:20] [TRT] [V] After scale fusion: 8 layers

[05/31/2024-03:09:20] [TRT] [V] Running: ConvReluFusion on /convs/convs.0/conv/Conv

[05/31/2024-03:09:20] [TRT] [V] ConvReluFusion: Fusing /convs/convs.0/conv/Conv with /convs/convs.0/activate/Relu

[05/31/2024-03:09:20] [TRT] [V] After dupe layer removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After final dead-layer removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After tensor merging: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After vertical fusions: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After dupe layer removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After final dead-layer removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After tensor merging: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After slice removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] After concat removal: 7 layers

[05/31/2024-03:09:20] [TRT] [V] Trying to split Reshape and strided tensor

[05/31/2024-03:09:20] [TRT] [V] Graph construction and optimization completed in 0.1482 seconds.

[05/31/2024-03:09:20] [TRT] [I] ---------- Layers Running on DLA ----------

[05/31/2024-03:09:20] [TRT] [I] ---------- Layers Running on GPU ----------

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.0/conv/Conv

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.1/conv/Conv

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] CONVOLUTION: /convs/convs.0/conv/Conv + /convs/convs.0/activate/Relu

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] CONVOLUTION: /conv_seg/Conv

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] RESIZE: /seg_result/resize

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] TOPK: /seg_result/argmax

[05/31/2024-03:09:20] [TRT] [I] [GpuLayer] SHUFFLE: (Unnamed Layer* 7) [Shuffle]

Furthermore, I tried removing the resize layer from the model.

I found that these convolutional layers can work on DLA in fp16 mode:
[06/05/2024-03:41:38] [TRT] [V] {ForeignNode[/neck/fpn_convs.0/fpn_convs.0.1/conv/Conv…/conv_seg/Conv]} successfully offloaded to DLA.

[06/05/2024-03:41:38] [TRT] [V] Memory consumption details:

[06/05/2024-03:41:38] [TRT] [V] Pool Sizes: Managed SRAM = 0.5 MiB, Local DRAM = 1024 MiB, Global DRAM = 512 MiB

[06/05/2024-03:41:38] [TRT] [V] Required: Managed SRAM = 0.5 MiB, Local DRAM = 128 MiB, Global DRAM = 4 MiB

[06/05/2024-03:41:38] [TRT] [V] DLA Memory Consumption Summary:

[06/05/2024-03:41:38] [TRT] [V] Number of DLA node candidates offloaded : 1 out of 1

[06/05/2024-03:41:38] [TRT] [V] Total memory required by accepted candidates : Managed SRAM = 0.5 MiB, Local DRAM = 128 MiB, Global DRAM = 4 MiB

[06/05/2024-03:41:38] [TRT] [V] After DLA optimization: 2 layers

[06/05/2024-03:41:38] [TRT] [V] Applying ScaleNodes fusions.

[06/05/2024-03:41:38] [TRT] [V] After scale fusion: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After dupe layer removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After final dead-layer removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After tensor merging: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After vertical fusions: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After dupe layer removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After final dead-layer removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After tensor merging: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After slice removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] After concat removal: 2 layers

[06/05/2024-03:41:38] [TRT] [V] Trying to split Reshape and strided tensor

[06/05/2024-03:41:38] [TRT] [V] Graph construction and optimization completed in 0.899575 seconds.

[06/05/2024-03:41:38] [TRT] [I] ---------- Layers Running on DLA ----------

[06/05/2024-03:41:38] [TRT] [I] [DlaLayer] {ForeignNode[/neck/fpn_convs.0/fpn_convs.0.1/conv/Conv…/conv_seg/Conv]}

[06/05/2024-03:41:38] [TRT] [I] ---------- Layers Running on GPU ----------

[06/05/2024-03:41:38] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.0/conv/Conv

However, they cannot work on DLA in int8 mode:
[06/05/2024-03:42:30] [TRT] [W] {ForeignNode[/neck/fpn_convs.0/fpn_convs.0.0/conv/Conv…/conv_seg/Conv]} cannot be compiled by DLA, falling back to GPU.

[06/05/2024-03:42:30] [TRT] [V] DLA Memory Consumption Summary:

[06/05/2024-03:42:30] [TRT] [V] Number of DLA node candidates offloaded : 0 out of 1

[06/05/2024-03:42:30] [TRT] [V] Total memory required by accepted candidates : Managed SRAM = 0 MiB, Local DRAM = 0 MiB, Global DRAM = 0 MiB

[06/05/2024-03:42:30] [TRT] [V] After DLA optimization: 5 layers

[06/05/2024-03:42:30] [TRT] [V] Applying ScaleNodes fusions.

[06/05/2024-03:42:30] [TRT] [V] After scale fusion: 5 layers

[06/05/2024-03:42:30] [TRT] [V] Running: ConvReluFusion on /convs/convs.0/conv/Conv

[06/05/2024-03:42:30] [TRT] [V] ConvReluFusion: Fusing /convs/convs.0/conv/Conv with /convs/convs.0/activate/Relu

[06/05/2024-03:42:30] [TRT] [V] After dupe layer removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After final dead-layer removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After tensor merging: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After vertical fusions: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After dupe layer removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After final dead-layer removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After tensor merging: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After slice removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] After concat removal: 4 layers

[06/05/2024-03:42:30] [TRT] [V] Trying to split Reshape and strided tensor

[06/05/2024-03:42:30] [TRT] [V] Graph construction and optimization completed in 0.129133 seconds.

[06/05/2024-03:42:30] [TRT] [I] ---------- Layers Running on DLA ----------

[06/05/2024-03:42:30] [TRT] [I] ---------- Layers Running on GPU ----------

[06/05/2024-03:42:30] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.0/conv/Conv

[06/05/2024-03:42:30] [TRT] [I] [GpuLayer] CONVOLUTION: /neck/fpn_convs.0/fpn_convs.0.1/conv/Conv

[06/05/2024-03:42:30] [TRT] [I] [GpuLayer] CONVOLUTION: /convs/convs.0/conv/Conv + /convs/convs.0/activate/Relu

[06/05/2024-03:42:30] [TRT] [I] [GpuLayer] CONVOLUTION: /conv_seg/Conv

I want to know why this is happening.
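
In case it helps narrow this down, here is a minimal sketch (TensorRT Python API; the file name and the placeholder scale values are assumptions, not from this thread) that queries per-layer DLA eligibility under int8. DLA int8 needs a scale for every tensor, so the sketch sets placeholder dynamic ranges, similar to what trtexec does when no calibration cache is supplied; results without real calibration are indicative only:

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("output.onnx", "rb") as f:  # hypothetical path
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 1
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
config.set_flag(trt.BuilderFlag.INT8)

# Placeholder int8 scales for every tensor (DLA int8 rejects tensors
# without a dynamic range). Replace with real calibration in practice.
for i in range(network.num_inputs):
    network.get_input(i).set_dynamic_range(-127.0, 127.0)
for i in range(network.num_layers):
    layer = network.get_layer(i)
    for j in range(layer.num_outputs):
        layer.get_output(j).set_dynamic_range(-127.0, 127.0)

# Report which layers TensorRT considers DLA-eligible under this config.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    print(layer.name, config.can_run_on_DLA(layer))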

Hi,

Are you able to share the complete TensorRT conversion log with us?

Thanks.

log_no_ocrhead_no_resize_int8_debug.log (467.8 KB)
This is the log of int8 mode
log_no_ocrhead_no_resize_fp16_debug.log (64.0 KB)
This is the log of fp16 mode

Hi,

The two logs look truncated; they end without a completion message or an error.

Are these all you get when converting the TensorRT engine?
Could you double-check it?

Thanks.

I am sure these are the complete logs; they show "finish" at the end.

Hi,

Sorry for missing that.

Based on your log, we do see a similar issue as you reported.
Could you share the ONNX model with us as well?

Thanks.

Here is the ONNX model.

Hi,

Could you share the ONNX file with us instead?

Thanks.

iter_240000_1224x1024_simplified.surgeon.naive.debug.debug_output.zip (9.8 KB)
This is the ONNX file.

Thanks for the file.
We will give it a check and update.

Hi,

Could you double-check if you share the same ONNX file as the log provided above?

The file contains only one Conv layer, and it cannot run on DLA in either fp16 or int8 precision.
This is different from the log you shared.

Thanks.
