Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.4 SDK
other
Target Operating System
Linux
QNX
other
Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its part number)
other
SDK Manager Version
1.8.3.10426
other
Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other
I am trying to figure out which kinds of Conv layers can run on the DLA.
Following 12.2. DLA Supported Layers and Restrictions, I created two simple models, each containing only a single Conv op, and parsed them with trtexec. The verbose logs for both runs are below.
[09/28/2022-12:16:50] [V] [TRT] Adding network input: onnx::Conv_0 with dtype: float32, dimensions: (1, 1706, 3, 3)
[09/28/2022-12:16:50] [V] [TRT] Registering tensor: onnx::Conv_0 for ONNX tensor: onnx::Conv_0
[09/28/2022-12:16:50] [V] [TRT] Parsing node: Constant_0 [Constant]
[09/28/2022-12:16:50] [V] [TRT] Constant_0 [Constant] inputs:
[09/28/2022-12:16:50] [V] [TRT] Constant_0 [Constant] outputs: [onnx::Conv_1 -> (3, 1706, 3, 3)[FLOAT]],
[09/28/2022-12:16:50] [V] [TRT] Parsing node: Constant_1 [Constant]
[09/28/2022-12:16:50] [V] [TRT] Constant_1 [Constant] inputs:
[09/28/2022-12:16:50] [V] [TRT] Constant_1 [Constant] outputs: [onnx::Conv_2 -> (3)[FLOAT]],
[09/28/2022-12:16:50] [V] [TRT] Parsing node: Conv_2 [Conv]
[09/28/2022-12:16:50] [V] [TRT] Searching for input: onnx::Conv_0
[09/28/2022-12:16:50] [V] [TRT] Searching for input: onnx::Conv_1
[09/28/2022-12:16:50] [V] [TRT] Searching for input: onnx::Conv_2
[09/28/2022-12:16:50] [V] [TRT] Conv_2 [Conv] inputs: [onnx::Conv_0 -> (1, 1706, 3, 3)[FLOAT]], [onnx::Conv_1 -> (3, 1706, 3, 3)[FLOAT]], [onnx::Conv_2 -> (3)[FLOAT]],
[09/28/2022-12:16:50] [V] [TRT] Convolution input dimensions: (1, 1706, 3, 3)
[09/28/2022-12:16:50] [V] [TRT] Registering layer: Conv_2 for ONNX node: Conv_2
[09/28/2022-12:16:50] [V] [TRT] Using kernel: (3, 3), strides: (1, 1), prepadding: (0, 0), postpadding: (0, 0), dilations: (1, 1), numOutputs: 3
[09/28/2022-12:16:50] [V] [TRT] Convolution output dimensions: (1, 3, 1, 1)
[09/28/2022-12:16:50] [V] [TRT] Registering tensor: 3_0 for ONNX tensor: 3
[09/28/2022-12:16:50] [V] [TRT] Conv_2 [Conv] outputs: [3 -> (1, 3, 1, 1)[FLOAT]],
[09/28/2022-12:16:50] [V] [TRT] Marking 3_0 as output: 3
[09/28/2022-12:16:50] [I] Finish parsing network model
[09/28/2022-12:16:50] [V] [TRT] Applying generic optimizations to the graph for inference.
[09/28/2022-12:16:50] [V] [TRT] Original: 1 layers
[09/28/2022-12:16:50] [V] [TRT] After dead-layer removal: 1 layers
[09/28/2022-12:16:50] [V] [TRT] After Myelin optimization: 1 layers
[09/28/2022-12:16:50] [V] [TRT] {ForeignNode[Conv_2]} successfully offloaded to DLA.
[09/28/2022-12:16:50] [V] [TRT] Memory consumption details:
[09/28/2022-12:16:50] [V] [TRT] Pool Sizes: Managed SRAM = 0.5 MiB, Local DRAM = 1024 MiB, Global DRAM = 512 MiB
[09/28/2022-12:16:50] [V] [TRT] Required: Managed SRAM = 0.5 MiB, Local DRAM = 2 MiB, Global DRAM = 4 MiB
[09/28/2022-12:16:50] [V] [TRT] DLA Memory Consumption Summary:
[09/28/2022-12:16:50] [V] [TRT] Number of DLA node candidates offloaded : 1 out of 1
[09/28/2022-12:16:50] [V] [TRT] Total memory required by accepted candidates : Managed SRAM = 0.5 MiB, Local DRAM = 2 MiB, Global DRAM = 4 MiB
[09/28/2022-12:16:50] [V] [TRT] After DLA optimization: 3 layers
[09/28/2022-12:16:50] [V] [TRT] Applying ScaleNodes fusions.
[09/28/2022-12:16:50] [V] [TRT] After scale fusion: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After dupe layer removal: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After final dead-layer removal: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After tensor merging: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After vertical fusions: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After dupe layer removal: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After final dead-layer removal: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After tensor merging: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After slice removal: 3 layers
[09/28/2022-12:16:50] [V] [TRT] After concat removal: 3 layers
[09/28/2022-12:16:50] [V] [TRT] Trying to split Reshape and strided tensor
[09/28/2022-12:16:50] [V] [TRT] Graph construction and optimization completed in 0.0152455 seconds.
[09/28/2022-12:16:50] [I] [TRT] ---------- Layers Running on DLA ----------
[09/28/2022-12:16:50] [I] [TRT] [DlaLayer] {ForeignNode[Conv_2]}
[09/28/2022-12:16:50] [I] [TRT] ---------- Layers Running on GPU ----------
[09/28/2022-12:49:31] [V] [TRT] Adding network input: onnx::Conv_0 with dtype: float32, dimensions: (1, 1707, 3, 3)
[09/28/2022-12:49:31] [V] [TRT] Registering tensor: onnx::Conv_0 for ONNX tensor: onnx::Conv_0
[09/28/2022-12:49:31] [V] [TRT] Parsing node: Constant_0 [Constant]
[09/28/2022-12:49:31] [V] [TRT] Constant_0 [Constant] inputs:
[09/28/2022-12:49:31] [V] [TRT] Constant_0 [Constant] outputs: [onnx::Conv_1 -> (3, 1707, 3, 3)[FLOAT]],
[09/28/2022-12:49:31] [V] [TRT] Parsing node: Constant_1 [Constant]
[09/28/2022-12:49:31] [V] [TRT] Constant_1 [Constant] inputs:
[09/28/2022-12:49:31] [V] [TRT] Constant_1 [Constant] outputs: [onnx::Conv_2 -> (3)[FLOAT]],
[09/28/2022-12:49:31] [V] [TRT] Parsing node: Conv_2 [Conv]
[09/28/2022-12:49:31] [V] [TRT] Searching for input: onnx::Conv_0
[09/28/2022-12:49:31] [V] [TRT] Searching for input: onnx::Conv_1
[09/28/2022-12:49:31] [V] [TRT] Searching for input: onnx::Conv_2
[09/28/2022-12:49:31] [V] [TRT] Conv_2 [Conv] inputs: [onnx::Conv_0 -> (1, 1707, 3, 3)[FLOAT]], [onnx::Conv_1 -> (3, 1707, 3, 3)[FLOAT]], [onnx::Conv_2 -> (3)[FLOAT]],
[09/28/2022-12:49:31] [V] [TRT] Convolution input dimensions: (1, 1707, 3, 3)
[09/28/2022-12:49:31] [V] [TRT] Registering layer: Conv_2 for ONNX node: Conv_2
[09/28/2022-12:49:31] [V] [TRT] Using kernel: (3, 3), strides: (1, 1), prepadding: (0, 0), postpadding: (0, 0), dilations: (1, 1), numOutputs: 3
[09/28/2022-12:49:31] [V] [TRT] Convolution output dimensions: (1, 3, 1, 1)
[09/28/2022-12:49:31] [V] [TRT] Registering tensor: 3_0 for ONNX tensor: 3
[09/28/2022-12:49:31] [V] [TRT] Conv_2 [Conv] outputs: [3 -> (1, 3, 1, 1)[FLOAT]],
[09/28/2022-12:49:31] [V] [TRT] Marking 3_0 as output: 3
[09/28/2022-12:49:31] [I] Finish parsing network model
[09/28/2022-12:49:31] [V] [TRT] Applying generic optimizations to the graph for inference.
[09/28/2022-12:49:31] [V] [TRT] Original: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After dead-layer removal: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After Myelin optimization: 1 layers
[09/28/2022-12:49:31] [W] [TRT] Validation failed for DLA layer: Conv_2. Switching to GPU fallback.
[09/28/2022-12:49:31] [V] [TRT] DLA Memory Consumption Summary:
[09/28/2022-12:49:31] [V] [TRT] Number of DLA node candidates offloaded : 0 out of 0
[09/28/2022-12:49:31] [V] [TRT] Total memory required by accepted candidates : Managed SRAM = 0 MiB, Local DRAM = 0 MiB, Global DRAM = 0 MiB
[09/28/2022-12:49:31] [V] [TRT] After DLA optimization: 1 layers
[09/28/2022-12:49:31] [V] [TRT] Applying ScaleNodes fusions.
[09/28/2022-12:49:31] [V] [TRT] After scale fusion: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After dupe layer removal: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After final dead-layer removal: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After tensor merging: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After vertical fusions: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After dupe layer removal: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After final dead-layer removal: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After tensor merging: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After slice removal: 1 layers
[09/28/2022-12:49:31] [V] [TRT] After concat removal: 1 layers
[09/28/2022-12:49:31] [V] [TRT] Trying to split Reshape and strided tensor
[09/28/2022-12:49:31] [V] [TRT] Graph construction and optimization completed in 0.000808394 seconds.
[09/28/2022-12:49:31] [I] [TRT] ---------- Layers Running on DLA ----------
[09/28/2022-12:49:31] [I] [TRT] ---------- Layers Running on GPU ----------
[09/28/2022-12:49:31] [I] [TRT] [GpuLayer] CONVOLUTION: Conv_2
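In both runs the parser reports the same output dimensions, which follow the standard convolution arithmetic; a quick sanity check in plain Python, using the kernel/stride/padding values printed in the logs above:

```python
def conv_out_dim(in_dim, kernel, stride=1, pre_pad=0, post_pad=0, dilation=1):
    """Standard convolution output-size formula."""
    effective_k = dilation * (kernel - 1) + 1
    return (in_dim + pre_pad + post_pad - effective_k) // stride + 1

# 3x3 input, 3x3 kernel, stride 1, no padding -> 1x1 spatial output,
# matching "Convolution output dimensions: (1, 3, 1, 1)" in both runs
print(conv_out_dim(3, 3))  # 1
```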
trtexec command:
trtexec --onnx=sample.onnx --fp16 --useDLACore=0 --allowGPUFallback --exportProfile=sample.dla.profile.json --exportLayerInfo=sample.dla.layerinfo.json --exportOutput=sample.dla.output.json --dumpLayerInfo --dumpProfile --profilingVerbosity=detailed --separateProfileRun --useSpinWait --useCudaGraph --saveEngine=sample.dla.engine --verbose
My simple question:
Why can a Conv with input shape [1x1706x3x3] run on the DLA, while one with [1x1707x3x3] cannot?
Which restriction am I hitting?
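For what it's worth, the only difference between the two models is one input channel, and that single channel moves the per-output-channel FP16 weight footprint across a 30 KiB boundary. This is just my own arithmetic and a guess, not a restriction I found documented:

```python
# Per-output-channel weight footprint in FP16 (2 bytes per element):
# in_channels * kernel_h * kernel_w * 2
passing = 1706 * 3 * 3 * 2   # 30708 bytes <= 30 * 1024 = 30720
failing = 1707 * 3 * 3 * 2   # 30726 bytes >  30720
print(passing, failing)
```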