Hi!
I have a Conv2d layer that is rejected by DLA. Its specification is:
Conv2d(64, 64, kernel_size=(29, 1), stride=(1, 1), padding=(28, 0), dilation=(2, 1), groups=64, bias=False)
I checked it against the restrictions listed here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
Note that the layer appears to satisfy the documented limits: kernel size (29) < 32, padding (28) < kernel size (29) < 31, groups (64) < 8192, and no INT8 optimisation is used.
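One suspicion I could not confirm from the docs: with dilation 2, the effective kernel footprint is larger than the nominal kernel size. Quick arithmetic (my own check, not from the documentation; whether DLA measures the dilated extent is exactly what I am unsure about):

kernel, dilation = 29, 2
effective_extent = dilation * (kernel - 1) + 1  # standard dilated-kernel footprint
print(effective_extent)  # 57 -- above the documented 32 kernel-size limit,
                         # if DLA counts the dilated extent (unclear to me)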
Attached is a 3-layer net in ONNX format.
minimal.zip (22.9 KB)
Conv_1 is the offending layer.
The trtexec command:
/usr/src/tensorrt/bin/trtexec --onnx=minimal.onnx --saveEngine=hey.engine --useDLACore=0 --allowGPUFallback --separateProfileRun
The relevant output from the trtexec command:
[01/24/2024-09:00:25] [W] [TRT] DLA compiler failed with error code: 2 while compiling layer: Conv_1.
Hint: Try reducing the input and/or weight sizes.
[01/24/2024-09:00:25] [W] [TRT] Validation failed for DLA layer: Conv_1. Switching to GPU fallback.
[01/24/2024-09:00:25] [W] [TRT] Splitting DLA subgraph at: Conv_1 because DLA validation failed for this layer.
[01/24/2024-09:00:25] [W] [TRT] DLA compiler failed with error code: 2 while compiling layer: Conv_1.
Hint: Try reducing the input and/or weight sizes.
[01/24/2024-09:00:25] [W] [TRT] Validation failed for DLA layer: Conv_1. Switching to GPU fallback.
[01/24/2024-09:00:25] [I] [TRT] ---------- Layers Running on DLA ----------
[01/24/2024-09:00:25] [I] [TRT] [DlaLayer] {ForeignNode[Conv_0]}
[01/24/2024-09:00:25] [I] [TRT] [DlaLayer] {ForeignNode[Conv_2]}
[01/24/2024-09:00:25] [I] [TRT] ---------- Layers Running on GPU ----------
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] SHUFFLE: shuffle_between_input_and_Conv_1
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] CONVOLUTION: Conv_1
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] SHUFFLE: shuffle_after_input.4
[01/24/2024-09:00:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +511, now: CPU 1108, GPU 8960 (MiB)
[01/24/2024-09:00:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +73, now: CPU 1191, GPU 9033 (MiB)
The planned workaround is to reduce the kernel size; this layer is part of a bigger network, and what is shown here is a minimal example.
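A minimal sketch of that workaround (the (15, 1) kernel is my own illustrative choice, not validated against the full network's receptive-field requirements; padding is re-derived as dilation * (kernel - 1) / 2 to keep the output length at 150 for stride 1):

from torch import nn

# Hypothetical replacement for Conv_1: smaller kernel, "same"-style padding
smaller = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(15, 1),
                    padding=(14, 0), dilation=(2, 1), groups=64, bias=False)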
Still, I want to know which restriction I actually violated; error code 2 and the generic hint are not helping much.
The system:
Jetson AGX Orin 32GB SoC: tegra23x
Ubuntu 20.04 focal
Release: 5.10.120-tegra
CUDA Arch BIN: 8.7
CUDA: 11.4.315
cuDNN: 8.6.0.166
Python 3.8.10
TensorRT: 8.5.2
L4T: 35.4.1
Jetpack: 5.1.2
Python code to generate the ONNX:
import torch
import torch.nn as nn

onnx_file_name = "minimal.onnx"

net = nn.Sequential()
input_tensor = torch.rand(1, 64, 150, 1, dtype=torch.float32).to("cuda")
net.append(nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(1, 1), padding=(0, 0), bias=False))
# Conv_1: the depthwise dilated convolution that DLA rejects
net.append(nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(29, 1), padding=(28, 0), dilation=(2, 1), groups=64, bias=False))
net.append(nn.Conv2d(in_channels=64, out_channels=1, kernel_size=(1, 1), bias=False))
net.to("cuda")
out = net(input_tensor)
assert out is not None
torch.onnx.export(net, input_tensor, onnx_file_name, verbose=True)
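For completeness, I sanity-check the export before running trtexec (requires the onnx Python package; this only validates the graph structure and says nothing about DLA eligibility):

import onnx

model = onnx.load("minimal.onnx")
onnx.checker.check_model(model)  # raises if the exported graph is malformed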
Cheers,
Cristi