Orin AGX TensorRT DLA export fails but conv specs are below DLA layer restrictions

Hi!

I have a Conv2d layer that is rejected by DLA. Its specification is:
Conv2d(64, 64, kernel_size=(29, 1), stride=(1, 1), padding=(28, 0), dilation=(2, 1), groups=64, bias=False)

I checked the layer against the restrictions listed here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

Note that kernel size (29) < 32, padding (28) < kernel size (29) < 31, groups (64) < 8192, and no INT8 optimisation is used.
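For reference, the arithmetic behind these checks can be sketched in plain Python. The limits are the ones quoted in this post, not an authoritative copy of the DLA documentation, and the helper names are hypothetical. The sketch also computes the effective kernel extent under dilation, dilation * (kernel - 1) + 1, which comes to 57 along the height axis. Whether the DLA compiler looks at that quantity is only a guess on my side.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Extent covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def output_size(size: int, kernel: int, padding: int, dilation: int, stride: int = 1) -> int:
    """Standard convolution output size along one axis."""
    return (size + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1


# Height axis of the rejected layer: kernel 29, padding 28, dilation 2, input height 150.
kernel, padding, dilation, groups, height = 29, 28, 2, 64, 150

# Limits as quoted in this post (assumed, not copied from the DLA docs).
checks = {
    "kernel < 32": kernel < 32,
    "padding < kernel < 31": padding < kernel < 31,
    "groups < 8192": groups < 8192,
}

print(checks)
print("effective kernel extent:", effective_kernel(kernel, dilation))
print("output height:", output_size(height, kernel, padding, dilation))
```

All three checks pass for this layer, so the only figure that stands out is the effective extent of the dilated kernel.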

Attached is a 3-layer net in ONNX format.
minimal.zip (22.9 KB)

Conv_1 is the offending layer.

The trtexec command:
/usr/src/tensorrt/bin/trtexec --onnx=minimal.onnx --saveEngine=hey.engine --useDLACore=0 --allowGPUFallback --separateProfileRun

The relevant output from trtexec command:

[01/24/2024-09:00:25] [W] [TRT] DLA compiler failed with error code: 2 while compiling layer: Conv_1. 
Hint: Try reducing the input and/or weight sizes. 
[01/24/2024-09:00:25] [W] [TRT] Validation failed for DLA layer: Conv_1. Switching to GPU fallback.
[01/24/2024-09:00:25] [W] [TRT] Splitting DLA subgraph at: Conv_1 because DLA validation failed for this layer.
[01/24/2024-09:00:25] [W] [TRT] DLA compiler failed with error code: 2 while compiling layer: Conv_1. 
Hint: Try reducing the input and/or weight sizes. 
[01/24/2024-09:00:25] [W] [TRT] Validation failed for DLA layer: Conv_1. Switching to GPU fallback.
[01/24/2024-09:00:25] [I] [TRT] ---------- Layers Running on DLA ----------
[01/24/2024-09:00:25] [I] [TRT] [DlaLayer] {ForeignNode[Conv_0]}
[01/24/2024-09:00:25] [I] [TRT] [DlaLayer] {ForeignNode[Conv_2]}
[01/24/2024-09:00:25] [I] [TRT] ---------- Layers Running on GPU ----------
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] SHUFFLE: shuffle_between_input_and_Conv_1
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] CONVOLUTION: Conv_1
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] SHUFFLE: shuffle_after_input.4
[01/24/2024-09:00:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +511, now: CPU 1108, GPU 8960 (MiB)
[01/24/2024-09:00:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +73, now: CPU 1191, GPU 9033 (MiB)
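The placement summary in the log above can be pulled out with a few lines of Python, which helps when the full network produces a much longer log. This is a minimal sketch that assumes the `[DlaLayer]` and `[GpuLayer]` tags appear exactly as in the excerpt; the function name `layer_placement` is hypothetical.

```python
import re

# Excerpt of the trtexec output shown above.
LOG = """\
[01/24/2024-09:00:25] [I] [TRT] ---------- Layers Running on DLA ----------
[01/24/2024-09:00:25] [I] [TRT] [DlaLayer] {ForeignNode[Conv_0]}
[01/24/2024-09:00:25] [I] [TRT] [DlaLayer] {ForeignNode[Conv_2]}
[01/24/2024-09:00:25] [I] [TRT] ---------- Layers Running on GPU ----------
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] SHUFFLE: shuffle_between_input_and_Conv_1
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] CONVOLUTION: Conv_1
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] SHUFFLE: shuffle_after_input.4
"""

# Matches the device tag and captures the layer description that follows it.
TAG = re.compile(r"\[(Dla|Gpu)Layer\]\s*(.+)$")


def layer_placement(log: str) -> dict:
    """Group layer descriptions by the device trtexec assigned them to."""
    placement = {"DLA": [], "GPU": []}
    for line in log.splitlines():
        match = TAG.search(line)
        if match:
            device = "DLA" if match.group(1) == "Dla" else "GPU"
            placement[device].append(match.group(2).strip())
    return placement


placement = layer_placement(LOG)
for device, layers in placement.items():
    print(device, layers)
```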

My planned workaround is to reduce the kernel size. This layer is part of a bigger network; what is shown here is a minimal example.
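To pick replacement kernel sizes for that workaround, the candidates can be listed together with the padding that keeps the output height at 150. This is a minimal sketch with hypothetical helper names, assuming stride 1, odd kernels and the same dilation of 2. It does not say which candidate DLA accepts; each one would still have to be exported and run through trtexec.

```python
def same_padding(kernel: int, dilation: int) -> int:
    """Symmetric padding that preserves the input size (stride 1, odd kernel)."""
    return dilation * (kernel - 1) // 2


def candidates(max_kernel: int, dilation: int, step: int = 2):
    """Yield (kernel, padding, effective_extent) from max_kernel downwards."""
    for kernel in range(max_kernel, 0, -step):
        extent = dilation * (kernel - 1) + 1
        yield kernel, same_padding(kernel, dilation), extent


# Start from the rejected configuration: kernel 29, dilation 2 along the height axis.
for kernel, padding, extent in candidates(29, 2):
    print(f"kernel_size=({kernel}, 1) padding=({padding}, 0) effective extent={extent}")
```

The first row reproduces the current layer (kernel 29, padding 28); every later row trades receptive field for a smaller kernel.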

I would like to know which restriction I violated. Error code 2 and the generic hint are not much help.

The system:

Jetson AGX Orin 32GB SoC: tegra23x
Ubuntu 20.04 focal
Release: 5.10.120-tegra
CUDA Arch BIN: 8.7
CUDA: 11.4.315
cuDNN: 8.6.0.166
Python 3.8.10
TensorRT: 5.1.2
L4T: 35.4.1
Jetpack: 5.1.2

Python code to generate the onnx:

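    import torch
    import torch.nn as nn

    onnx_file_name = "minimal.onnx"  # file name used in the trtexec command above

    net = nn.Sequential()
    input_tensor = torch.rand(1, 64, 150, 1, dtype=torch.float32).to("cuda")
    # Conv_0: pointwise convolution
    net.append(nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(1, 1), padding=(0, 0), bias=False))
    # Conv_1: dilated depthwise convolution, the layer rejected by DLA
    net.append(nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(29, 1), padding=(28, 0), dilation=(2, 1), groups=64, bias=False))
    # Conv_2: pointwise convolution down to a single channel
    net.append(nn.Conv2d(in_channels=64, out_channels=1, kernel_size=(1, 1), bias=False))
    net.to("cuda")
    out = net(input_tensor)
    assert out is not None
    torch.onnx.export(net, input_tensor, onnx_file_name, verbose=True)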

Cheers,
Cristi

Hi,

Would you mind testing it with JetPack 6.0 DP?
The DLA library there has been upgraded to 3.14, which adds support for several more features.

Thanks.


Unfortunately, I did not have time to test it. I will re-open or re-create the issue if it comes up on the task list again.