Hi!
I have a Conv2d layer that is rejected by DLA. Its specification is:
Conv2d(64, 64, kernel_size=(29, 1), stride=(1, 1), padding=(28, 0), dilation=(2, 1), groups=64, bias=False)
I checked it against the restrictions listed here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
Note that the layer appears to satisfy the documented limits: kernel size (29) < 32, padding (28) < kernel size (29) < 31, groups (64) < 8192, and no INT8 optimisation is used.
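One suspicion I could not confirm from the docs: with dilation 2, the effective kernel footprint is larger than the nominal kernel size. Quick arithmetic (my own check, not from the documentation; whether DLA measures the dilated extent is exactly what I am unsure about):

kernel, dilation = 29, 2
effective_extent = dilation * (kernel - 1) + 1  # standard dilated-kernel footprint
print(effective_extent)  # 57 -- above the documented 32 kernel-size limit,
                         # if DLA counts the dilated extent (unclear to me)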
Attached is a 3-layer net in ONNX format.
minimal.zip (22.9 KB)
Conv_1 is the offending layer.
The trtexec command:
/usr/src/tensorrt/bin/trtexec --onnx=minimal.onnx --saveEngine=hey.engine --useDLACore=0 --allowGPUFallback --separateProfileRun
The relevant output from the trtexec command:
[01/24/2024-09:00:25] [W] [TRT] DLA compiler failed with error code: 2 while compiling layer: Conv_1.
Hint: Try reducing the input and/or weight sizes.
[01/24/2024-09:00:25] [W] [TRT] Validation failed for DLA layer: Conv_1. Switching to GPU fallback.
[01/24/2024-09:00:25] [W] [TRT] Splitting DLA subgraph at: Conv_1 because DLA validation failed for this layer.
[01/24/2024-09:00:25] [W] [TRT] DLA compiler failed with error code: 2 while compiling layer: Conv_1.
Hint: Try reducing the input and/or weight sizes.
[01/24/2024-09:00:25] [W] [TRT] Validation failed for DLA layer: Conv_1. Switching to GPU fallback.
[01/24/2024-09:00:25] [I] [TRT] ---------- Layers Running on DLA ----------
[01/24/2024-09:00:25] [I] [TRT] [DlaLayer] {ForeignNode[Conv_0]}
[01/24/2024-09:00:25] [I] [TRT] [DlaLayer] {ForeignNode[Conv_2]}
[01/24/2024-09:00:25] [I] [TRT] ---------- Layers Running on GPU ----------
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] SHUFFLE: shuffle_between_input_and_Conv_1
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] CONVOLUTION: Conv_1
[01/24/2024-09:00:25] [I] [TRT] [GpuLayer] SHUFFLE: shuffle_after_input.4
[01/24/2024-09:00:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +511, now: CPU 1108, GPU 8960 (MiB)
[01/24/2024-09:00:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +73, now: CPU 1191, GPU 9033 (MiB)
The planned workaround is to reduce the kernel size; this layer is part of a bigger network, and what is shown here is a minimal example.
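A minimal sketch of that workaround (the (15, 1) kernel is my own illustrative choice, not validated against the full network's receptive-field requirements; padding is re-derived as dilation * (kernel - 1) / 2 to keep the output length at 150 for stride 1):

from torch import nn

# Hypothetical replacement for Conv_1: smaller kernel, "same"-style padding
smaller = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(15, 1),
                    padding=(14, 0), dilation=(2, 1), groups=64, bias=False)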
Still, I want to know which restriction I actually violated; error code 2 and the generic hint are not helping much.
The system:
Jetson AGX Orin 32GB SoC: tegra23x
Ubuntu 20.04 focal
Release: 5.10.120-tegra
CUDA Arch BIN: 8.7
CUDA: 11.4.315
cuDNN: 8.6.0.166
Python 3.8.10
TensorRT: 8.5.2
L4T: 35.4.1
Jetpack: 5.1.2
Python code to generate the ONNX:
import torch
import torch.nn as nn

onnx_file_name = "minimal.onnx"

net = nn.Sequential()
input_tensor = torch.rand(1, 64, 150, 1, dtype=torch.float32).to("cuda")
net.append(nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(1, 1), padding=(0, 0), bias=False))
# Conv_1: the depthwise dilated convolution that DLA rejects
net.append(nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(29, 1), padding=(28, 0), dilation=(2, 1), groups=64, bias=False))
net.append(nn.Conv2d(in_channels=64, out_channels=1, kernel_size=(1, 1), bias=False))
net.to("cuda")
out = net(input_tensor)
assert out is not None
torch.onnx.export(net, input_tensor, onnx_file_name, verbose=True)
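For completeness, I sanity-check the export before running trtexec (requires the onnx Python package; this only validates the graph structure and says nothing about DLA eligibility):

import onnx

model = onnx.load("minimal.onnx")
onnx.checker.check_model(model)  # raises if the exported graph is malformed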
Cheers,
Cristi