I wonder: is there a way to implement custom layers on DLA for operations it does not support?
There are several documents showing how to write custom TensorRT layers that run on CUDA/Tensor Cores, e.g. Extending TensorRT with Custom Layers, and there are plugin examples in TensorRT Plugins.
Hi,
Sorry for the late update.
Which layer do you need?
Since DLA is a hardware engine, please check the below document to see if your layer can be supported or not first:
# Supported ONNX Operators & Functions on Orin DLA
DLA operator functionality is exposed through the TensorRT builder, which internally links to DLA SW libraries (see [DLA Workflow](https://developer.nvidia.com/deep-learning-accelerator)). While some ONNX operators or functions may already be available in DLA SW, TensorRT may not expose them yet.
See below for the support matrix of ONNX operators & functions on Orin DLA. If you are interested in a specific DLA operator that is not supported through TensorRT yet, feel free to raise a [GitHub Issue](https://github.com/NVIDIA/Deep-Learning-Accelerator-SW/issues) and/or inform your NVIDIA representative (in particular for NVIDIA DRIVE customers).
See [General Restrictions](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla-lay-supp-rest) that apply to all operations below. Many of those ops are supported on Xavier DLA as well, see [Layer Support and Restrictions](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla-lay-supp-rest).
TensorRT 8.6 supports operators up to Opset 17. The latest information on ONNX operators can be found [here](https://github.com/onnx/onnx/blob/master/docs/Operators.md).
Note that the scripts in `op_reconstruction/` are intended as a recipe for how ops currently not supported on DLA can be decomposed into supported ops. Depending on your setup, you may choose to perform such op reconstructions in the ONNX domain post-training (as done here) or during the training process (for example in TensorFlow or PyTorch). The case of "Native" in the DLA SW support column and "Reconstruction" in the TensorRT support column indicates that an op can be supported through TensorRT by decomposing it into other DLA ops already supported by TensorRT.
The operator support matrix below requires the following minimum system configuration (each OS ships by default with the DLA SW and TensorRT versions listed to its right):
| **Hardware platform** | **OS** | **DLA SW version** | **TensorRT version** |
| ----------------- | ---------------- | -------------- | ---------------- |
| DRIVE Orin (Automotive) | DRIVE OS 6.0.6.0 | DLA 3.12.0 | TensorRT 8.5.10 |
| Jetson Orin (Embedded) | JetPack 5.1.1 | DLA 3.12.1 | TensorRT 8.5.2 |
| DRIVE Orin (Automotive) | DRIVE OS 6.0.7.0 | DLA 3.13.0 | TensorRT 8.6.10 |
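As a rough illustration of the "Reconstruction" path described in the file above, here is a minimal sketch of rewriting an op in the ONNX domain. It assumes the onnx-graphsurgeon package is installed; the Neg-to-Mul rewrite and the file names are stand-ins for illustration, not taken from the repository's `op_reconstruction/` scripts.

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs  # assumption: onnx-graphsurgeon is installed

graph = gs.import_onnx(onnx.load("model.onnx"))  # illustrative path

# Stand-in example: decompose Neg(x) into Mul(x, -1), i.e. replace an
# "unsupported" op with an op that the DLA/TensorRT path already handles.
for i, node in enumerate(graph.nodes):
    if node.op == "Neg":
        node.op = "Mul"
        node.attrs.clear()
        node.inputs.append(
            gs.Constant(f"neg_one_{i}", values=np.array([-1.0], dtype=np.float32)))

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_reconstructed.onnx")
```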
Thanks.
Hi,
Sorry for the late reply.
I want to deploy a UNet-like neural network on DLA, e.g. the model from the official UNet repo:
class UNet(nn.Module):
    # class wrapper restored for readability; signature follows the official UNet repo
    def __init__(self, n_channels, n_classes, bilinear=False):
        super().__init__()
        self.inc = (DoubleConv(n_channels, 64))
        self.down1 = (Down(64, 128))
        self.down2 = (Down(128, 256))
        self.down3 = (Down(256, 512))
        factor = 2 if bilinear else 1
        self.down4 = (Down(512, 1024 // factor))
        self.up1 = (Up(1024, 512 // factor, bilinear))
        self.up2 = (Up(512, 256 // factor, bilinear))
        self.up3 = (Up(256, 128 // factor, bilinear))
        self.up4 = (Up(128, 64, bilinear))
        self.outc = (OutConv(64, n_classes))

    def forward(self, x):
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)
        x = self.up1(x5, x4)
        x = self.up2(x, x3)
        x = self.up3(x, x2)
        x = self.up4(x, x1)
        logits = self.outc(x)
        return logits
The network offers two upscaling options: one uses a resize layer (nn.Upsample) for bilinear upscaling, and the other uses a deconvolution layer (nn.ConvTranspose2d).
class Up(nn.Module):
    """Upscaling then double conv"""

    def __init__(self, in_channels, out_channels, bilinear=True):
        super().__init__()

        # if bilinear, use the normal convolutions to reduce the number of channels
        if bilinear:
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
            self.conv = DoubleConv(in_channels, out_channels, in_channels // 2)
        else:
            self.up = nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2)
            self.conv = DoubleConv(in_channels, out_channels)

    def forward(self, x1, x2):
        x1 = self.up(x1)
        # input is CHW
        diffY = x2.size()[2] - x1.size()[2]
        diffX = x2.size()[3] - x1.size()[3]
        x1 = F.pad(x1, [diffX // 2, diffX - diffX // 2,
                        diffY // 2, diffY - diffY // 2])
        x = torch.cat([x2, x1], dim=1)
        return self.conv(x)
I have tested this several times, and the performance difference between these two options is significant. DLA does not support the resize layer used in UNet, but it does support deconvolution.
So I wonder: is there a way to implement custom, unsupported layers on DLA? Or is DLA fixed-function hardware that accelerates only specific deep-learning layers, so unsupported layers cannot be added?
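For reference, here is a minimal sketch of how either upscaling variant can be exported to ONNX for a DLA build; the input shape, file name, opset, and the trtexec flags in the comment are illustrative assumptions.

```python
import torch

# assumes the UNet class quoted above is importable
model = UNet(n_channels=3, n_classes=2, bilinear=True).eval()  # or bilinear=False
dummy = torch.randn(1, 3, 512, 512)  # illustrative input resolution

torch.onnx.export(model, dummy, "unet_bilinear.onnx",
                  opset_version=13,
                  input_names=["input"], output_names=["output"])

# The resulting ONNX file can then be built for DLA, e.g. with
#   trtexec --onnx=unet_bilinear.onnx --useDLACore=0 --allowGPUFallback --fp16 --verbose
# where the verbose log shows which layers fall back to the GPU.
```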
Thank you.
Hi,
You can find the details below. For a resize layer, DLA only supports integer scaling.
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla-lay-supp-rest
The last two elements in scales, representing the scale values along height and width dimensions, respectively, must be integer values in the range of [1, 32] in nearest-neighbor mode and [1, 4] in bilinear mode.
Does your model meet the requirements?
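One quick way to check is to read the scale factors of each Resize node from the exported ONNX model. A hedged sketch (the file name is illustrative, and it assumes the scales are stored as an initializer rather than computed by an upstream node):

```python
import onnx
from onnx import numpy_helper

model = onnx.load("unet_bilinear.onnx")  # illustrative path
inits = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type != "Resize":
        continue
    # Resize inputs are (X, roi, scales, sizes); scales is the third input.
    scales_name = node.input[2] if len(node.input) > 2 else ""
    if scales_name in inits:
        scales = numpy_helper.to_array(inits[scales_name])
        h, w = float(scales[-2]), float(scales[-1])
        # DLA needs integer H/W scales: [1, 32] for nearest, [1, 4] for bilinear.
        print(node.name, "H scale:", h, "W scale:", w,
              "integer:", h.is_integer() and w.is_integer())
```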
Thanks.
The official UNet model does not meet these requirements, so I will use deconvolution instead of bilinear interpolation.
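For completeness, the switch is just the constructor flag from the code quoted above (channel counts are illustrative):

```python
# bilinear=False makes Up use nn.ConvTranspose2d instead of nn.Upsample,
# avoiding the DLA resize restriction discussed above.
model = UNet(n_channels=3, n_classes=2, bilinear=False).eval()
```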
Thanks.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.