Hi there,
I’m trying to modify the YOLOv7 model to run inference on the DLA core of Orin/Xavier NX devices.
However, the outputs of my Add operations are falling back to the GPU.
As stated in the documentation:
DLA supports element-wise ops with same-shape inputs. My two inputs both have the shape [1, 2, H, W], so why are the outputs falling back to the GPU?
The terminal outputs are provided below.
Thanks.
[01/08/2025-14:25:49] [I] [TRT] ---------- Layers Running on DLA ----------
[01/08/2025-14:25:49] [I] [TRT] [DlaLayer] {ForeignNode[/model.0/conv/Conv…/model.77/Concat_5]}
[01/08/2025-14:25:49] [I] [TRT] ---------- Layers Running on GPU ----------
[01/08/2025-14:25:49] [I] [TRT] [GpuLayer] CONSTANT: /model.77/Constant_17_output_0
[01/08/2025-14:25:49] [I] [TRT] [GpuLayer] CONSTANT: /model.77/Constant_69_output_0
[01/08/2025-14:25:49] [I] [TRT] [GpuLayer] CONSTANT: /model.77/Constant_121_output_0
Edit: I’m using TensorRT 8.5.2 and JetPack 5.1.1.
Hi,
Please find below the TensorRT 8.5 document.
DLA requires the FP16 or INT8 data format.
How do you run it with TensorRT? Could you share the complete verbose log with us?
Thanks.
log.txt (379.7 KB)
I run this command and log the result to the attached file:
trtexec --int8 --inputIOFormats=int8:dla_hwc4 --outputIOFormats=int8:chw32 --onnx=best_mod.onnx --useDLACore=0 --allowGPUFallback --verbose
More information about the fallen-back layers, as reported during the ONNX conversion:
%/model.77/Constant_18_output_0 : Float(1, 2, 80, 80, strides=[12800, 6400, 80, 1], requires_grad=0, device=cpu) = onnx::Constant[value=...](), onnx_name="/model.77/Constant_18", scope: models.yolo.Model::/models.yolo.IDetect::model.77
%/model.77/Constant_73_output_0 : Float(1, 2, 40, 40, strides=[3200, 1600, 40, 1], requires_grad=0, device=cpu) = onnx::Constant[value=...](), onnx_name="/model.77/Constant_73", scope: models.yolo.Model::/models.yolo.IDetect::model.77
%/model.77/Constant_128_output_0 : Float(1, 2, 20, 20, strides=[800, 400, 20, 1], requires_grad=0, device=cpu) = onnx::Constant[value=...](), onnx_name="/model.77/Constant_128", scope: models.yolo.Model::/models.yolo.IDetect::model.77
These layers are inside of this block:
class XYCalc(nn.Module):
    def __init__(self, w1, w2, nx, ny):
        super(XYCalc, self).__init__()
        self.m1 = nn.Conv2d(2, 2, kernel_size=1, padding=0, bias=True)
        self.m2 = nn.Conv2d(2, 2, kernel_size=1, padding=0, bias=False)
        self.m1.weight.requires_grad = False
        self.m1.bias.requires_grad = False
        self.m2.weight.requires_grad = False
        nn.init.zeros_(self.m1.weight)
        nn.init.constant_(self.m1.bias, -0.5)
        nn.init.zeros_(self.m2.weight)
        self.m1.weight[0, 0, :, :] = w1
        self.m1.weight[1, 1, :, :] = w1
        self.m2.weight[0, 0, :, :] = w2
        self.m2.weight[1, 1, :, :] = w2
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        self.offset = torch.stack((xv, yv), 2).view((1, 2, ny, nx)).float()

    def forward(self, x):
        x = self.m1(x)
        x = x + self.offset
        return self.m2(x)
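For context, the block above is just a fixed affine transform plus the grid offset: with the diagonal 1x1 conv weights, it computes w2 * (w1 * x - 0.5 + offset) per channel. A quick numeric sanity check (a standalone re-creation of the same ops, with example values w1=2.0, w2=0.5 chosen here for illustration):

```python
import torch
import torch.nn as nn

# Minimal re-creation of the XYCalc ops for a numeric sanity check.
w1, w2 = 2.0, 0.5  # example values, not taken from the model
m1 = nn.Conv2d(2, 2, kernel_size=1, bias=True)
m2 = nn.Conv2d(2, 2, kernel_size=1, bias=False)
with torch.no_grad():
    nn.init.zeros_(m1.weight)
    nn.init.constant_(m1.bias, -0.5)
    nn.init.zeros_(m2.weight)
    m1.weight[0, 0] = w1
    m1.weight[1, 1] = w1
    m2.weight[0, 0] = w2
    m2.weight[1, 1] = w2

ny = nx = 4
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
offset = torch.stack((xv, yv), 2).view((1, 2, ny, nx)).float()

x = torch.randn(1, 2, ny, nx)
with torch.no_grad():
    out = m2(m1(x) + offset)

# The whole block collapses to w2 * (w1 * x - 0.5 + offset):
assert torch.allclose(out, w2 * (w1 * x - 0.5 + offset), atol=1e-5)
```

The `offset` tensor is what the ONNX exporter bakes into the graph as the Constant nodes listed above.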
Hi,
It looks like the layer is running on the DLA:
[01/08/2025-17:03:21] [V] [TRT] {ForeignNode[/model.0/conv/Conv.../model.77/Concat_5]}: DLA Network Layer Information:
...
Layer(ELEMENTWISE): /model.77/Add_4, Precision: Int8, /model.77/Conv_16_output_0'[Int8([1,2,40,40])], (Unnamed Layer* 173) [Constant]_output'[Int8([1,2,40,40])] -> /model.77/Add_4_output_0'[Int8([1,2,40,40])]
Based on the log, it seems the fallback layer is the “CONSTANT”.
Thanks.
Yes, one of my inputs is a constant. When I inspect the fallback layers in Netron, they point to the constant input of the Add operations. So does the DLA not support addition with a constant input?
I have just realized that the DLA does not support Constant layers. The Add operations run on the DLA only if neither of the two inputs is produced by a Constant layer.
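One possible workaround, sketched below under the assumption that an extra input is acceptable (the class name XYCalcNoConst is hypothetical): supply the grid offset as a second network input instead of a baked tensor, so the exported graph contains no Constant node and the element-wise Add sees two non-constant tensors. Whether this actually keeps the subgraph on the DLA would need to be confirmed with a trtexec verbose log.

```python
import torch
import torch.nn as nn

class XYCalcNoConst(nn.Module):
    """Hypothetical variant of XYCalc: the grid offset arrives as a
    runtime input, so the ONNX export has no Constant layer."""
    def __init__(self, w1, w2):
        super().__init__()
        self.m1 = nn.Conv2d(2, 2, kernel_size=1, bias=True)
        self.m2 = nn.Conv2d(2, 2, kernel_size=1, bias=False)
        with torch.no_grad():
            nn.init.zeros_(self.m1.weight)
            nn.init.constant_(self.m1.bias, -0.5)
            nn.init.zeros_(self.m2.weight)
            self.m1.weight[0, 0] = w1
            self.m1.weight[1, 1] = w1
            self.m2.weight[0, 0] = w2
            self.m2.weight[1, 1] = w2

    def forward(self, x, offset):
        # Both Add inputs are runtime tensors, not Constants.
        return self.m2(self.m1(x) + offset)

# Usage sketch: precompute the grid on the host and pass it in.
ny = nx = 4
w1, w2 = 2.0, 0.5  # example values
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
offset = torch.stack((xv, yv), 2).view((1, 2, ny, nx)).float()
model = XYCalcNoConst(w1, w2).eval()
x = torch.randn(1, 2, ny, nx)
with torch.no_grad():
    out = model(x, offset)
assert torch.allclose(out, w2 * (w1 * x - 0.5 + offset), atol=1e-5)
```

The model would then be exported with two inputs, e.g. torch.onnx.export(model, (x, offset), ...), and the precomputed grid fed as a second binding at inference time.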
Btw, thanks for your reply!