Hi there,
I’m trying to modify the YOLOv7 model to run inference on the DLA core of Orin/Xavier NX devices.
However, the outputs of my Add operations are falling back to the GPU.
As stated in the documentation:
DLA supports element-wise ops with same-shape inputs. My two inputs both have the shape [1, 2, H, W], so why are the outputs falling back to the GPU?
The terminal outputs are provided below.
Thanks.
[01/08/2025-14:25:49] [I] [TRT] ---------- Layers Running on DLA ----------
[01/08/2025-14:25:49] [I] [TRT] [DlaLayer] {ForeignNode[/model.0/conv/Conv…/model.77/Concat_5]}
[01/08/2025-14:25:49] [I] [TRT] ---------- Layers Running on GPU ----------
[01/08/2025-14:25:49] [I] [TRT] [GpuLayer] CONSTANT: /model.77/Constant_17_output_0
[01/08/2025-14:25:49] [I] [TRT] [GpuLayer] CONSTANT: /model.77/Constant_69_output_0
[01/08/2025-14:25:49] [I] [TRT] [GpuLayer] CONSTANT: /model.77/Constant_121_output_0
Edit: I’m using TensorRT 8.5.2 and JetPack 5.1.1.
Hi,
Please find below the TensorRT 8.5 document.
DLA requires the FP16 or INT8 data format.
How do you run it with TensorRT? Could you share the complete verbose log with us?
Thanks.
log.txt (379.7 KB)
I run this command and log the result to the attached file:
trtexec --int8 --inputIOFormats=int8:dla_hwc4 --outputIOFormats=int8:chw32 --onnx=best_mod.onnx --useDLACore=0 --allowGPUFallback --verbose
More information about the fallen-back layers, as reported during the ONNX conversion:
%/model.77/Constant_18_output_0 : Float(1, 2, 80, 80, strides=[12800, 6400, 80, 1], requires_grad=0, device=cpu) = onnx::Constant[value=...](), onnx_name="/model.77/Constant_18", scope: models.yolo.Model::/models.yolo.IDetect::model.77
%/model.77/Constant_73_output_0 : Float(1, 2, 40, 40, strides=[3200, 1600, 40, 1], requires_grad=0, device=cpu) = onnx::Constant[value=...](), onnx_name="/model.77/Constant_73", scope: models.yolo.Model::/models.yolo.IDetect::model.77
%/model.77/Constant_128_output_0 : Float(1, 2, 20, 20, strides=[800, 400, 20, 1], requires_grad=0, device=cpu) = onnx::Constant[value=...](), onnx_name="/model.77/Constant_128", scope: models.yolo.Model::/models.yolo.IDetect::model.77
These layers are inside of this block:
class XYCalc(nn.Module):
    def __init__(self, w1, w2, nx, ny):
        super(XYCalc, self).__init__()
        self.m1 = nn.Conv2d(2, 2, kernel_size=1, padding=0, bias=True)
        self.m2 = nn.Conv2d(2, 2, kernel_size=1, padding=0, bias=False)
        self.m1.weight.requires_grad = False
        self.m1.bias.requires_grad = False
        self.m2.weight.requires_grad = False
        nn.init.zeros_(self.m1.weight)
        nn.init.constant_(self.m1.bias, -0.5)
        nn.init.zeros_(self.m2.weight)
        self.m1.weight[0, 0, :, :] = w1
        self.m1.weight[1, 1, :, :] = w1
        self.m2.weight[0, 0, :, :] = w2
        self.m2.weight[1, 1, :, :] = w2
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        self.offset = torch.stack((xv, yv), 2).view((1, 2, ny, nx)).float()

    def forward(self, x):
        x = self.m1(x)
        x = x + self.offset
        return self.m2(x)
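For context, the block above is just a fixed affine transform plus the grid offset: with the diagonal 1x1 conv weights, it computes w2 * (w1 * x - 0.5 + offset) per channel. A quick numeric sanity check (a standalone re-creation of the same ops, with example values w1=2.0, w2=0.5 chosen here for illustration):

```python
import torch
import torch.nn as nn

# Minimal re-creation of the XYCalc ops for a numeric sanity check.
w1, w2 = 2.0, 0.5  # example values, not taken from the model
m1 = nn.Conv2d(2, 2, kernel_size=1, bias=True)
m2 = nn.Conv2d(2, 2, kernel_size=1, bias=False)
with torch.no_grad():
    nn.init.zeros_(m1.weight)
    nn.init.constant_(m1.bias, -0.5)
    nn.init.zeros_(m2.weight)
    m1.weight[0, 0] = w1
    m1.weight[1, 1] = w1
    m2.weight[0, 0] = w2
    m2.weight[1, 1] = w2

ny = nx = 4
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
offset = torch.stack((xv, yv), 2).view((1, 2, ny, nx)).float()

x = torch.randn(1, 2, ny, nx)
with torch.no_grad():
    out = m2(m1(x) + offset)

# The whole block collapses to w2 * (w1 * x - 0.5 + offset):
assert torch.allclose(out, w2 * (w1 * x - 0.5 + offset), atol=1e-5)
```

The `offset` tensor is what the ONNX exporter bakes into the graph as the Constant nodes listed above.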
Hi,
It looks like the layer is running on the DLA:
[01/08/2025-17:03:21] [V] [TRT] {ForeignNode[/model.0/conv/Conv.../model.77/Concat_5]}: DLA Network Layer Information:
...
Layer(ELEMENTWISE): /model.77/Add_4, Precision: Int8, /model.77/Conv_16_output_0'[Int8([1,2,40,40])], (Unnamed Layer* 173) [Constant]_output'[Int8([1,2,40,40])] -> /model.77/Add_4_output_0'[Int8([1,2,40,40])]
Based on the log, it seems the fallback layer is the “CONSTANT”.
Thanks.
Yes, one of my inputs is a constant. When I inspect the fallback layers in Netron, they point to the constant input of the Add operations. So does the DLA not support addition with a constant input?
I have just realized that the DLA does not support Constant layers. The Add operations run on the DLA only if neither of the two inputs is produced by a Constant layer.
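One possible workaround, sketched below under the assumption that an extra input is acceptable (the class name XYCalcNoConst is hypothetical): supply the grid offset as a second network input instead of a baked tensor, so the exported graph contains no Constant node and the element-wise Add sees two non-constant tensors. Whether this actually keeps the subgraph on the DLA would need to be confirmed with a trtexec verbose log.

```python
import torch
import torch.nn as nn

class XYCalcNoConst(nn.Module):
    """Hypothetical variant of XYCalc: the grid offset arrives as a
    runtime input, so the ONNX export has no Constant layer."""
    def __init__(self, w1, w2):
        super().__init__()
        self.m1 = nn.Conv2d(2, 2, kernel_size=1, bias=True)
        self.m2 = nn.Conv2d(2, 2, kernel_size=1, bias=False)
        with torch.no_grad():
            nn.init.zeros_(self.m1.weight)
            nn.init.constant_(self.m1.bias, -0.5)
            nn.init.zeros_(self.m2.weight)
            self.m1.weight[0, 0] = w1
            self.m1.weight[1, 1] = w1
            self.m2.weight[0, 0] = w2
            self.m2.weight[1, 1] = w2

    def forward(self, x, offset):
        # Both Add inputs are runtime tensors, not Constants.
        return self.m2(self.m1(x) + offset)

# Usage sketch: precompute the grid on the host and pass it in.
ny = nx = 4
w1, w2 = 2.0, 0.5  # example values
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
offset = torch.stack((xv, yv), 2).view((1, 2, ny, nx)).float()
model = XYCalcNoConst(w1, w2).eval()
x = torch.randn(1, 2, ny, nx)
with torch.no_grad():
    out = model(x, offset)
assert torch.allclose(out, w2 * (w1 * x - 0.5 + offset), atol=1e-5)
```

The model would then be exported with two inputs, e.g. torch.onnx.export(model, (x, offset), ...), and the precomputed grid fed as a second binding at inference time.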
Btw, thanks for your reply!