DLA results not the same as pure GPU results

Platform : Jetson Xavier NX
JetPack Version: 4.4.1
TensorRT Version: 7.1.3
ONNX Version: 1.10.2

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.8

I have compared the outputs of my ONNX model and a pure-GPU TensorRT engine, and they are almost identical. But when I compare the pure-GPU engine against the engine built with DLA, there is a large gap between the outputs. I also tried the workarounds mentioned in previous forum threads: I already padded the input shape of each layer to a power of two (2^n), but the results are still bad. JetPack 4.6 works fine, but for some reason we have to stay on 4.4 for now. Is there any way to resolve this?
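For context, the 2^n workaround I mean is padding each tensor's channel dimension up to the next power of two. A minimal sketch of such a padding helper (the function is illustrative, not my exact code), assuming NCHW layout and numpy:

import numpy as np

def pad_channels_pow2(x):
    # Hypothetical helper: pad the channel axis of an NCHW tensor up to
    # the next power of two, per the 2^n shape workaround mentioned above.
    c = x.shape[1]
    target = 1 << (c - 1).bit_length()  # smallest power of two >= c
    return np.pad(x, ((0, 0), (0, target - c), (0, 0), (0, 0)))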

# onnx conversion
python -m tf2onnx.convert --opset 11 --input test_frozen_mobilenet.pb --inputs input_1:0 --outputs Identity:0 --output test_frozen_mobilenet.onnx
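Before building the engine, the exported model can be sanity-checked with onnx and onnxruntime. A minimal sketch (the NHWC 1x64x64x3 input shape is my assumption, based on the TF MobileNet export and the --size 64 flag used below):

import numpy as np
import onnx
import onnxruntime as ort

# Validate the exported graph structure.
onnx.checker.check_model(onnx.load("test_64.onnx"))

# Run a dummy inference to confirm the model executes on CPU.
sess = ort.InferenceSession("test_64.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 64, 64, 3).astype(np.float32)  # assumed NHWC input
print(sess.run(None, {input_name: dummy})[0])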

# trt conversion
./trtexec --onnx=test_64.onnx --fp16 --useDLACore=1 --saveEngine=test_64.trt --verbose --allowGPUFallback
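For reference, the same engine can also be built from the TensorRT 7.x Python API instead of trtexec. This is a sketch with an illustrative workspace size, mirroring the --fp16, --useDLACore=1, and --allowGPUFallback flags:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def build_dla_engine(onnx_path, engine_path, dla_core=1):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28            # illustrative size
    config.set_flag(trt.BuilderFlag.FP16)          # DLA needs fp16 or int8
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # --allowGPUFallback
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = dla_core                     # --useDLACore=1

    engine = builder.build_engine(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine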

and I am using Python to run the inference and compare the outputs:
compare_output.py (4.7 KB)
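The attachment is not reproduced here; the TensorRT side of the comparison boils down to roughly this sketch (standard TRT 7.x Python API with pycuda, assuming a single input and a single output binding):

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def trt_infer(engine_path, inp):
    # Deserialize the engine produced by trtexec above.
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    out_shape = tuple(engine.get_binding_shape(1))  # single output assumed
    out = np.empty(trt.volume(out_shape), dtype=np.float32)
    d_in = cuda.mem_alloc(inp.nbytes)
    d_out = cuda.mem_alloc(out.nbytes)
    with engine.create_execution_context() as context:
        cuda.memcpy_htod(d_in, np.ascontiguousarray(inp))
        context.execute_v2([int(d_in), int(d_out)])
        cuda.memcpy_dtoh(out, d_out)
    return out.reshape(out_shape)

The onnxruntime output is then compared elementwise, e.g. np.max(np.abs(trt_out - onnx_out)).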

python3 compare_output.py --onnx test_64.onnx --trt test_64.trt   --size 64
Sample output:
======  batch 0  ======
dla : [[[[0.54248047]]]
       [[[0.54589844]]]]
onnx : [[0.48507023]
        [0.49452645]]
======  batch 1  ======
dla : [[[[0.54589844]]]
       [[[0.5371094 ]]]]
onnx : [[0.49452645]
        [0.48755872]]

test_64.onnx (455.2 KB)

test_64.trt (1.3 MB)


Does DLA on JetPack 4.6 generate identical results compared to ONNX?
If yes, this might be a known bug that was fixed in the later release.


Yes, the results on JetPack 4.6 are fine, but for now we need it to work on JetPack 4.4 as well. Is there any way to do so?


Since DLA is not open source, we don't have a fix that can run on JetPack 4.4.
Sorry for the inconvenience.
