Wrong result from DLA

wsmlby · January 9, 2021, 1:36am

I have been using an AGX Xavier for a while. We recently started deploying ShuffleNet on it, and I noticed that the DLA gives completely wrong results.

The model is basically a custom-trained ShuffleNet v2, from

We converted it to ONNX and loaded it into TensorRT (C++, nvinfer) to run. It runs correctly on the GPU, but on the DLA it gives completely wrong results (images that score 0.98 on the GPU score 0.001 on the DLA).

I am confident the rest of the system is working correctly: the DLA returns the same (but wrong) result when I run the same input multiple times, and different results for different inputs. So I am fairly sure the input and output are set up correctly.

Can we fully trust the TensorRT engine builder to schedule on the DLA only the layers that the DLA can actually run correctly?

AastaLLL · January 11, 2021, 3:21am

Hi,

Which TensorRT version are you using?
If it is not v7.1.3, we recommend upgrading to it first.

Could you share the ONNX model, input, and expected output with us so we can investigate in more depth?
In general, the TensorRT builder falls back non-supported layers to the GPU.
But since layer configurations can vary widely across models, we will need the model to find the real cause.

Thanks.

wsmlby · January 11, 2021, 5:27am

Yes, it is v7.1.3.

Here is the model. Simply running it on the GPU and on the DLA gives vastly different results.

shufflenet_fs.onnx (8.7 MB)

wsmlby · January 11, 2021, 7:06am

@AastaLLL, thanks for looking into this:

Here is a better example:

Take the shufflenet_fs.onnx from my previous comment and this input: output1.dat (588 KB)

I can run the model with trtexec:
GPU:
/usr/src/tensorrt/bin/trtexec --onnx=/tmp/shufflenet_fs.onnx --loadInputs=data:/tmp/output1.dat --dumpOutput

result:

-0.531341 -0.392079 -2.12658 -2.96252 -2.2261 0.697449 -1.04333 -2.84307 -3.87462 -2.13854 -1.52387 -3.24468 -4.39194 -5.04513 ...

DLA:

/usr/src/tensorrt/bin/trtexec --onnx=/tmp/shufflenet_fs.onnx --loadInputs=data:/tmp/output1.dat --dumpOutput --useDLACore=0 --allowGPUFallback

result:

56.7812 13.0156 -28.7969 20 36.9375 -0.486328 42.0312 3.17773 -43.75 38.4688 -68.5 40.875 -2.60352 ...

I have run it multiple times; the GPU and the DLA each give stable results, but the two differ.

Update: I just double-checked with trtexec.
The GPU result is stable. The DLA result is not fully stable, but it generally stays close from run to run.
The DLA result is still very different from the GPU result; even the trends differ. I can also confirm that the GPU result is the expected one.

AastaLLL · January 13, 2021, 8:08am

Hi,

Thanks for your model and input data.
We can confirm that the same issue occurs in our environment.

Our internal team is now working on this.
We will share more information with you as we make progress.

Thanks.

AastaLLL · April 26, 2021, 3:53am

Hi,

Sorry to keep you waiting.

We have confirmed that this issue is fixed in our internal DLA package.
It will be available in our next JetPack release.

Thanks.

VivekKrishnan · May 20, 2021, 7:24pm

Hey there, we are encountering this issue with a model we are deploying the same way (PyTorch → ONNX → TensorRT). Specifically, we see different results when running on the GPU in fp16 mode (which produces the correct output) versus on the DLA backend in fp16 mode (which produces incorrect results).

Can you comment on what causes the issue above? Is there a certain set of layers that might trigger it? Is there a problem with the ONNX-to-TensorRT model-loading path? Is there a way to confirm that our model will work with the next TensorRT release?

When using trtexec on the model, here is an example of the raw outputs I am seeing:
GPU without fp16
0.12959 0.455278 5.12768 0.741761 0.478294 5.16925 0.752687 1.74247 5.00664 0.695554 …

GPU with fp16
0.130981 0.45874 5.11719 0.740234 0.482422 5.16016 0.751953 1.73828 5.00391 0.696289 …

DLA with fp16
0.788574 -0.459229 2.82617 1.15723 -0.508301 3.22852 1.20312 0.27417 3.23047 1.12695 …

kayccc · May 26, 2021, 7:35am

Hi VivekKrishnan,

Please open a new topic for your issue. Thanks.

