Wrong result from DLA

I have been using the Xavier AGX for a while. We recently started deploying ShuffleNet on it, and I noticed that the DLA gives completely wrong results.

The model is basically a custom-trained ShuffleNet v2, from

We converted it to ONNX and loaded it into TensorRT (C++, nvinfer) to run. It runs correctly on the GPU, but on the DLA it gives completely wrong results (images that score 0.98 on the GPU score 0.001 on the DLA).

I am sure the rest of the system is working correctly: I get the same (but wrong) results from the DLA when I run the same input multiple times, and I get different results for different inputs. Therefore, I am pretty sure the input and output are set up correctly.

Can we fully trust the engine-building part of TensorRT to schedule on the DLA only what the DLA can run successfully?


Which TensorRT version are you using?
If you are not on v7.1.3, we recommend upgrading to it first.

Could you share the ONNX model, input, and expected output with us so we can investigate in more depth?
In general, the TensorRT builder falls back to the GPU for layers that the DLA does not support.
But since layer configurations can differ a lot across models, we will need the model to find the real cause.
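
For readers who want to see which layers the builder considers DLA-capable before building an engine, here is a minimal sketch using the TensorRT Python API (the original poster uses C++; `IBuilderConfig::canRunOnDLA` is the corresponding C++ call). The helper name `list_dla_placement` and its arguments are hypothetical, and the sketch assumes the `tensorrt` Python package that ships with JetPack is installed.

```python
def list_dla_placement(onnx_path, dla_core=0):
    """Return [(layer_name, can_run_on_dla)] for every layer of an ONNX model.

    Hypothetical helper, not part of TensorRT. The `tensorrt` package is
    imported lazily so this file still loads on machines without it.
    """
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # The ONNX parser requires an explicit-batch network.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [parser.get_error(i).desc() for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)          # DLA supports FP16/INT8 only
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # let unsupported layers use the GPU
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = dla_core

    placement = []
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        placement.append((layer.name, config.can_run_on_DLA(layer)))
    return placement
```

Calling `list_dla_placement("/tmp/shufflenet_fs.onnx")` on the device would list each layer with a boolean. Note that this only reports what the builder believes the DLA supports; it cannot by itself expose a layer that is accepted on the DLA but computes wrong values.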


Yes, it is v7.1.3.

Here is the model. Simply running it on the GPU and on the DLA gives vastly different results.

shufflenet_fs.onnx (8.7 MB)

@AastaLLL, thanks for looking into this.

Here is a better example.

It uses the shufflenet_fs.onnx from my last comment with this input: output1.dat (588 KB)

I can run the model on the GPU with trtexec:
/usr/src/tensorrt/bin/trtexec --onnx=/tmp/shufflenet_fs.onnx --loadInputs=data:/tmp/output1.dat --dumpOutput


-0.531341 -0.392079 -2.12658 -2.96252 -2.2261 0.697449 -1.04333 -2.84307 -3.87462 -2.13854 -1.52387 -3.24468 -4.39194 -5.04513 ...


/usr/src/tensorrt/bin/trtexec --onnx=/tmp/shufflenet_fs.onnx --loadInputs=data:/tmp/output1.dat --dumpOutput --useDLACore=0 --allowGPUFallback

56.7812 13.0156 -28.7969 20 36.9375 -0.486328 42.0312 3.17773 -43.75 38.4688 -68.5 40.875 -2.60352 ...

I have run it multiple times; the GPU and the DLA each give stable results, but the two differ from each other.

Update: I just double-checked.
With trtexec, the GPU result is stable. The DLA result is not fully stable, but it generally does not vary much from run to run.
The DLA result is still very different from the GPU result; even the trends are different. I can also verify that the GPU results are the expected results.
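
To put a number on "even the trends are different", the two dumps can be compared directly. Below is a minimal sketch in plain Python that parses the values posted above (only the visible prefix, since the dumps are truncated) and reports the largest absolute difference and the cosine similarity. The helper names `parse_dump` and `compare` are made up for illustration.

```python
import math

def parse_dump(text):
    """Parse whitespace-separated floats, skipping a trailing elision marker."""
    return [float(tok) for tok in text.split() if tok not in ("...", "…")]

def compare(a, b):
    """Return (max_abs_diff, cosine_similarity) over the common prefix."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    max_abs = max(abs(x - y) for x, y in zip(a, b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return max_abs, dot / norm

# Values copied from the trtexec --dumpOutput lines above.
gpu = parse_dump(
    "-0.531341 -0.392079 -2.12658 -2.96252 -2.2261 0.697449 -1.04333 "
    "-2.84307 -3.87462 -2.13854 -1.52387 -3.24468 -4.39194 -5.04513 ..."
)
dla = parse_dump(
    "56.7812 13.0156 -28.7969 20 36.9375 -0.486328 42.0312 3.17773 "
    "-43.75 38.4688 -68.5 40.875 -2.60352 ..."
)

max_abs, cos = compare(gpu, dla)
print(f"max |gpu - dla| = {max_abs:.4f}, cosine similarity = {cos:.4f}")
```

A cosine similarity near 1 would suggest the outputs differ mainly by scale; a value near 0 means the relative ordering of the outputs has changed as well.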


Thanks for your model and input data.
We confirm that the same issue also occurs in our environment.

Our internal team is now working on this.
We will share more information as soon as there is progress.



Sorry to keep you waiting.

We have confirmed that this issue is fixed in our internal DLA package.
It will be available in our next JetPack release.


Hey there, we are encountering this issue with a model we are deploying in the same way (PyTorch → ONNX → TensorRT). Specifically, we see different results between running on the GPU in fp16 mode (which produces the correct output) and running on the DLA backend in fp16 mode (which produces incorrect results).

Are you able to comment on what causes the issue above? Is there a certain set of layers that might trigger it? Is there a problem with the ONNX-to-TensorRT path for loading models? Is there a way to confirm that our model will work with the next TensorRT release?

When using trtexec on the model, here is an example of the raw outputs I am seeing:
GPU no fp16
0.12959 0.455278 5.12768 0.741761 0.478294 5.16925 0.752687 1.74247 5.00664 0.695554 …

GPU with fp16
0.130981 0.45874 5.11719 0.740234 0.482422 5.16016 0.751953 1.73828 5.00391 0.696289 …

DLA with fp16
0.788574 -0.459229 2.82617 1.15723 -0.508301 3.22852 1.20312 0.27417 3.23047 1.12695 …
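
One way to separate ordinary fp16 rounding from a real mismatch is to measure the relative error against the fp32 GPU output. The sketch below does this with the ten values posted above. The 5% threshold is an assumption chosen for illustration, not an NVIDIA-recommended value, and `max_rel_error` is a made-up helper name.

```python
def max_rel_error(reference, candidate, eps=1e-6):
    """Largest element-wise relative error of candidate w.r.t. reference."""
    return max(abs(r - c) / max(abs(r), eps) for r, c in zip(reference, candidate))

# Values copied from the three trtexec dumps above (visible prefix only).
gpu_fp32 = [0.12959, 0.455278, 5.12768, 0.741761, 0.478294,
            5.16925, 0.752687, 1.74247, 5.00664, 0.695554]
gpu_fp16 = [0.130981, 0.45874, 5.11719, 0.740234, 0.482422,
            5.16016, 0.751953, 1.73828, 5.00391, 0.696289]
dla_fp16 = [0.788574, -0.459229, 2.82617, 1.15723, -0.508301,
            3.22852, 1.20312, 0.27417, 3.23047, 1.12695]

TOLERANCE = 0.05  # assumed threshold: 5% relative error

for name, out in (("GPU fp16", gpu_fp16), ("DLA fp16", dla_fp16)):
    err = max_rel_error(gpu_fp32, out)
    verdict = "within tolerance" if err <= TOLERANCE else "MISMATCH"
    print(f"{name}: max relative error = {err:.4f} ({verdict})")
```

With these numbers the GPU fp16 run stays within the assumed tolerance while the DLA run does not, which matches the observation that the DLA output is wrong and not merely less precise.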

Hi VivekKrishnan,

Please open a new topic for your issue. Thanks.