FP16 builder does not work and DLA does not accept anything: how can I accelerate deep learning inference?


I am trying to run inference on an ONNX model in TensorRT and, if possible, accelerate it by using DLA and FP16.

I am using two flags to enable FP16 building and the use of DLA cores, where config is of type SampleUniquePtr<nvinfer1::IBuilderConfig>.
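The flag-setting code itself did not survive the forum formatting; for reference, a minimal sketch of what such a setup looks like with the TensorRT 7 C++ builder-config API (the function name is mine, and kGPU_FALLBACK is included because DLA-rejected layers otherwise fail the build):

```cpp
#include <NvInfer.h>  // TensorRT 7 headers (requires the TensorRT SDK)

// Sketch: given a builder config, request FP16 kernels and route
// layers to DLA by default. BuilderFlag::kGPU_FALLBACK lets layers
// that DLA rejects fall back to the GPU instead of aborting the build.
void enableFp16AndDla(nvinfer1::IBuilderConfig* config, int dlaCoreId)
{
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(dlaCoreId);  // hardware ID: 0 or 1 on Xavier
}
```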


IRuntime* infer = nvinfer1::createInferRuntime(sample::gLogger);
infer->setDLACore(a number);
std::shared_ptr<nvinfer1::ICudaEngine> Engine = std::shared_ptr<nvinfer1::ICudaEngine>(
    infer->deserializeCudaEngine(trtModelStream.data(), size, nullptr), samplesCommon::InferDeleter());
return Engine;
where “a number” was tried as 1, 8, and 64.


Question #1: I have seen that setFlag for precision is just a preference and not a rule (Mixed precision + kSTRICT_TYPES, which type is chosen? - #5 by spolisetty). Has this issue been resolved in newer releases?

Question #2: Due to TensorRT not honoring the flags I use, I get these warnings:
onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 3) [Shuffle] device type to GPU.
Since TensorRT ignores my flags, I cannot use DLA on specific layers. What is your comment and recommendation on this? (Note: I do not construct the network by hand, and I cannot for the life of me write code to do that just to include layer.setPrecision(xxx) and layer.setOutputType(xxx) so that the precision flags would work, so please do not recommend that.)

Question #3: What does this warning mean? I’d be happy if you could enlighten me:
DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer Add_259

Question #3.5: What should “a number” in setDLACore be? I understand that the AGX has 64 Tensor Cores, but how many DLA cores does it have? Should the value be 64?

Thank you very much in advance,


TensorRT Version: 7.1.3 (version 8 caused me great trouble over the past month)
GPU Type: Nvidia AGX
Nvidia Driver Version: ?
CUDA Version: 10.2
Operating System + Version: 18.04

Please check the below links, as they might answer your concerns.

Hello @NVES

They answer Question #3.5 and mention the precision values for DLA cores (Question #2), but they offer no solution to my problem; the rest is unanswered.

Regarding Question #3, I found a thread on the AGX forum where @AastaLLL mentioned that after JetPack 4.2.1, DLA cores would support 32 subgraphs: DLA supports only 3 subgraphs per DLA core. Why is the limit still 8 subgraphs per DLA core in 4.5.1?

I need help with Questions #1 and #2; they constitute the core of my problem.



We are moving this post to AGX forum to get better help.

Thank you.


Please note that setDLACore() sets the ID of the DLA hardware to use.
On Xavier, there are two DLA cores, so the ID should be either 0 or 1.
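Rather than guessing, the number of available DLA cores can be queried at run time; a sketch against the TensorRT 7 runtime API (the function name is mine):

```cpp
#include <NvInfer.h>  // requires the TensorRT SDK
#include <cassert>

// Sketch: select a valid DLA core ID by querying the runtime.
// On Xavier, getNbDLACores() returns 2, so valid IDs are 0 and 1.
int pickDlaCore(nvinfer1::IRuntime* runtime)
{
    const int nbCores = runtime->getNbDLACores();
    assert(nbCores > 0 && "No DLA hardware available");
    runtime->setDLACore(0);  // any ID in [0, nbCores - 1] is valid
    return runtime->getDLACore();
}
```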

Q1. This is not a bug.
When you set half mode, TensorRT will run inference on the model in half precision.
However, some intermediate layers may still run in float for better performance, since certain layer operations are not efficient in half precision.

Q2. Do you want to use DLA for a particular layer or for the whole model?
The log is misleading: the root cause is that DLA does not support the Shuffle layer, so that layer falls back to the GPU.

Q3. This indicates that the model’s complexity exceeds what DLA can support, so the remaining tasks need to be deployed on the GPU.
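To see ahead of the build which layers DLA will accept and which will fall back, the builder config can be queried layer by layer; a sketch using the TensorRT 7 C++ API (the function name is mine):

```cpp
#include <NvInfer.h>  // requires the TensorRT SDK
#include <cstdio>

// Sketch: assign every DLA-capable layer to DLA and report the rest.
void assignLayersToDla(nvinfer1::INetworkDefinition* network,
                       nvinfer1::IBuilderConfig* config)
{
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        if (config->canRunOnDLA(layer))
        {
            config->setDeviceType(layer, nvinfer1::DeviceType::kDLA);
        }
        else
        {
            std::printf("Layer %s stays on GPU\n", layer->getName());
        }
    }
}
```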

Q3.5. This is the DLA hardware ID, not the number of DLA cores to use.



I can summarize my problem as follows: I cannot convert any layers to FP16, because the kSTRICT_TYPES and FP16 flags have no effect unless precision is set explicitly for specific layers (Mixed precision + kSTRICT_TYPES, which type is chosen? - #7 by spolisetty).
Since I cannot convert layers to FP16, I cannot use DLA on any layers: not just Shuffle, but Conv layers too. I try to use DLA on every possible layer, not on specific layers. All of the attempted assignments to DLA fall back to GPU execution (though not all because of the failed FP16 conversion; some are due to errors such as “DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer Add_259”).

mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(builder->buildEngineWithConfig(*network, *config), InferDeleter());

Is there documentation on which types of layers run well in half mode?


Have you tried your model with trtexec before?
If not, it’s recommended to do this to get the detailed support status:

$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --verbose --fp16 --useDLACore=0 --allowGPUFallback
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --verbose --fp16 --useDLACore=0 --allowGPUFallback
[01/20/2022-01:43:38] [I] [TRT] ---------- Layers Running on DLA ----------
[01/20/2022-01:43:38] [I] [TRT] [DlaLayer] {ForeignNode[Convolution28]}
[01/20/2022-01:43:38] [I] [TRT] [DlaLayer] {ForeignNode[ReLU32...Convolution110]}
[01/20/2022-01:43:38] [I] [TRT] [DlaLayer] {ForeignNode[ReLU114...Pooling160]}
[01/20/2022-01:43:38] [I] [TRT] [DlaLayer] {ForeignNode[Plus214]}
[01/20/2022-01:43:38] [I] [TRT] ---------- Layers Running on GPU ----------
[01/20/2022-01:43:38] [I] [TRT] [GpuLayer] Parameter193 + Times212_reshape1
[01/20/2022-01:43:38] [I] [TRT] [GpuLayer] Parameter194
[01/20/2022-01:43:38] [I] [TRT] [GpuLayer] Parameter6 + (Unnamed Layer* 4) [Shuffle] + Plus30
[01/20/2022-01:43:38] [I] [TRT] [GpuLayer] Parameter88 + (Unnamed Layer* 10) [Shuffle] + Plus112
[01/20/2022-01:43:38] [I] [TRT] [GpuLayer] Times212_reshape0
[01/20/2022-01:43:38] [I] [TRT] [GpuLayer] Times212
[01/20/2022-01:43:38] [I] [TRT] [GpuLayer] shuffle_Times212_Output_0
[01/20/2022-01:43:38] [I] [TRT] [GpuLayer] shuffle_(Unnamed Layer* 16) [Constant]_output


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.