
**Software Version**

DRIVE OS Linux 5.1.6

**Target Operating System**

Custom Debian Linux

**Hardware Platform**

NVIDIA DRIVE™ AGX Xavier DevKit (E3550)

**SDK Manager Version**

1.8.0.10363

## Description:

Running the sample works without utilizing the DLAs. Passing `--useDLACore=0` does not work and ends with the following error:

```
NVMEDIA_DLA : 528, ERROR: load from memory failed.
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
```
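For context, the sample enables the DLA path through the `samplesCommon::enableDLA()` helper shipped in the samples' `common.h`. A minimal sketch of the equivalent TensorRT 5.x builder calls (assuming `builder` is the sample's `nvinfer1::IBuilder*` and `dlaCore` holds the value parsed from `--useDLACore`) looks like this:

```cpp
#include "NvInfer.h"

// Minimal sketch of how the TensorRT 5.x samples route a network to a DLA
// core (mirrors samplesCommon::enableDLA() from the samples' common.h).
void enableDlaSketch(nvinfer1::IBuilder* builder, int dlaCore)
{
    if (dlaCore >= 0)
    {
        // Let layers that DLA cannot run fall back to the GPU
        // (the fallback warnings in the log below come from this).
        builder->allowGPUFallback(true);

        // DLA only supports INT8 or FP16; with --int8 the sample keeps
        // INT8 mode, otherwise it would force FP16.
        if (!builder->getInt8Mode())
            builder->setFp16Mode(true);

        builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
        builder->setDLACore(dlaCore); // 0 in the run below (--useDLACore=0)
        builder->setStrictTypeConstraints(true);
    }
}
```

Judging by the log, this configuration is accepted (layer placement and tactic timing complete), and the failure only surfaces at the end of the build when the compiled loadable is handed to NvMedia DLA.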

## Environment

**TensorRT Version**: 5.1.5

**NVIDIA GPU**: NVIDIA Volta™-class integrated GPU

**CUDA Version**: 10.1

**CUDNN Version**: 7.5.1

## Relevant Files

The samples come from NVIDIA's official TensorRT 5.1 tarball (https://developer.nvidia.com/nvidia-tensorrt-5x-download), together with the relevant data files.

## Steps To Reproduce

```
./sample_onnx_mnist --datadir=/root/tensorrt_sample/mnist_data --int8 --useDLACore=0
```

Full output:

```
&&&& RUNNING TensorRT.sample_onnx_mnist # ./sample_onnx_mnist --datadir=/root/tensorrt_sample/mnist_data --int8 --useDLACore=0

[I] Building and running a GPU inference engine for Onnx MNIST

Input filename: /root/tensorrt_sample/mnist_data/mnist.onnx

ONNX IR version: 0.0.3

Opset version: 1

Producer name: CNTK

Producer version: 2.4

Domain:

Model version: 1

Doc string:

[I] [TRT] Parameter193:Constant → (16, 4, 4, 10)

[I] [TRT] Parameter193_reshape1:Reshape → (256, 10)

[I] [TRT] Parameter6:Constant → (8)

[I] [TRT] Parameter5:Constant → (8, 1, 5, 5)

[I] [TRT] Convolution28_Output_0:Conv → (8, 28, 28)

[I] [TRT] Plus30_Output_0:Add → (8, 28, 28)

[I] [TRT] ReLU32_Output_0:Relu → (8, 28, 28)

[I] [TRT] Pooling66_Output_0:MaxPool → (8, 14, 14)

[I] [TRT] Parameter87:Constant → (16, 8, 5, 5)

[I] [TRT] Convolution110_Output_0:Conv → (16, 14, 14)

[I] [TRT] Parameter88:Constant → (16)

[I] [TRT] Plus112_Output_0:Add → (16, 14, 14)

[I] [TRT] ReLU114_Output_0:Relu → (16, 14, 14)

[I] [TRT] Pooling160_Output_0:MaxPool → (16, 4, 4)

[I] [TRT] Pooling160_Output_0_reshape0:Reshape → (256)

[I] [TRT] Times212_Output_0:MatMul → (10)

[I] [TRT] Parameter194:Constant → (1, 10)

[I] [TRT] Plus214_Output_0:Add → (10)

----- Parsing of ONNX model /root/tensorrt_sample/mnist_data/mnist.onnx is Done ----

[I] [TRT] Setting dynamic range for Input3 to [-127,127]

[I] [TRT] Setting dynamic range for Convolution28_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for (Unnamed Layer* 1) [Constant]_output to [-127,127]

[I] [TRT] Setting dynamic range for Plus30_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for ReLU32_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for Pooling66_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for Convolution110_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for (Unnamed Layer* 6) [Constant]_output to [-127,127]

[I] [TRT] Setting dynamic range for Plus112_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for ReLU114_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for Pooling160_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for Pooling160_Output_0_reshape0 to [-127,127]

[I] [TRT] Setting dynamic range for (Unnamed Layer* 11) [Constant]_output to [-127,127]

[I] [TRT] Setting dynamic range for Times212_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for (Unnamed Layer* 13) [Constant]_output to [-127,127]

[I] [TRT] Setting dynamic range for Plus214_Output_0 to [-127,127]

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 1) [Constant] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 6) [Constant] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 10) [Shuffle] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 11) [Constant] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 12) [Matrix Multiply] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 13) [Constant] is not running on DLA, falling back to GPU.

[I] [TRT]

[I] [TRT] --------------- Layers running on DLA:

[I] [TRT] (Unnamed Layer* 0) [Convolution], (Unnamed Layer* 2) [ElementWise], (Unnamed Layer* 3) [Activation], (Unnamed Layer* 4) [Pooling], (Unnamed Layer* 5) [Convolution], (Unnamed Layer* 7) [ElementWise], (Unnamed Layer* 8) [Activation], (Unnamed Layer* 9) [Pooling], (Unnamed Layer* 14) [ElementWise],

[I] [TRT] --------------- Layers running on GPU:

[I] [TRT] (Unnamed Layer* 1) [Constant], (Unnamed Layer* 6) [Constant], (Unnamed Layer* 10) [Shuffle], (Unnamed Layer* 11) [Constant], (Unnamed Layer* 12) [Matrix Multiply], (Unnamed Layer* 13) [Constant],

[W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.

[I] [TRT] [INT8 Quantization] User overriding Scales: Input3 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Convolution28_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 1) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Plus30_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: ReLU32_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling66_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Convolution110_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 6) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Plus112_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: ReLU114_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling160_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling160_Output_0_reshape0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 11) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Times212_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 13) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Plus214_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Input3 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Convolution28_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 1) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus30_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: ReLU32_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling66_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Convolution110_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 6) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus112_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: ReLU114_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling160_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling160_Output_0_reshape0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 11) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Times212_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 13) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus214_Output_0 [1]

[I] [TRT] Original: 15 layers

[I] [TRT] After dead-layer removal: 15 layers

[I] [TRT] After DLA optimization: 13 layers

[I] [TRT] After scale fusion: 13 layers

[I] [TRT] After vertical fusions: 13 layers

[I] [TRT] After swap: 13 layers

[I] [TRT] After final dead-layer removal: 13 layers

[I] [TRT] After tensor merging: 13 layers

[I] [TRT] After concat removal: 13 layers

[I] [TRT] Configuring builder for Int8 Mode completed in 0.0084503 seconds.

[I] [TRT] Graph construction and optimization completed in 0.00888536 seconds.

[W] [TRT] Warning: no implementation of (Unnamed Layer* 1) [Constant] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006912

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006976

[W] [TRT] Warning: no implementation of (Unnamed Layer* 6) [Constant] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00544

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00736

[W] [TRT] Warning: no implementation of (Unnamed Layer* 11) [Constant] obeys the requested constraints, using a higher precision type

[W] [TRT] Warning: no implementation of (Unnamed Layer* 13) [Constant] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing Input3 to nvm(9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing {(Unnamed Layer* 0) [Convolution]}(31)

[I] [TRT] Tactic 548859524883 is the only option, timing skipped

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008832

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00736

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007168

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.011136

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00688

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007232

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008832

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007136

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007232

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007136

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005248

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00512

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008832

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007136

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 2) [ElementWise]

[I] [TRT] Tactic 1 time 0.009536

[I] [TRT] Tactic 2 time 0.01232

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 2) [ElementWise]

[I] [TRT] Tactic 1 time 0.008896

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 2) [ElementWise]

[I] [TRT] Tactic 1 time 0.009152

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007008

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.0104

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007936

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006976

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008768

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.0112

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007008

[I] [TRT]

[I] [TRT] --------------- Timing {(Unnamed Layer* 3) [Activation],(Unnamed Layer* 4) [Pooling],(Unnamed Layer* 5) [Convolution]}(31)

[I] [TRT] Tactic 548859524883 is the only option, timing skipped

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009344

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.0072

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00864

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00688

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005216

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008608

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007136

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007488

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005344

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005408

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009088

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005248

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 7) [ElementWise]

[I] [TRT] Tactic 1 time 0.009216

[I] [TRT] Tactic 2 time 0.009184

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 7) [ElementWise]

[I] [TRT] Tactic 1 time 0.008608

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 7) [ElementWise]

[I] [TRT] Tactic 1 time 0.00864

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009344

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007264

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007232

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009216

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005344

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008896

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006976

[I] [TRT]

[I] [TRT] --------------- Timing {(Unnamed Layer* 8) [Activation],(Unnamed Layer* 9) [Pooling]}(31)

[I] [TRT] Tactic 548859524883 is the only option, timing skipped

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009152

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.0088

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 10) [Shuffle]

[I] [TRT] Tactic 0 is the only option, timing skipped

[W] [TRT] Warning: no implementation of (Unnamed Layer* 10) [Shuffle] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 12) [Matrix Multiply]

[I] [TRT] Tactic 0 is the only option, timing skipped

[W] [TRT] Warning: no implementation of (Unnamed Layer* 12) [Matrix Multiply] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 14) [ElementWise]

[I] [TRT] Tactic 1 is the only option, timing skipped

[W] [TRT] Warning: no implementation of (Unnamed Layer* 14) [ElementWise] obeys the requested constraints, using a higher precision type

[I] [TRT] Adding reformat layer: (Unnamed Layer* 1) [Constant] output to be reformatted 0 ((Unnamed Layer* 1) [Constant]_output) from Int8(1,1,1:32,1) to Float(1,1,1,8)

[I] [TRT] Adding reformat layer: (Unnamed Layer* 6) [Constant] output to be reformatted 0 ((Unnamed Layer* 6) [Constant]_output) from Int8(1,1,1:32,1) to Float(1,1,1,16)

[I] [TRT] Adding reformat layer: (Unnamed Layer* 10) [Shuffle] reformatted input 0 (Pooling160_Output_0) from Int8(1,4,16:32,16) to Float(1,4,16,256)

[I] [TRT] Formats and tactics selection completed in 3.93059 seconds.

[I] [TRT] After reformat layers: 22 layers

[I] [TRT] Block size 16777216

[I] [TRT] Block size 25088

[I] [TRT] Block size 1024

[I] [TRT] Block size 512

[I] [TRT] Total Activation Memory: 16803840

[I] [TRT] Detected 1 input and 1 output network tensors.

NVMEDIA_DLA : 528, ERROR: load from memory failed.

[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)

[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
```