
**Software Version**

DRIVE OS Linux 5.1.6

**Target Operating System**

Custom Debian Linux

**Hardware Platform**

NVIDIA DRIVE™ AGX Xavier DevKit (E3550)

**SDK Manager Version**

1.8.0.10363

## Description:

Running the sample works without utilizing the DLAs. Passing `--useDLACore=0` does not work and ends with the following error:

```
NVMEDIA_DLA : 528, ERROR: load from memory failed.
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
```
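For context, the sample enables the DLA path through the `samplesCommon::enableDLA()` helper shipped in the samples' `common.h`. A minimal sketch of the equivalent TensorRT 5.x builder calls (assuming `builder` is the sample's `nvinfer1::IBuilder*` and `dlaCore` holds the value parsed from `--useDLACore`) looks like this:

```cpp
#include "NvInfer.h"

// Minimal sketch of how the TensorRT 5.x samples route a network to a DLA
// core (mirrors samplesCommon::enableDLA() from the samples' common.h).
void enableDlaSketch(nvinfer1::IBuilder* builder, int dlaCore)
{
    if (dlaCore >= 0)
    {
        // Let layers that DLA cannot run fall back to the GPU
        // (the fallback warnings in the log below come from this).
        builder->allowGPUFallback(true);

        // DLA only supports INT8 or FP16; with --int8 the sample keeps
        // INT8 mode, otherwise it would force FP16.
        if (!builder->getInt8Mode())
            builder->setFp16Mode(true);

        builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
        builder->setDLACore(dlaCore); // 0 in the run below (--useDLACore=0)
        builder->setStrictTypeConstraints(true);
    }
}
```

Judging by the log, this configuration is accepted (layer placement and tactic timing complete), and the failure only surfaces at the end of the build when the compiled loadable is handed to NvMedia DLA.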

## Environment

**TensorRT Version**: 5.1.5

**NVIDIA GPU**: NVIDIA Volta™-class integrated GPU

**CUDA Version**: 10.1

**CUDNN Version**: 7.5.1

## Relevant Files

The samples come from NVIDIA's official TensorRT 5.1 tarball (https://developer.nvidia.com/nvidia-tensorrt-5x-download), together with the relevant data files.

## Steps To Reproduce

```
./sample_onnx_mnist --datadir=/root/tensorrt_sample/mnist_data --int8 --useDLACore=0
```

Full output:

```
&&&& RUNNING TensorRT.sample_onnx_mnist # ./sample_onnx_mnist --datadir=/root/tensorrt_sample/mnist_data --int8 --useDLACore=0

[I] Building and running a GPU inference engine for Onnx MNIST

Input filename: /root/tensorrt_sample/mnist_data/mnist.onnx

ONNX IR version: 0.0.3

Opset version: 1

Producer name: CNTK

Producer version: 2.4

Domain:

Model version: 1

Doc string:

[I] [TRT] Parameter193:Constant → (16, 4, 4, 10)

[I] [TRT] Parameter193_reshape1:Reshape → (256, 10)

[I] [TRT] Parameter6:Constant → (8)

[I] [TRT] Parameter5:Constant → (8, 1, 5, 5)

[I] [TRT] Convolution28_Output_0:Conv → (8, 28, 28)

[I] [TRT] Plus30_Output_0:Add → (8, 28, 28)

[I] [TRT] ReLU32_Output_0:Relu → (8, 28, 28)

[I] [TRT] Pooling66_Output_0:MaxPool → (8, 14, 14)

[I] [TRT] Parameter87:Constant → (16, 8, 5, 5)

[I] [TRT] Convolution110_Output_0:Conv → (16, 14, 14)

[I] [TRT] Parameter88:Constant → (16)

[I] [TRT] Plus112_Output_0:Add → (16, 14, 14)

[I] [TRT] ReLU114_Output_0:Relu → (16, 14, 14)

[I] [TRT] Pooling160_Output_0:MaxPool → (16, 4, 4)

[I] [TRT] Pooling160_Output_0_reshape0:Reshape → (256)

[I] [TRT] Times212_Output_0:MatMul → (10)

[I] [TRT] Parameter194:Constant → (1, 10)

[I] [TRT] Plus214_Output_0:Add → (10)

----- Parsing of ONNX model /root/tensorrt_sample/mnist_data/mnist.onnx is Done ----

[I] [TRT] Setting dynamic range for Input3 to [-127,127]

[I] [TRT] Setting dynamic range for Convolution28_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for (Unnamed Layer* 1) [Constant]_output to [-127,127]

[I] [TRT] Setting dynamic range for Plus30_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for ReLU32_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for Pooling66_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for Convolution110_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for (Unnamed Layer* 6) [Constant]_output to [-127,127]

[I] [TRT] Setting dynamic range for Plus112_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for ReLU114_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for Pooling160_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for Pooling160_Output_0_reshape0 to [-127,127]

[I] [TRT] Setting dynamic range for (Unnamed Layer* 11) [Constant]_output to [-127,127]

[I] [TRT] Setting dynamic range for Times212_Output_0 to [-127,127]

[I] [TRT] Setting dynamic range for (Unnamed Layer* 13) [Constant]_output to [-127,127]

[I] [TRT] Setting dynamic range for Plus214_Output_0 to [-127,127]

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 1) [Constant] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 6) [Constant] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 10) [Shuffle] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 11) [Constant] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 12) [Matrix Multiply] is not running on DLA, falling back to GPU.

[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 13) [Constant] is not running on DLA, falling back to GPU.

[I] [TRT]

[I] [TRT] --------------- Layers running on DLA:

[I] [TRT] (Unnamed Layer* 0) [Convolution], (Unnamed Layer* 2) [ElementWise], (Unnamed Layer* 3) [Activation], (Unnamed Layer* 4) [Pooling], (Unnamed Layer* 5) [Convolution], (Unnamed Layer* 7) [ElementWise], (Unnamed Layer* 8) [Activation], (Unnamed Layer* 9) [Pooling], (Unnamed Layer* 14) [ElementWise],

[I] [TRT] --------------- Layers running on GPU:

[I] [TRT] (Unnamed Layer* 1) [Constant], (Unnamed Layer* 6) [Constant], (Unnamed Layer* 10) [Shuffle], (Unnamed Layer* 11) [Constant], (Unnamed Layer* 12) [Matrix Multiply], (Unnamed Layer* 13) [Constant],

[W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.

[I] [TRT] [INT8 Quantization] User overriding Scales: Input3 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Convolution28_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 1) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Plus30_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: ReLU32_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling66_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Convolution110_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 6) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Plus112_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: ReLU114_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling160_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling160_Output_0_reshape0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 11) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Times212_Output_0 [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 13) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] User overriding Scales: Plus214_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Input3 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Convolution28_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 1) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus30_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: ReLU32_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling66_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Convolution110_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 6) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus112_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: ReLU114_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling160_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling160_Output_0_reshape0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 11) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Times212_Output_0 [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 13) [Constant]_output [1]

[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus214_Output_0 [1]

[I] [TRT] Original: 15 layers

[I] [TRT] After dead-layer removal: 15 layers

[I] [TRT] After DLA optimization: 13 layers

[I] [TRT] After scale fusion: 13 layers

[I] [TRT] After vertical fusions: 13 layers

[I] [TRT] After swap: 13 layers

[I] [TRT] After final dead-layer removal: 13 layers

[I] [TRT] After tensor merging: 13 layers

[I] [TRT] After concat removal: 13 layers

[I] [TRT] Configuring builder for Int8 Mode completed in 0.0084503 seconds.

[I] [TRT] Graph construction and optimization completed in 0.00888536 seconds.

[W] [TRT] Warning: no implementation of (Unnamed Layer* 1) [Constant] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006912

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006976

[W] [TRT] Warning: no implementation of (Unnamed Layer* 6) [Constant] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00544

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00736

[W] [TRT] Warning: no implementation of (Unnamed Layer* 11) [Constant] obeys the requested constraints, using a higher precision type

[W] [TRT] Warning: no implementation of (Unnamed Layer* 13) [Constant] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing Input3 to nvm(9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing {(Unnamed Layer* 0) [Convolution]}(31)

[I] [TRT] Tactic 548859524883 is the only option, timing skipped

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008832

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00736

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007168

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.011136

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00688

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007232

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008832

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007136

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007232

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007136

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005248

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00512

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008832

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007136

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 2) [ElementWise]

[I] [TRT] Tactic 1 time 0.009536

[I] [TRT] Tactic 2 time 0.01232

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 2) [ElementWise]

[I] [TRT] Tactic 1 time 0.008896

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 2) [ElementWise]

[I] [TRT] Tactic 1 time 0.009152

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007008

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.0104

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007936

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006976

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008768

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.0112

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007008

[I] [TRT]

[I] [TRT] --------------- Timing {(Unnamed Layer* 3) [Activation],(Unnamed Layer* 4) [Pooling],(Unnamed Layer* 5) [Convolution]}(31)

[I] [TRT] Tactic 548859524883 is the only option, timing skipped

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009344

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.0072

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00864

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.00688

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005216

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008608

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007136

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007488

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005344

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005408

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009088

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005248

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 7) [ElementWise]

[I] [TRT] Tactic 1 time 0.009216

[I] [TRT] Tactic 2 time 0.009184

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 7) [ElementWise]

[I] [TRT] Tactic 1 time 0.008608

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 7) [ElementWise]

[I] [TRT] Tactic 1 time 0.00864

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006944

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009344

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007264

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.007232

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009216

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.005344

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.008896

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.006976

[I] [TRT]

[I] [TRT] --------------- Timing {(Unnamed Layer* 8) [Activation],(Unnamed Layer* 9) [Pooling]}(31)

[I] [TRT] Tactic 548859524883 is the only option, timing skipped

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.009152

[I] [TRT]

[I] [TRT] --------------- Timing (9)

[I] [TRT] Tactic 0 time 0.0088

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 10) [Shuffle]

[I] [TRT] Tactic 0 is the only option, timing skipped

[W] [TRT] Warning: no implementation of (Unnamed Layer* 10) [Shuffle] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 12) [Matrix Multiply]

[I] [TRT] Tactic 0 is the only option, timing skipped

[W] [TRT] Warning: no implementation of (Unnamed Layer* 12) [Matrix Multiply] obeys the requested constraints, using a higher precision type

[I] [TRT]

[I] [TRT] --------------- Timing (Unnamed Layer* 14) [ElementWise]

[I] [TRT] Tactic 1 is the only option, timing skipped

[W] [TRT] Warning: no implementation of (Unnamed Layer* 14) [ElementWise] obeys the requested constraints, using a higher precision type

[I] [TRT] Adding reformat layer: (Unnamed Layer* 1) [Constant] output to be reformatted 0 ((Unnamed Layer* 1) [Constant]_output) from Int8(1,1,1:32,1) to Float(1,1,1,8)

[I] [TRT] Adding reformat layer: (Unnamed Layer* 6) [Constant] output to be reformatted 0 ((Unnamed Layer* 6) [Constant]_output) from Int8(1,1,1:32,1) to Float(1,1,1,16)

[I] [TRT] Adding reformat layer: (Unnamed Layer* 10) [Shuffle] reformatted input 0 (Pooling160_Output_0) from Int8(1,4,16:32,16) to Float(1,4,16,256)

[I] [TRT] Formats and tactics selection completed in 3.93059 seconds.

[I] [TRT] After reformat layers: 22 layers

[I] [TRT] Block size 16777216

[I] [TRT] Block size 25088

[I] [TRT] Block size 1024

[I] [TRT] Block size 512

[I] [TRT] Total Activation Memory: 16803840

[I] [TRT] Detected 1 input and 1 output network tensors.

NVMEDIA_DLA : 528, ERROR: load from memory failed.

[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)

[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
```