Software Version
DRIVE OS Linux 5.1.6
Target Operating System
Custom Debian Linux
Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
SDK Manager Version
1.8.0.10363
Description:
Running sample_onnx_mnist works without utilizing the DLAs. Passing --useDLACore=0, however, fails with the following error:
NVMEDIA_DLA : 528, ERROR: load from memory failed.
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
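For reference, selecting the DLA in the sample boils down to a few IBuilder calls (a minimal sketch based on the TensorRT 5.1 API; samplesCommon::enableDLA in the samples wraps roughly these calls, and the helper name here is illustrative):

```cpp
#include "NvInfer.h"

// Illustrative helper mirroring what --useDLACore=N triggers in the sample.
void enableDLA(nvinfer1::IBuilder* builder, int dlaCore)
{
    // DLA requires a reduced-precision build; --int8 maps to this mode
    // (which in turn needs a calibrator or per-tensor dynamic ranges).
    builder->setInt8Mode(true);
    builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    builder->setDLACore(dlaCore);     // --useDLACore=0 selects DLA core 0
    builder->allowGPUFallback(true);  // unsupported layers run on the GPU
}
```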
Environment
TensorRT Version: 5.1.5
NVIDIA GPU: NVIDIA Volta™-class integrated GPU
CUDA Version: 10.1
CUDNN Version: 7.5.1
Relevant Files
The samples are from NVIDIA's official TensorRT 5.1 tarball (https://developer.nvidia.com/nvidia-tensorrt-5x-download), together with the relevant data files.
Steps To Reproduce
./sample_onnx_mnist --datadir=/root/tensorrt_sample/mnist_data --int8 --useDLACore=0
&&&& RUNNING TensorRT.sample_onnx_mnist # ./sample_onnx_mnist --datadir=/root/tensorrt_sample/mnist_data --int8 --useDLACore=0
[I] Building and running a GPU inference engine for Onnx MNIST
Input filename: /root/tensorrt_sample/mnist_data/mnist.onnx
ONNX IR version: 0.0.3
Opset version: 1
Producer name: CNTK
Producer version: 2.4
Domain:
Model version: 1
Doc string:
[I] [TRT] Parameter193:Constant → (16, 4, 4, 10)
[I] [TRT] Parameter193_reshape1:Reshape → (256, 10)
[I] [TRT] Parameter6:Constant → (8)
[I] [TRT] Parameter5:Constant → (8, 1, 5, 5)
[I] [TRT] Convolution28_Output_0:Conv → (8, 28, 28)
[I] [TRT] Plus30_Output_0:Add → (8, 28, 28)
[I] [TRT] ReLU32_Output_0:Relu → (8, 28, 28)
[I] [TRT] Pooling66_Output_0:MaxPool → (8, 14, 14)
[I] [TRT] Parameter87:Constant → (16, 8, 5, 5)
[I] [TRT] Convolution110_Output_0:Conv → (16, 14, 14)
[I] [TRT] Parameter88:Constant → (16)
[I] [TRT] Plus112_Output_0:Add → (16, 14, 14)
[I] [TRT] ReLU114_Output_0:Relu → (16, 14, 14)
[I] [TRT] Pooling160_Output_0:MaxPool → (16, 4, 4)
[I] [TRT] Pooling160_Output_0_reshape0:Reshape → (256)
[I] [TRT] Times212_Output_0:MatMul → (10)
[I] [TRT] Parameter194:Constant → (1, 10)
[I] [TRT] Plus214_Output_0:Add → (10)
----- Parsing of ONNX model /root/tensorrt_sample/mnist_data/mnist.onnx is Done ----
[I] [TRT] Setting dynamic range for Input3 to [-127,127]
[I] [TRT] Setting dynamic range for Convolution28_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for (Unnamed Layer* 1) [Constant]_output to [-127,127]
[I] [TRT] Setting dynamic range for Plus30_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for ReLU32_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for Pooling66_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for Convolution110_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for (Unnamed Layer* 6) [Constant]_output to [-127,127]
[I] [TRT] Setting dynamic range for Plus112_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for ReLU114_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for Pooling160_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for Pooling160_Output_0_reshape0 to [-127,127]
[I] [TRT] Setting dynamic range for (Unnamed Layer* 11) [Constant]_output to [-127,127]
[I] [TRT] Setting dynamic range for Times212_Output_0 to [-127,127]
[I] [TRT] Setting dynamic range for (Unnamed Layer* 13) [Constant]_output to [-127,127]
[I] [TRT] Setting dynamic range for Plus214_Output_0 to [-127,127]
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 1) [Constant] is not running on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 6) [Constant] is not running on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 10) [Shuffle] is not running on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 11) [Constant] is not running on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 12) [Matrix Multiply] is not running on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 13) [Constant] is not running on DLA, falling back to GPU.
[I] [TRT]
[I] [TRT] --------------- Layers running on DLA:
[I] [TRT] (Unnamed Layer* 0) [Convolution], (Unnamed Layer* 2) [ElementWise], (Unnamed Layer* 3) [Activation], (Unnamed Layer* 4) [Pooling], (Unnamed Layer* 5) [Convolution], (Unnamed Layer* 7) [ElementWise], (Unnamed Layer* 8) [Activation], (Unnamed Layer* 9) [Pooling], (Unnamed Layer* 14) [ElementWise],
[I] [TRT] --------------- Layers running on GPU:
[I] [TRT] (Unnamed Layer* 1) [Constant], (Unnamed Layer* 6) [Constant], (Unnamed Layer* 10) [Shuffle], (Unnamed Layer* 11) [Constant], (Unnamed Layer* 12) [Matrix Multiply], (Unnamed Layer* 13) [Constant],
[W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[I] [TRT] [INT8 Quantization] User overriding Scales: Input3 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Convolution28_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 1) [Constant]_output [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Plus30_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: ReLU32_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling66_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Convolution110_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 6) [Constant]_output [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Plus112_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: ReLU114_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling160_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Pooling160_Output_0_reshape0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 11) [Constant]_output [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Times212_Output_0 [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 13) [Constant]_output [1]
[I] [TRT] [INT8 Quantization] User overriding Scales: Plus214_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Input3 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Convolution28_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 1) [Constant]_output [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus30_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: ReLU32_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling66_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Convolution110_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 6) [Constant]_output [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus112_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: ReLU114_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling160_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling160_Output_0_reshape0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 11) [Constant]_output [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Times212_Output_0 [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 13) [Constant]_output [1]
[I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus214_Output_0 [1]
[I] [TRT] Original: 15 layers
[I] [TRT] After dead-layer removal: 15 layers
[I] [TRT] After DLA optimization: 13 layers
[I] [TRT] After scale fusion: 13 layers
[I] [TRT] After vertical fusions: 13 layers
[I] [TRT] After swap: 13 layers
[I] [TRT] After final dead-layer removal: 13 layers
[I] [TRT] After tensor merging: 13 layers
[I] [TRT] After concat removal: 13 layers
[I] [TRT] Configuring builder for Int8 Mode completed in 0.0084503 seconds.
[I] [TRT] Graph construction and optimization completed in 0.00888536 seconds.
[W] [TRT] Warning: no implementation of (Unnamed Layer* 1) [Constant] obeys the requested constraints, using a higher precision type
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.006912
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.006976
[W] [TRT] Warning: no implementation of (Unnamed Layer* 6) [Constant] obeys the requested constraints, using a higher precision type
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.00544
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.00736
[W] [TRT] Warning: no implementation of (Unnamed Layer* 11) [Constant] obeys the requested constraints, using a higher precision type
[W] [TRT] Warning: no implementation of (Unnamed Layer* 13) [Constant] obeys the requested constraints, using a higher precision type
[I] [TRT]
[I] [TRT] --------------- Timing Input3 to nvm(9)
[I] [TRT] Tactic 0 time 0.006944
[I] [TRT]
[I] [TRT] --------------- Timing {(Unnamed Layer* 0) [Convolution]}(31)
[I] [TRT] Tactic 548859524883 is the only option, timing skipped
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.008832
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.00736
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007168
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.011136
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.00688
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007232
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.008832
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007136
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007232
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007136
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.005248
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.00512
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.008832
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007136
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 2) ElementWise
[I] [TRT] Tactic 1 time 0.009536
[I] [TRT] Tactic 2 time 0.01232
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 2) ElementWise
[I] [TRT] Tactic 1 time 0.008896
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 2) ElementWise
[I] [TRT] Tactic 1 time 0.009152
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007008
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.0104
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007936
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.006976
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.008768
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.006944
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.0112
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007008
[I] [TRT]
[I] [TRT] --------------- Timing {(Unnamed Layer* 3) [Activation],(Unnamed Layer* 4) [Pooling],(Unnamed Layer* 5) [Convolution]}(31)
[I] [TRT] Tactic 548859524883 is the only option, timing skipped
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.009344
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.006944
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.0072
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.00864
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.00688
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.005216
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.008608
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007136
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.006944
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007488
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.005344
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.005408
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.009088
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.005248
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 7) ElementWise
[I] [TRT] Tactic 1 time 0.009216
[I] [TRT] Tactic 2 time 0.009184
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 7) ElementWise
[I] [TRT] Tactic 1 time 0.008608
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 7) ElementWise
[I] [TRT] Tactic 1 time 0.00864
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.006944
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.009344
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007264
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.007232
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.009216
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.005344
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.008896
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.006976
[I] [TRT]
[I] [TRT] --------------- Timing {(Unnamed Layer* 8) [Activation],(Unnamed Layer* 9) [Pooling]}(31)
[I] [TRT] Tactic 548859524883 is the only option, timing skipped
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.009152
[I] [TRT]
[I] [TRT] --------------- Timing (9)
[I] [TRT] Tactic 0 time 0.0088
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 10) Shuffle
[I] [TRT] Tactic 0 is the only option, timing skipped
[W] [TRT] Warning: no implementation of (Unnamed Layer* 10) [Shuffle] obeys the requested constraints, using a higher precision type
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 12) Matrix Multiply
[I] [TRT] Tactic 0 is the only option, timing skipped
[W] [TRT] Warning: no implementation of (Unnamed Layer* 12) [Matrix Multiply] obeys the requested constraints, using a higher precision type
[I] [TRT]
[I] [TRT] --------------- Timing (Unnamed Layer* 14) ElementWise
[I] [TRT] Tactic 1 is the only option, timing skipped
[W] [TRT] Warning: no implementation of (Unnamed Layer* 14) [ElementWise] obeys the requested constraints, using a higher precision type
[I] [TRT] Adding reformat layer: (Unnamed Layer* 1) [Constant] output to be reformatted 0 ((Unnamed Layer* 1) [Constant]_output) from Int8(1,1,1:32,1) to Float(1,1,1,8)
[I] [TRT] Adding reformat layer: (Unnamed Layer* 6) [Constant] output to be reformatted 0 ((Unnamed Layer* 6) [Constant]_output) from Int8(1,1,1:32,1) to Float(1,1,1,16)
[I] [TRT] Adding reformat layer: (Unnamed Layer* 10) [Shuffle] reformatted input 0 (Pooling160_Output_0) from Int8(1,4,16:32,16) to Float(1,4,16,256)
[I] [TRT] Formats and tactics selection completed in 3.93059 seconds.
[I] [TRT] After reformat layers: 22 layers
[I] [TRT] Block size 16777216
[I] [TRT] Block size 25088
[I] [TRT] Block size 1024
[I] [TRT] Block size 512
[I] [TRT] Total Activation Memory: 16803840
[I] [TRT] Detected 1 input and 1 output network tensors.
NVMEDIA_DLA : 528, ERROR: load from memory failed.
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
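For context on the "Setting dynamic range" lines above: with --int8 and no calibrator, the sample assigns a fixed range to every network tensor, roughly like this (a sketch of the TensorRT 5.1 pattern, not the sample's exact code):

```cpp
#include "NvInfer.h"

// Assign a fixed [-127, 127] dynamic range to all network tensors so the
// INT8 build can proceed without a calibrator (illustrative sketch).
void setAllDynamicRanges(nvinfer1::INetworkDefinition* network)
{
    for (int i = 0; i < network->getNbInputs(); ++i)
        network->getInput(i)->setDynamicRange(-127.0f, 127.0f);
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
            layer->getOutput(j)->setDynamicRange(-127.0f, 127.0f);
    }
}
```

This matches the "Calibrator is not being used" warning in the log, so the quantization setup itself appears to behave as the sample intends; the failure happens later, when the built DLA loadable is deserialized.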