Can't build engines with the DLA + INT8 in Jetson Xavier NX

HelloNewJAPAN · November 12, 2020, 8:04am

Hi.
My English isn’t very good, so please ask if anything is unclear.

I can’t build an engine with trtexec on a Jetson Xavier NX when combining DLA and INT8.
The following error message is printed and the program exits.

…/builder/tacticOptimizer.cpp (2626) - Assertion Error in exemplar: 0 (pitch == 1 && "requirements impossible to satisfy")

・Device : Jetson Xavier NX (Dev kit)
・It works well in FP16 + DLA.
・It works well in INT8 + GPU.
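For anyone who wants to reproduce the three cases side by side, here is a minimal sketch that only composes the trtexec command lines (it does not run them). The helper names, the trtexec install path, and the file name model.onnx are assumptions made for illustration; the flags are the ones used with trtexec in this thread.

```python
# Sketch: compose trtexec command lines for the three precision/device cases.
# TRTEXEC and "model.onnx" are assumed values; adjust them for your setup.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"

# (precision, use_dla): two working cases and the failing one
COMBINATIONS = [("fp16", True), ("int8", False), ("int8", True)]


def trtexec_cmd(onnx_path, precision, use_dla, verbose=False):
    """Return the trtexec argument list for one precision/device combination."""
    if precision not in ("fp16", "int8"):
        raise ValueError("precision must be 'fp16' or 'int8'")
    cmd = [TRTEXEC, "--onnx=" + onnx_path, "--workspace=2048", "--" + precision]
    if use_dla:
        # Run on DLA core 0 and let unsupported layers fall back to the GPU
        cmd += ["--useDLACore=0", "--allowGPUFallback"]
    if verbose:
        cmd.append("--verbose")
    return cmd


def all_commands(onnx_path):
    """Return one printable command line per combination."""
    return [" ".join(trtexec_cmd(onnx_path, p, d)) for p, d in COMBINATIONS]


if __name__ == "__main__":
    for line in all_commands("model.onnx"):
        print(line)
```

Running the printed commands one at a time on the device should show which combination fails.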

Thank you in advance.

Regards,

AastaLLL · November 13, 2020, 2:46am

Hi,

We need more information about the error.
Could you run trtexec with the --verbose flag and share the log with us?
(The INT8+DLA case alone is enough.)

Thanks.

HelloNewJAPAN · November 13, 2020, 5:17am

Hi.
I really appreciate your reply.

Here is the log of the execution with the --verbose flag.

Because the log for the actual model would be very long, we used a smaller model here.
This smaller model is the first part of the actual model.
(The error message is the same as that of the actual model.)

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/work/slash.onnx --workspace=2048 --useDLACore=0 --allowGPUFallback --int8 --verbose
[11/13/2020-14:00:08] [I] === Model Options ===
[11/13/2020-14:00:08] [I] Format: ONNX
[11/13/2020-14:00:08] [I] Model: /home/jetson/work/slash.onnx
[11/13/2020-14:00:08] [I] Output:
[11/13/2020-14:00:08] [I] === Build Options ===
[11/13/2020-14:00:08] [I] Max batch: 1
[11/13/2020-14:00:08] [I] Workspace: 2048 MB
[11/13/2020-14:00:08] [I] minTiming: 1
[11/13/2020-14:00:08] [I] avgTiming: 8
[11/13/2020-14:00:08] [I] Precision: FP32+INT8
[11/13/2020-14:00:08] [I] Calibration: Dynamic
[11/13/2020-14:00:08] [I] Safe mode: Disabled
[11/13/2020-14:00:08] [I] Save engine: 
[11/13/2020-14:00:08] [I] Load engine: 
[11/13/2020-14:00:08] [I] Builder Cache: Enabled
[11/13/2020-14:00:08] [I] NVTX verbosity: 0
[11/13/2020-14:00:08] [I] Inputs format: fp32:CHW
[11/13/2020-14:00:08] [I] Outputs format: fp32:CHW
[11/13/2020-14:00:08] [I] Input build shapes: model
[11/13/2020-14:00:08] [I] Input calibration shapes: model
[11/13/2020-14:00:08] [I] === System Options ===
[11/13/2020-14:00:08] [I] Device: 0
[11/13/2020-14:00:08] [I] DLACore: 0(With GPU fallback)
[11/13/2020-14:00:08] [I] Plugins:
[11/13/2020-14:00:08] [I] === Inference Options ===
[11/13/2020-14:00:08] [I] Batch: 1
[11/13/2020-14:00:08] [I] Input inference shapes: model
[11/13/2020-14:00:08] [I] Iterations: 10
[11/13/2020-14:00:08] [I] Duration: 3s (+ 200ms warm up)
[11/13/2020-14:00:08] [I] Sleep time: 0ms
[11/13/2020-14:00:08] [I] Streams: 1
[11/13/2020-14:00:08] [I] ExposeDMA: Disabled
[11/13/2020-14:00:08] [I] Spin-wait: Disabled
[11/13/2020-14:00:08] [I] Multithreading: Disabled
[11/13/2020-14:00:08] [I] CUDA Graph: Disabled
[11/13/2020-14:00:08] [I] Skip inference: Disabled
[11/13/2020-14:00:08] [I] Inputs:
[11/13/2020-14:00:08] [I] === Reporting Options ===
[11/13/2020-14:00:08] [I] Verbose: Enabled
[11/13/2020-14:00:08] [I] Averages: 10 inferences
[11/13/2020-14:00:08] [I] Percentile: 99
[11/13/2020-14:00:08] [I] Dump output: Disabled
[11/13/2020-14:00:08] [I] Profile: Disabled
[11/13/2020-14:00:08] [I] Export timing to JSON file: 
[11/13/2020-14:00:08] [I] Export output to JSON file: 
[11/13/2020-14:00:08] [I] Export profile to JSON file: 
[11/13/2020-14:00:08] [I] 
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::Proposal version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::Split version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[11/13/2020-14:00:08] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
----------------------------------------------------------------
Input filename:   /home/jetson/work/slash.onnx
ONNX IR version:  0.0.7
Opset version:    12
Producer name:    tf2onnx
Producer version: 1.7.0
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::Clip_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::Proposal version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::Split version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:202: Adding network input: input:0 with dtype: float32, dimensions: (1, 32, 512, 1)
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:116: Registering tensor: input:0 for ONNX tensor: input:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:90: Importing initializer: new_shape__9
[11/13/2020-14:00:09] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D_weights_fused_bn
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D_bias_fused_bn
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5 [Reshape]
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: input:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: new_shape__9
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5 [Reshape] inputs: [input:0 -> (1, 32, 512, 1)], [new_shape__9 -> (4)], 
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5 for ONNX node: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:116: Registering tensor: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5:0 for ONNX tensor: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5 [Reshape] outputs: [StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5:0 -> (1, 1, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D [Conv]
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D_weights_fused_bn
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D_bias_fused_bn
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D [Conv] inputs: [StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5:0 -> (1, 1, 32, 512)], [StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D_weights_fused_bn -> (32, 1, 3, 3)], [StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D_bias_fused_bn -> (32)], 
[11/13/2020-14:00:09] [V] [TRT] builtin_op_importers.cpp:450: Convolution input dimensions: (1, 1, 32, 512)
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D for ONNX node: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D
[11/13/2020-14:00:09] [V] [TRT] builtin_op_importers.cpp:533: Using kernel: (3, 3), strides: (1, 1), prepadding: (1, 1), postpadding: (1, 1), dilations: (1, 1), numOutputs: 32
[11/13/2020-14:00:09] [V] [TRT] builtin_op_importers.cpp:534: Convolution output dimensions: (1, 32, 32, 512)
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:116: Registering tensor: StatefulPartitionedCall/default/ResNet_bn0_1/FusedBatchNormV3:0 for ONNX tensor: StatefulPartitionedCall/default/ResNet_bn0_1/FusedBatchNormV3:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D [Conv] outputs: [StatefulPartitionedCall/default/ResNet_bn0_1/FusedBatchNormV3:0 -> (1, 32, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/default/ResNet_relu0_1/Relu [Relu]
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/default/ResNet_bn0_1/FusedBatchNormV3:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/default/ResNet_relu0_1/Relu [Relu] inputs: [StatefulPartitionedCall/default/ResNet_bn0_1/FusedBatchNormV3:0 -> (1, 32, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/default/ResNet_relu0_1/Relu for ONNX node: StatefulPartitionedCall/default/ResNet_relu0_1/Relu
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:116: Registering tensor: StatefulPartitionedCall/default/ResNet_relu0_1/Relu:0 for ONNX tensor: StatefulPartitionedCall/default/ResNet_relu0_1/Relu:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/default/ResNet_relu0_1/Relu [Relu] outputs: [StatefulPartitionedCall/default/ResNet_relu0_1/Relu:0 -> (1, 32, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/default/softmax/transpose [Transpose]
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/default/ResNet_relu0_1/Relu:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/default/softmax/transpose [Transpose] inputs: [StatefulPartitionedCall/default/ResNet_relu0_1/Relu:0 -> (1, 32, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/default/softmax/transpose for ONNX node: StatefulPartitionedCall/default/softmax/transpose
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:116: Registering tensor: StatefulPartitionedCall/default/softmax/transpose:0 for ONNX tensor: StatefulPartitionedCall/default/softmax/transpose:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/default/softmax/transpose [Transpose] outputs: [StatefulPartitionedCall/default/softmax/transpose:0 -> (1, 32, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/default/softmax/Softmax [Softmax]
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/default/softmax/transpose:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/default/softmax/Softmax [Softmax] inputs: [StatefulPartitionedCall/default/softmax/transpose:0 -> (1, 32, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/default/softmax/Softmax for ONNX node: StatefulPartitionedCall/default/softmax/Softmax
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:116: Registering tensor: StatefulPartitionedCall/default/softmax/Softmax:0 for ONNX tensor: StatefulPartitionedCall/default/softmax/Softmax:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/default/softmax/Softmax [Softmax] outputs: [StatefulPartitionedCall/default/softmax/Softmax:0 -> (1, 32, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/default/softmax/transpose_1 [Transpose]
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/default/softmax/Softmax:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/default/softmax/transpose_1 [Transpose] inputs: [StatefulPartitionedCall/default/softmax/Softmax:0 -> (1, 32, 32, 512)], 
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/default/softmax/transpose_1 for ONNX node: StatefulPartitionedCall/default/softmax/transpose_1
[11/13/2020-14:00:09] [V] [TRT] ImporterContext.hpp:116: Registering tensor: Identity:0_1 for ONNX tensor: Identity:0
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/default/softmax/transpose_1 [Transpose] outputs: [Identity:0 -> (1, 32, 512, 32)], 
[11/13/2020-14:00:09] [V] [TRT] ModelImporter.cpp:507: Marking Identity:0_1 as output: Identity:0
 ----- Parsing of ONNX model /home/jetson/work/slash.onnx is Done ---- 
[11/13/2020-14:00:09] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for input:0 to [-2,2]
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5:0 to [-4,4]
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for StatefulPartitionedCall/default/ResNet_bn0_1/FusedBatchNormV3:0 to [-4,4]
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for StatefulPartitionedCall/default/ResNet_relu0_1/Relu:0 to [-4,4]
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for StatefulPartitionedCall/default/softmax/transpose:0 to [-4,4]
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for (Unnamed Layer* 4) [Shuffle]_output to [-4,4]
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for (Unnamed Layer* 5) [Softmax]_output to [-4,4]
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for StatefulPartitionedCall/default/softmax/Softmax:0 to [-4,4]
[11/13/2020-14:00:09] [V] [TRT] Setting dynamic range for Identity:0 to [-4,4]
[11/13/2020-14:00:09] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5 is not supported on DLA, falling back to GPU.
[11/13/2020-14:00:09] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/default/softmax/transpose is not supported on DLA, falling back to GPU.
[11/13/2020-14:00:09] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 4) [Shuffle] is not supported on DLA, falling back to GPU.
[11/13/2020-14:00:09] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/default/softmax/Softmax is not supported on DLA, falling back to GPU.
[11/13/2020-14:00:09] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 6) [Shuffle] is not supported on DLA, falling back to GPU.
[11/13/2020-14:00:09] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/default/softmax/transpose_1 is not supported on DLA, falling back to GPU.
[11/13/2020-14:00:09] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-2,2]
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-4,4]
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-4,4]
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-4,4]
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-4,4]
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-4,4]
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-4,4]
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-4,4]
[11/13/2020-14:00:09] [V] [TRT] User overriding scale with dynamic range [-4,4]
[11/13/2020-14:00:09] [V] [TRT] Original: 8 layers
[11/13/2020-14:00:09] [V] [TRT] After dead-layer removal: 8 layers
[11/13/2020-14:00:09] [V] [TRT] After DLA optimization: 7 layers
[11/13/2020-14:00:09] [V] [TRT] Fusing StatefulPartitionedCall/default/softmax/transpose with (Unnamed Layer* 4) [Shuffle]
[11/13/2020-14:00:09] [V] [TRT] Fusing (Unnamed Layer* 6) [Shuffle] with StatefulPartitionedCall/default/softmax/transpose_1
[11/13/2020-14:00:09] [V] [TRT] After Myelin optimization: 5 layers
[11/13/2020-14:00:09] [V] [TRT] After scale fusion: 5 layers
[11/13/2020-14:00:09] [V] [TRT] After vertical fusions: 5 layers
[11/13/2020-14:00:09] [V] [TRT] After final dead-layer removal: 5 layers
[11/13/2020-14:00:09] [V] [TRT] After tensor merging: 5 layers
[11/13/2020-14:00:09] [V] [TRT] After concat removal: 5 layers
[11/13/2020-14:00:09] [V] [TRT] Configuring builder for Int8 Mode completed in 0.0101641 seconds.
[11/13/2020-14:00:09] [V] [TRT] Graph construction and optimization completed in 0.0113474 seconds.
[11/13/2020-14:00:09] [I] [TRT] 
[11/13/2020-14:00:09] [I] [TRT] --------------- Layers running on DLA: 
[11/13/2020-14:00:09] [I] [TRT] {StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D,StatefulPartitionedCall/default/ResNet_relu0_1/Relu}, 
[11/13/2020-14:00:09] [I] [TRT] --------------- Layers running on GPU: 
[11/13/2020-14:00:09] [I] [TRT] StatefulPartitionedCall/default/ResNet_conv0_1/Conv2D__5, StatefulPartitionedCall/default/softmax/transpose + (Unnamed Layer* 4) [Shuffle], StatefulPartitionedCall/default/softmax/Softmax, (Unnamed Layer* 6) [Shuffle] + StatefulPartitionedCall/default/softmax/transpose_1, 
[11/13/2020-14:00:13] [V] [TRT] Constructing optimization profile number 0 [1/1].
[11/13/2020-14:00:13] [V] [TRT] Builder timing cache: created 0 entries, 0 hit(s)
[11/13/2020-14:00:13] [E] [TRT] ../builder/tacticOptimizer.cpp (2626) - Assertion Error in exemplar: 0 (pitch == 1 && "requirements impossible to satisfy")
[11/13/2020-14:00:13] [E] Engine creation failed
[11/13/2020-14:00:13] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/work/slash.onnx --workspace=2048 --useDLACore=0 --allowGPUFallback --int8 --verbose
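
Since the verbose log is long, a small script can help pull out the parts that matter. The sketch below is only an illustration: the function name and regular expressions are my own and are based solely on the line formats visible in the log above, so they may need adjusting for other TensorRT versions.

```python
import re

# Patterns derived from the log lines in this post (an assumption, not an official format)
FALLBACK_RE = re.compile(
    r"Default DLA is enabled but layer (.+?) is not supported on DLA, falling back to GPU"
)
ERROR_RE = re.compile(r"^\[[^\]]+\] \[E\] (.*)$")


def summarize_log(text):
    """Return (layers that fell back to the GPU, error messages) from a trtexec log."""
    fallbacks, errors = [], []
    for raw in text.splitlines():
        line = raw.strip()
        match = FALLBACK_RE.search(line)
        if match:
            fallbacks.append(match.group(1))
            continue
        match = ERROR_RE.match(line)
        if match:
            errors.append(match.group(1))
    return fallbacks, errors


if __name__ == "__main__":
    sample = "\n".join([
        "[11/13/2020-14:00:09] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 4) [Shuffle] is not supported on DLA, falling back to GPU.",
        "[11/13/2020-14:00:13] [E] Engine creation failed",
    ])
    layers, errs = summarize_log(sample)
    print("GPU fallback layers:", layers)
    print("Errors:", errs)
```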

Best regards,

AastaLLL · November 17, 2020, 5:35am

Hi,

Could you share the model with us?

Based on the log, the error is caused by a layer that cannot be supported because of a particular setting (e.g. stride, padding, …).
An unsupported layer should fall back to the GPU, but here it somehow triggers an exception instead.

If you can share the model with us, we can investigate more for the error.

Thanks.

HelloNewJAPAN · November 17, 2020, 6:26am

Hi.
I really appreciate your reply.

I understand.
I’ll share the ONNX model for testing.
This is the model we were using when we pasted the log.

Best regards,
trt_test.zip (1.9 KB)

AastaLLL · November 19, 2020, 9:34am

Hi,

Thanks for the model.

This issue can be reproduced in our environment and is under investigation.
We will post an update here when there is any progress.

HelloNewJAPAN · November 20, 2020, 12:08am

Hi,

Thank you for your continuous support.
Let me know if you have anything you want to ask.

AastaLLL · November 30, 2020, 6:51am

Hi,

Thanks for your patience.

This issue is fixed in our internal TensorRT branch.
We will let you know when it is released.

Thanks

HelloNewJAPAN · November 30, 2020, 7:25am

Hi,

Thank you for your kind support.

Best regards,

eyalhir74 · December 2, 2020, 2:23pm

Hi @AastaLLL,
I’m seeing the same error and same behaviour (FP16 on DLA ok, INT8 on DLA crashes with trtexec).
Is there a way to download a fix for this issue somehow?

thanks
Eyal

AastaLLL · December 10, 2020, 2:32am

Hi,

Sorry, but you will need to wait for our next TensorRT release to get the fix.
I will post an update here once a new JetPack/package is available.

Thanks.

fpsychosis · January 28, 2021, 10:32am

Can you confirm this issue is fixed in JP4.5?

HelloNewJAPAN · January 28, 2021, 11:55pm

I haven’t tried JetPack 4.5, but I found this error in JetPack 4.4.1.

JetPack 4.5 and 4.4.1 ship the same TensorRT version, 7.1.3, so I assume the behavior hasn’t changed.
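
To check which TensorRT version a given JetPack install actually has, something like the sketch below may help. It assumes the TensorRT Python bindings are installed on the device; the function name is made up for illustration, and it returns None when the module cannot be imported.

```python
def installed_tensorrt_version():
    """Return the TensorRT version string, or None if the Python bindings are missing."""
    try:
        import tensorrt  # Python bindings; may be absent on some setups
    except ImportError:
        return None
    return tensorrt.__version__


if __name__ == "__main__":
    version = installed_tensorrt_version()
    if version is None:
        print("TensorRT Python bindings not found")
    else:
        print("TensorRT version:", version)
```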
