Dear NVIDIA team,
I can’t run even a simple, purposely built .onnx model on DLA on TRT8. The same ONNX works on TRT7.
ONNX:
reid-model.onnx (2.6 MB)
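For context, the network is just a small MLP (1024 -> 512 -> 256 -> 128 -> 32) built in Keras and exported with tf2onnx 1.9.2 at opset 11 (see the parser output below). A minimal sketch that should produce an equivalent graph (my reconstruction from the layer names in the log, not the exact training code):

import tensorflow as tf

# Small MLP matching the shapes in the trtexec log below:
# 1024 -> 512 -> 256 -> 128 -> 32, ReLU + BatchNorm between the Dense layers.
inp = tf.keras.Input(shape=(1024,), name="input_1")
x = tf.nn.relu(tf.keras.layers.Dense(512)(inp))
x = tf.keras.layers.BatchNormalization()(x)
x = tf.nn.relu(tf.keras.layers.Dense(256)(x))
x = tf.keras.layers.BatchNormalization()(x)
x = tf.nn.relu(tf.keras.layers.Dense(128)(x))
out = tf.keras.layers.Dense(32, use_bias=False)(x)  # final node is a plain MatMul in the log
model = tf.keras.Model(inp, out, name="model_1")
model.save("reid-model")

# Converted roughly like this:
#   python -m tf2onnx.convert --saved-model reid-model --opset 11 --output reid-model.onnx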
The command:
trtexec --onnx=reid-model.onnx --verbose --useDLACore=0
The output of the failing run:
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # trtexec --onnx=reid-model.onnx --verbose --useDLACore=0
[12/22/2021-18:42:35] [I] === Model Options ===
[12/22/2021-18:42:35] [I] Format: ONNX
[12/22/2021-18:42:35] [I] Model: reid-model.onnx
[12/22/2021-18:42:35] [I] Output:
[12/22/2021-18:42:35] [I] === Build Options ===
[12/22/2021-18:42:35] [I] Max batch: explicit
[12/22/2021-18:42:35] [I] Workspace: 16 MiB
[12/22/2021-18:42:35] [I] minTiming: 1
[12/22/2021-18:42:35] [I] avgTiming: 8
[12/22/2021-18:42:35] [I] Precision: FP32
[12/22/2021-18:42:35] [I] Calibration:
[12/22/2021-18:42:35] [I] Refit: Disabled
[12/22/2021-18:42:35] [I] Sparsity: Disabled
[12/22/2021-18:42:35] [I] Safe mode: Disabled
[12/22/2021-18:42:35] [I] Restricted mode: Disabled
[12/22/2021-18:42:35] [I] Save engine:
[12/22/2021-18:42:35] [I] Load engine:
[12/22/2021-18:42:35] [I] NVTX verbosity: 0
[12/22/2021-18:42:35] [I] Tactic sources: Using default tactic sources
[12/22/2021-18:42:35] [I] timingCacheMode: local
[12/22/2021-18:42:35] [I] timingCacheFile:
[12/22/2021-18:42:35] [I] Input(s)s format: fp32:CHW
[12/22/2021-18:42:35] [I] Output(s)s format: fp32:CHW
[12/22/2021-18:42:35] [I] Input build shapes: model
[12/22/2021-18:42:35] [I] Input calibration shapes: model
[12/22/2021-18:42:35] [I] === System Options ===
[12/22/2021-18:42:35] [I] Device: 0
[12/22/2021-18:42:35] [I] DLACore: 0
[12/22/2021-18:42:35] [I] Plugins:
[12/22/2021-18:42:35] [I] === Inference Options ===
[12/22/2021-18:42:35] [I] Batch: Explicit
[12/22/2021-18:42:35] [I] Input inference shapes: model
[12/22/2021-18:42:35] [I] Iterations: 10
[12/22/2021-18:42:35] [I] Duration: 3s (+ 200ms warm up)
[12/22/2021-18:42:35] [I] Sleep time: 0ms
[12/22/2021-18:42:35] [I] Streams: 1
[12/22/2021-18:42:35] [I] ExposeDMA: Disabled
[12/22/2021-18:42:35] [I] Data transfers: Enabled
[12/22/2021-18:42:35] [I] Spin-wait: Disabled
[12/22/2021-18:42:35] [I] Multithreading: Disabled
[12/22/2021-18:42:35] [I] CUDA Graph: Disabled
[12/22/2021-18:42:35] [I] Separate profiling: Disabled
[12/22/2021-18:42:35] [I] Time Deserialize: Disabled
[12/22/2021-18:42:35] [I] Time Refit: Disabled
[12/22/2021-18:42:35] [I] Skip inference: Disabled
[12/22/2021-18:42:35] [I] Inputs:
[12/22/2021-18:42:35] [I] === Reporting Options ===
[12/22/2021-18:42:35] [I] Verbose: Enabled
[12/22/2021-18:42:35] [I] Averages: 10 inferences
[12/22/2021-18:42:35] [I] Percentile: 99
[12/22/2021-18:42:35] [I] Dump refittable layers:Disabled
[12/22/2021-18:42:35] [I] Dump output: Disabled
[12/22/2021-18:42:35] [I] Profile: Disabled
[12/22/2021-18:42:35] [I] Export timing to JSON file:
[12/22/2021-18:42:35] [I] Export output to JSON file:
[12/22/2021-18:42:35] [I] Export profile to JSON file:
[12/22/2021-18:42:35] [I]
[12/22/2021-18:42:35] [I] === Device Information ===
[12/22/2021-18:42:35] [I] Selected Device: Xavier
[12/22/2021-18:42:35] [I] Compute Capability: 7.2
[12/22/2021-18:42:35] [I] SMs: 8
[12/22/2021-18:42:35] [I] Compute Clock Rate: 1.377 GHz
[12/22/2021-18:42:35] [I] Device Global Memory: 31928 MiB
[12/22/2021-18:42:35] [I] Shared Memory per SM: 96 KiB
[12/22/2021-18:42:35] [I] Memory Bus Width: 256 bits (ECC disabled)
[12/22/2021-18:42:35] [I] Memory Clock Rate: 1.377 GHz
[12/22/2021-18:42:35] [I]
[12/22/2021-18:42:35] [I] TensorRT version: 8001
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::ScatterND version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::Proposal version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::Split version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[12/22/2021-18:42:35] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[12/22/2021-18:42:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 19256 (MiB)
[12/22/2021-18:42:36] [I] Start parsing network model
[12/22/2021-18:42:36] [I] [TRT] ----------------------------------------------------------------
[12/22/2021-18:42:36] [I] [TRT] Input filename: reid-model.onnx
[12/22/2021-18:42:36] [I] [TRT] ONNX IR version: 0.0.6
[12/22/2021-18:42:36] [I] [TRT] Opset version: 11
[12/22/2021-18:42:36] [I] [TRT] Producer name: tf2onnx
[12/22/2021-18:42:36] [I] [TRT] Producer version: 1.9.2
[12/22/2021-18:42:36] [I] [TRT] Domain:
[12/22/2021-18:42:36] [I] [TRT] Model version: 0
[12/22/2021-18:42:36] [I] [TRT] Doc string:
[12/22/2021-18:42:36] [I] [TRT] ----------------------------------------------------------------
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::GridAnchorRect_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::Clip_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::ScatterND version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::BatchedNMSDynamic_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::EfficientNMS_ONNX_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::EfficientNMS_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::Proposal version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::Split version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1
[12/22/2021-18:42:36] [V] [TRT] Adding network input: serving_default_input_1:0 with dtype: float32, dimensions: (-1, 1024)
[12/22/2021-18:42:36] [V] [TRT] Registering tensor: serving_default_input_1:0 for ONNX tensor: serving_default_input_1:0
[12/22/2021-18:42:36] [V] [TRT] Importing initializer: const_fold_opt__18
[12/22/2021-18:42:36] [V] [TRT] Importing initializer: const_fold_opt__17
[12/22/2021-18:42:36] [V] [TRT] Importing initializer: const_fold_opt__16
[12/22/2021-18:42:36] [V] [TRT] Importing initializer: const_fold_opt__15
[12/22/2021-18:42:36] [V] [TRT] Parsing node: model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd [MatMul]
[12/22/2021-18:42:36] [V] [TRT] Searching for input: serving_default_input_1:0
[12/22/2021-18:42:36] [V] [TRT] Searching for input: const_fold_opt__18
[12/22/2021-18:42:36] [V] [TRT] model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd [MatMul] inputs: [serving_default_input_1:0 -> (-1, 1024)[FLOAT]], [const_fold_opt__18 -> (1024, 512)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Registering layer: const_fold_opt__18 for ONNX node: const_fold_opt__18
[12/22/2021-18:42:36] [V] [TRT] GEMM: using FC layer instead of MM because all criteria were met.
[12/22/2021-18:42:36] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/22/2021-18:42:36] [V] [TRT] Original shape: (_, 1024), unsqueezing to: (_, _, _, _)
[12/22/2021-18:42:36] [W] [TRT] ShapedWeights.cpp:173: Weights const_fold_opt__18 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[12/22/2021-18:42:36] [V] [TRT] Registering layer: model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd for ONNX node: model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd
[12/22/2021-18:42:36] [V] [TRT] Original shape: (_, 512, 1, 1), squeezing to: (_, _)
[12/22/2021-18:42:36] [V] [TRT] Registering tensor: model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd for ONNX tensor: model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd
[12/22/2021-18:42:36] [V] [TRT] model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd [MatMul] outputs: [model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd -> (-1, 512)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Parsing node: Relu__5 [Relu]
[12/22/2021-18:42:36] [V] [TRT] Searching for input: model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd
[12/22/2021-18:42:36] [V] [TRT] Relu__5 [Relu] inputs: [model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd -> (-1, 512)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Registering layer: Relu__5 for ONNX node: Relu__5
[12/22/2021-18:42:36] [V] [TRT] Registering tensor: Relu__5:0 for ONNX tensor: Relu__5:0
[12/22/2021-18:42:36] [V] [TRT] Relu__5 [Relu] outputs: [Relu__5:0 -> (-1, 512)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Parsing node: model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1 [MatMul]
[12/22/2021-18:42:36] [V] [TRT] Searching for input: Relu__5:0
[12/22/2021-18:42:36] [V] [TRT] Searching for input: const_fold_opt__16
[12/22/2021-18:42:36] [V] [TRT] model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1 [MatMul] inputs: [Relu__5:0 -> (-1, 512)[FLOAT]], [const_fold_opt__16 -> (512, 256)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Registering layer: const_fold_opt__16 for ONNX node: const_fold_opt__16
[12/22/2021-18:42:36] [V] [TRT] GEMM: using FC layer instead of MM because all criteria were met.
[12/22/2021-18:42:36] [V] [TRT] Original shape: (_, 512), unsqueezing to: (_, _, _, _)
[12/22/2021-18:42:36] [W] [TRT] ShapedWeights.cpp:173: Weights const_fold_opt__16 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[12/22/2021-18:42:36] [V] [TRT] Registering layer: model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1 for ONNX node: model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1
[12/22/2021-18:42:36] [V] [TRT] Original shape: (_, 256, 1, 1), squeezing to: (_, _)
[12/22/2021-18:42:36] [V] [TRT] Registering tensor: model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1 for ONNX tensor: model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1
[12/22/2021-18:42:36] [V] [TRT] model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1 [MatMul] outputs: [model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1 -> (-1, 256)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Parsing node: Relu__8 [Relu]
[12/22/2021-18:42:36] [V] [TRT] Searching for input: model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1
[12/22/2021-18:42:36] [V] [TRT] Relu__8 [Relu] inputs: [model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1 -> (-1, 256)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Registering layer: Relu__8 for ONNX node: Relu__8
[12/22/2021-18:42:36] [V] [TRT] Registering tensor: Relu__8:0 for ONNX tensor: Relu__8:0
[12/22/2021-18:42:36] [V] [TRT] Relu__8 [Relu] outputs: [Relu__8:0 -> (-1, 256)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Parsing node: model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1 [MatMul]
[12/22/2021-18:42:36] [V] [TRT] Searching for input: Relu__8:0
[12/22/2021-18:42:36] [V] [TRT] Searching for input: const_fold_opt__15
[12/22/2021-18:42:36] [V] [TRT] model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1 [MatMul] inputs: [Relu__8:0 -> (-1, 256)[FLOAT]], [const_fold_opt__15 -> (256, 128)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Registering layer: const_fold_opt__15 for ONNX node: const_fold_opt__15
[12/22/2021-18:42:36] [V] [TRT] GEMM: using FC layer instead of MM because all criteria were met.
[12/22/2021-18:42:36] [V] [TRT] Original shape: (_, 256), unsqueezing to: (_, _, _, _)
[12/22/2021-18:42:36] [W] [TRT] ShapedWeights.cpp:173: Weights const_fold_opt__15 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[12/22/2021-18:42:36] [V] [TRT] Registering layer: model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1 for ONNX node: model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1
[12/22/2021-18:42:36] [V] [TRT] Original shape: (_, 128, 1, 1), squeezing to: (_, _)
[12/22/2021-18:42:36] [V] [TRT] Registering tensor: model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1 for ONNX tensor: model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1
[12/22/2021-18:42:36] [V] [TRT] model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1 [MatMul] outputs: [model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1 -> (-1, 128)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Parsing node: Relu__11 [Relu]
[12/22/2021-18:42:36] [V] [TRT] Searching for input: model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1
[12/22/2021-18:42:36] [V] [TRT] Relu__11 [Relu] inputs: [model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1 -> (-1, 128)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Registering layer: Relu__11 for ONNX node: Relu__11
[12/22/2021-18:42:36] [V] [TRT] Registering tensor: Relu__11:0 for ONNX tensor: Relu__11:0
[12/22/2021-18:42:36] [V] [TRT] Relu__11 [Relu] outputs: [Relu__11:0 -> (-1, 128)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Parsing node: StatefulPartitionedCall:0 [MatMul]
[12/22/2021-18:42:36] [V] [TRT] Searching for input: Relu__11:0
[12/22/2021-18:42:36] [V] [TRT] Searching for input: const_fold_opt__17
[12/22/2021-18:42:36] [V] [TRT] StatefulPartitionedCall:0 [MatMul] inputs: [Relu__11:0 -> (-1, 128)[FLOAT]], [const_fold_opt__17 -> (128, 32)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Registering layer: const_fold_opt__17 for ONNX node: const_fold_opt__17
[12/22/2021-18:42:36] [V] [TRT] GEMM: using FC layer instead of MM because all criteria were met.
[12/22/2021-18:42:36] [V] [TRT] Original shape: (_, 128), unsqueezing to: (_, _, _, _)
[12/22/2021-18:42:36] [W] [TRT] ShapedWeights.cpp:173: Weights const_fold_opt__17 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[12/22/2021-18:42:36] [V] [TRT] Registering layer: StatefulPartitionedCall:0 for ONNX node: StatefulPartitionedCall:0
[12/22/2021-18:42:36] [V] [TRT] Original shape: (_, 32, 1, 1), squeezing to: (_, _)
[12/22/2021-18:42:36] [V] [TRT] Registering tensor: StatefulPartitionedCall:0_0 for ONNX tensor: StatefulPartitionedCall:0
[12/22/2021-18:42:36] [V] [TRT] StatefulPartitionedCall:0 [MatMul] outputs: [StatefulPartitionedCall:0 -> (-1, 32)[FLOAT]],
[12/22/2021-18:42:36] [V] [TRT] Marking StatefulPartitionedCall:0_0 as output: StatefulPartitionedCall:0
[12/22/2021-18:42:36] [I] Finish parsing network model
[12/22/2021-18:42:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 377, GPU 19266 (MiB)
[12/22/2021-18:42:36] [W] Dynamic dimensions required for input: serving_default_input_1:0, but no shapes were provided. Automatically overriding shape to: 1x1024
[12/22/2021-18:42:36] [W] [TRT] (Unnamed Layer* 3) [Concatenation]: DLA only supports concatenation on the C dimension.
[12/22/2021-18:42:36] [W] [TRT] (Unnamed Layer* 16) [Concatenation]: DLA only supports concatenation on the C dimension.
[12/22/2021-18:42:36] [W] [TRT] (Unnamed Layer* 29) [Concatenation]: DLA only supports concatenation on the C dimension.
[12/22/2021-18:42:36] [W] [TRT] (Unnamed Layer* 42) [Concatenation]: DLA only supports concatenation on the C dimension.
[12/22/2021-18:42:36] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 377 MiB, GPU 19266 MiB
[12/22/2021-18:42:36] [V] [TRT] Applying generic optimizations to the graph for inference.
[12/22/2021-18:42:36] [V] [TRT] Original: 15 layers
[12/22/2021-18:42:36] [V] [TRT] After dead-layer removal: 15 layers
[12/22/2021-18:42:36] [V] [TRT] After Myelin optimization: 15 layers
[12/22/2021-18:42:36] [W] [TRT] Input tensor has less than 4 dimensions for Relu__5. At least one shuffle layer will be inserted which cannot run on DLA.
[12/22/2021-18:42:36] [W] [TRT] Input tensor has less than 4 dimensions for Relu__8. At least one shuffle layer will be inserted which cannot run on DLA.
[12/22/2021-18:42:36] [W] [TRT] Input tensor has less than 4 dimensions for Relu__11. At least one shuffle layer will be inserted which cannot run on DLA.
[12/22/2021-18:42:36] [V] [TRT] After DLA optimization: 21 layers
[12/22/2021-18:42:36] [V] [TRT] After scale fusion: 21 layers
[12/22/2021-18:42:36] [V] [TRT] ShuffleShuffleFusion: Fusing (Unnamed Layer* 11) [Shuffle] with shuffle_model_1/dense/MatMul;model_1/tf.nn.relu/Relu;model_1/dense/BiasAdd
[12/22/2021-18:42:36] [V] [TRT] ShuffleShuffleFusion: Fusing shuffle_Relu__5:0 with (Unnamed Layer* 19) [Shuffle]
[12/22/2021-18:42:36] [V] [TRT] ShuffleShuffleFusion: Fusing (Unnamed Layer* 24) [Shuffle] with shuffle_model_1/batch_normalization/batchnorm/mul_1;model_1/batch_normalization/batchnorm/add_1;model_1/dense_1/MatMul;model_1/tf.nn.relu_1/Relu;model_1/dense_1/BiasAdd1
[12/22/2021-18:42:36] [V] [TRT] ShuffleShuffleFusion: Fusing shuffle_Relu__8:0 with (Unnamed Layer* 32) [Shuffle]
[12/22/2021-18:42:36] [V] [TRT] ShuffleShuffleFusion: Fusing (Unnamed Layer* 37) [Shuffle] with shuffle_model_1/batch_normalization_1/batchnorm/mul_1;model_1/batch_normalization_1/batchnorm/add_1;model_1/dense_2/MatMul;model_1/tf.nn.relu_2/Relu;model_1/dense_2/BiasAdd1
[12/22/2021-18:42:36] [V] [TRT] ShuffleShuffleFusion: Fusing shuffle_Relu__11:0 with (Unnamed Layer* 45) [Shuffle]
[12/22/2021-18:42:36] [V] [TRT] After vertical fusions: 15 layers
[12/22/2021-18:42:36] [V] [TRT] After dupe layer removal: 15 layers
[12/22/2021-18:42:36] [V] [TRT] After final dead-layer removal: 15 layers
[12/22/2021-18:42:36] [V] [TRT] After tensor merging: 15 layers
[12/22/2021-18:42:36] [V] [TRT] After concat removal: 15 layers
[12/22/2021-18:42:36] [V] [TRT] Graph construction and optimization completed in 0.0194514 seconds.
[12/22/2021-18:42:36] [E] Error[9]: [standardEngineBuilder.cpp::isValidDLAConfig::2189] Error Code 9: Internal Error (Default DLA is enabled but layer (Unnamed Layer* 6) [Shuffle] is not supported on DLA and falling back to GPU is not enabled.)
[12/22/2021-18:42:36] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Segmentation fault (core dumped)
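From the verbose output, the failure seems to come from the Shuffle layers that TensorRT inserts around every FC layer (the "unsqueezing to (_, _, _, _)" / "squeezing to (_, _)" steps, since the tensors are 2D and DLA needs at least 4 dimensions): those Shuffles are not supported on DLA, and GPU fallback is disabled, so the build aborts. I assume the build would go through with fallback enabled, e.g.:

trtexec --onnx=reid-model.onnx --verbose --useDLACore=0 --allowGPUFallback

but that defeats the purpose, since on TRT7 the same command built the network for DLA without any fallback. Is this a regression in TRT8, or is there a recommended way to express an MLP so its FC layers can stay on DLA?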
Thanks for your help.