I get an Internal DLA error and the layer runs on the GPU as a fallback

Hi,
My English isn't very good, so please feel free to ask if anything is unclear.

I am using TensorRT (trtexec) in a [Jetson Xavier NX + FP16 + DLA] environment.
When I try to run a Convolution layer, I get an Internal DLA error and the layer falls back to the GPU.

I found that the input width W is related to this problem.

For testing, I built an ONNX model with only one Convolution layer.

The Convolution layer is configured as follows: padding 1, stride 1, 3x3 kernel, 512 output channels.
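For reference, a minimal sketch of how such a single-convolution test model can be built is below. My actual models were exported from TensorFlow with tf2onnx (as the trtexec logs show), so this PyTorch export is only an assumed equivalent, not the script I actually used.

import torch
import torch.nn as nn

class SingleConv(nn.Module):
    def __init__(self):
        super().__init__()
        # 3x3 kernel, stride 1, padding 1, 512 input and 512 output channels
        self.conv = nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, bias=False)

    def forward(self, x):
        return self.conv(x)

def export(width, path):
    model = SingleConv().eval()
    dummy = torch.randn(1, 512, 256, width)   # NCHW input: (1, 512, 256, W)
    torch.onnx.export(model, dummy, path, opset_version=12,
                      input_names=["input"], output_names=["output"])

export(117, "dla_1.onnx")   # this width runs on the DLA
export(118, "dla_2.onnx")   # this width triggers the Internal DLA error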

If I set the input shape to C=512, H=256, W=117 (other large values of H also work), the trtexec log is as follows and the layer runs on the DLA.

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/temp/dla_1.onnx --workspace=4096 --explicitBatch --fp16 --useDLACore=0 --allowGPUFallback --verbose
[12/09/2020-14:12:03] [I] === Model Options ===
[12/09/2020-14:12:03] [I] Format: ONNX
[12/09/2020-14:12:03] [I] Model: /home/jetson/temp/dla_1.onnx
[12/09/2020-14:12:03] [I] Output:
[12/09/2020-14:12:03] [I] === Build Options ===
[12/09/2020-14:12:03] [I] Max batch: explicit
[12/09/2020-14:12:03] [I] Workspace: 4096 MB
[12/09/2020-14:12:03] [I] minTiming: 1
[12/09/2020-14:12:03] [I] avgTiming: 8
[12/09/2020-14:12:03] [I] Precision: FP32+FP16
[12/09/2020-14:12:03] [I] Calibration: 
[12/09/2020-14:12:03] [I] Safe mode: Disabled
[12/09/2020-14:12:03] [I] Save engine: 
[12/09/2020-14:12:03] [I] Load engine: 
[12/09/2020-14:12:03] [I] Builder Cache: Enabled
[12/09/2020-14:12:03] [I] NVTX verbosity: 0
[12/09/2020-14:12:03] [I] Inputs format: fp32:CHW
[12/09/2020-14:12:03] [I] Outputs format: fp32:CHW
[12/09/2020-14:12:03] [I] Input build shapes: model
[12/09/2020-14:12:03] [I] Input calibration shapes: model
[12/09/2020-14:12:03] [I] === System Options ===
[12/09/2020-14:12:03] [I] Device: 0
[12/09/2020-14:12:03] [I] DLACore: 0(With GPU fallback)
[12/09/2020-14:12:03] [I] Plugins:
[12/09/2020-14:12:03] [I] === Inference Options ===
[12/09/2020-14:12:03] [I] Batch: Explicit
[12/09/2020-14:12:03] [I] Input inference shapes: model
[12/09/2020-14:12:03] [I] Iterations: 10
[12/09/2020-14:12:03] [I] Duration: 3s (+ 200ms warm up)
[12/09/2020-14:12:03] [I] Sleep time: 0ms
[12/09/2020-14:12:03] [I] Streams: 1
[12/09/2020-14:12:03] [I] ExposeDMA: Disabled
[12/09/2020-14:12:03] [I] Spin-wait: Disabled
[12/09/2020-14:12:03] [I] Multithreading: Disabled
[12/09/2020-14:12:03] [I] CUDA Graph: Disabled
[12/09/2020-14:12:03] [I] Skip inference: Disabled
[12/09/2020-14:12:03] [I] Inputs:
[12/09/2020-14:12:03] [I] === Reporting Options ===
[12/09/2020-14:12:03] [I] Verbose: Enabled
[12/09/2020-14:12:03] [I] Averages: 10 inferences
[12/09/2020-14:12:03] [I] Percentile: 99
[12/09/2020-14:12:03] [I] Dump output: Disabled
[12/09/2020-14:12:03] [I] Profile: Disabled
[12/09/2020-14:12:03] [I] Export timing to JSON file: 
[12/09/2020-14:12:03] [I] Export output to JSON file: 
[12/09/2020-14:12:03] [I] Export profile to JSON file: 
[12/09/2020-14:12:03] [I] 
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::Proposal version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::Split version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[12/09/2020-14:12:03] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
----------------------------------------------------------------
Input filename:   /home/jetson/temp/dla_1.onnx
ONNX IR version:  0.0.7
Opset version:    12
Producer name:    tf2onnx
Producer version: 1.7.0
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::Clip_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::Proposal version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::Split version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1
[12/09/2020-14:12:05] [V] [TRT] ModelImporter.cpp:202: Adding network input: input:0 with dtype: float32, dimensions: (1, 512, 256, 117)
[12/09/2020-14:12:05] [V] [TRT] ImporterContext.hpp:116: Registering tensor: input:0 for ONNX tensor: input:0
[12/09/2020-14:12:05] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/test/test_conv1/Conv2D/ReadVariableOp:0
[12/09/2020-14:12:05] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/test/test_conv1/Conv2D [Conv]
[12/09/2020-14:12:05] [V] [TRT] ModelImporter.cpp:119: Searching for input: input:0
[12/09/2020-14:12:05] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/test/test_conv1/Conv2D/ReadVariableOp:0
[12/09/2020-14:12:05] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/test/test_conv1/Conv2D [Conv] inputs: [input:0 -> (1, 512, 256, 117)], [StatefulPartitionedCall/test/test_conv1/Conv2D/ReadVariableOp:0 -> (512, 512, 3, 3)], 
[12/09/2020-14:12:05] [V] [TRT] builtin_op_importers.cpp:450: Convolution input dimensions: (1, 512, 256, 117)
[12/09/2020-14:12:05] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/test/test_conv1/Conv2D for ONNX node: StatefulPartitionedCall/test/test_conv1/Conv2D
[12/09/2020-14:12:05] [V] [TRT] builtin_op_importers.cpp:533: Using kernel: (3, 3), strides: (1, 1), prepadding: (1, 1), postpadding: (1, 1), dilations: (1, 1), numOutputs: 512
[12/09/2020-14:12:05] [V] [TRT] builtin_op_importers.cpp:534: Convolution output dimensions: (1, 512, 256, 117)
[12/09/2020-14:12:05] [V] [TRT] ImporterContext.hpp:116: Registering tensor: Identity:0_1 for ONNX tensor: Identity:0
[12/09/2020-14:12:05] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/test/test_conv1/Conv2D [Conv] outputs: [Identity:0 -> (1, 512, 256, 117)], 
[12/09/2020-14:12:05] [V] [TRT] ModelImporter.cpp:507: Marking Identity:0_1 as output: Identity:0
 ----- Parsing of ONNX model /home/jetson/temp/dla_1.onnx is Done ---- 
[12/09/2020-14:12:05] [V] [TRT] Applying generic optimizations to the graph for inference.
[12/09/2020-14:12:05] [V] [TRT] Original: 1 layers
[12/09/2020-14:12:05] [V] [TRT] After dead-layer removal: 1 layers
[12/09/2020-14:12:09] [V] [TRT] After DLA optimization: 3 layers
[12/09/2020-14:12:09] [V] [TRT] After Myelin optimization: 3 layers
[12/09/2020-14:12:09] [V] [TRT] After scale fusion: 3 layers
[12/09/2020-14:12:09] [V] [TRT] After vertical fusions: 3 layers
[12/09/2020-14:12:09] [V] [TRT] After final dead-layer removal: 3 layers
[12/09/2020-14:12:09] [V] [TRT] After tensor merging: 3 layers
[12/09/2020-14:12:09] [V] [TRT] After concat removal: 3 layers
[12/09/2020-14:12:09] [V] [TRT] Graph construction and optimization completed in 4.33183 seconds.
[12/09/2020-14:12:09] [I] [TRT] 
[12/09/2020-14:12:09] [I] [TRT] --------------- Layers running on DLA: 
[12/09/2020-14:12:09] [I] [TRT] {StatefulPartitionedCall/test/test_conv1/Conv2D}, 
[12/09/2020-14:12:09] [I] [TRT] --------------- Layers running on GPU: 
[12/09/2020-14:12:09] [I] [TRT] 
[12/09/2020-14:12:12] [V] [TRT] Constructing optimization profile number 0 [1/1].
[12/09/2020-14:12:13] [V] [TRT] --------------- Timing Runner: input:0 to nvm (Reformat)
[12/09/2020-14:12:14] [V] [TRT] Tactic: 1002 time 35.5116
[12/09/2020-14:12:15] [V] [TRT] Tactic: 0 time 59.9356
[12/09/2020-14:12:15] [V] [TRT] Fastest Tactic: 1002 Time: 35.5116
[12/09/2020-14:12:15] [V] [TRT] *************** Autotuning format combination: Half(1,117,29952:16,958464) -> Half(1,117,29952:16,958464) ***************
[12/09/2020-14:12:15] [V] [TRT] --------------- Timing Runner: {StatefulPartitionedCall/test/test_conv1/Conv2D} (DLA)
[12/09/2020-14:12:19] [V] [TRT] Tactic: 548796183939 is the only option, timing skipped
[12/09/2020-14:12:19] [V] [TRT] Fastest Tactic: 548796183939 Time: 0
[12/09/2020-14:12:19] [V] [TRT] --------------- Timing Runner: Identity:0 from nvm (Reformat)
[12/09/2020-14:12:20] [V] [TRT] Tactic: 1002 time 9.00915
[12/09/2020-14:12:20] [V] [TRT] Tactic: 0 time 11.1001
[12/09/2020-14:12:20] [V] [TRT] Fastest Tactic: 1002 Time: 9.00915
[12/09/2020-14:12:20] [V] [TRT] Formats and tactics selection completed in 7.58085 seconds.
[12/09/2020-14:12:20] [V] [TRT] After reformat layers: 5 layers
[12/09/2020-14:12:20] [V] [TRT] Block size 4294967296
[12/09/2020-14:12:20] [V] [TRT] Block size 30670848
[12/09/2020-14:12:20] [V] [TRT] Total Activation Memory: 4325638144
[12/09/2020-14:12:20] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/09/2020-14:12:25] [V] [TRT] Layer: input:0 to nvm Weights: 0 HostPersistent: 0 DevicePersistent: 0
[12/09/2020-14:12:25] [V] [TRT] Layer: {StatefulPartitionedCall/test/test_conv1/Conv2D} Weights: 0 HostPersistent: 864 DevicePersistent: 0
[12/09/2020-14:12:25] [V] [TRT] Layer: input:0 copy finish Weights: 0 HostPersistent: 0 DevicePersistent: 0
[12/09/2020-14:12:25] [V] [TRT] Layer: Identity:0 from nvm Weights: 0 HostPersistent: 0 DevicePersistent: 0
[12/09/2020-14:12:25] [V] [TRT] Layer: Identity:0 copy finish Weights: 0 HostPersistent: 0 DevicePersistent: 0
[12/09/2020-14:12:25] [V] [TRT] Total Host Persistent Memory: 864
[12/09/2020-14:12:25] [V] [TRT] Total Device Persistent Memory: 0
[12/09/2020-14:12:25] [V] [TRT] Total Weight Memory: 0
[12/09/2020-14:12:25] [V] [TRT] Builder timing cache: created 0 entries, 0 hit(s)
[12/09/2020-14:12:25] [V] [TRT] Engine generation completed in 15.8601 seconds.
[12/09/2020-14:12:25] [V] [TRT] Engine Layer Information:
[12/09/2020-14:12:25] [V] [TRT] Layer(Reformat): input:0 to nvm, Tactic: 1002, input:0[Float(512,256,117)] -> input:0 copy[Half(512,256,117)]
[12/09/2020-14:12:25] [V] [TRT] Layer(DLANative): {StatefulPartitionedCall/test/test_conv1/Conv2D}, Tactic: 548796183939, input:0 copy[Half(512,256,117)] -> Identity:0 copy[Half(512,256,117)]
[12/09/2020-14:12:25] [V] [TRT] Layer(FinishNvmRegion): input:0 copy finish, Tactic: 0, input:0 copy[Half(512,256,117)] -> 
[12/09/2020-14:12:25] [V] [TRT] Layer(Reformat): Identity:0 from nvm, Tactic: 1002, Identity:0 copy[Half(512,256,117)] -> Identity:0[Float(512,256,117)]
[12/09/2020-14:12:25] [V] [TRT] Layer(FinishNvmRegion): Identity:0 copy finish, Tactic: 0, Identity:0 copy[Half(512,256,117)] -> 
[12/09/2020-14:12:25] [I] Starting inference threads
[12/09/2020-14:12:29] [I] Warmup completed 0 queries over 200 ms
[12/09/2020-14:12:29] [I] Timing trace has 0 queries over 3.25517 s
[12/09/2020-14:12:29] [I] Trace averages of 10 runs:
[12/09/2020-14:12:29] [I] Average on 10 runs - GPU latency: 124.659 ms - Host latency: 129.987 ms (end to end 130.215 ms, enqueue 1.90764 ms)
[12/09/2020-14:12:29] [I] Average on 10 runs - GPU latency: 124.642 ms - Host latency: 129.985 ms (end to end 130.214 ms, enqueue 1.8229 ms)
[12/09/2020-14:12:29] [I] Host Latency
[12/09/2020-14:12:29] [I] min: 129.644 ms (end to end 129.881 ms)
[12/09/2020-14:12:29] [I] max: 130.315 ms (end to end 130.544 ms)
[12/09/2020-14:12:29] [I] mean: 129.977 ms (end to end 130.206 ms)
[12/09/2020-14:12:29] [I] median: 129.936 ms (end to end 130.16 ms)
[12/09/2020-14:12:29] [I] percentile: 130.315 ms at 99% (end to end 130.544 ms at 99%)
[12/09/2020-14:12:29] [I] throughput: 0 qps
[12/09/2020-14:12:29] [I] walltime: 3.25517 s
[12/09/2020-14:12:29] [I] Enqueue Time
[12/09/2020-14:12:29] [I] min: 1.65674 ms
[12/09/2020-14:12:29] [I] max: 2.20718 ms
[12/09/2020-14:12:29] [I] median: 1.82263 ms
[12/09/2020-14:12:29] [I] GPU Compute
[12/09/2020-14:12:29] [I] min: 124.405 ms
[12/09/2020-14:12:29] [I] max: 124.997 ms
[12/09/2020-14:12:29] [I] mean: 124.644 ms
[12/09/2020-14:12:29] [I] median: 124.595 ms
[12/09/2020-14:12:29] [I] percentile: 124.997 ms at 99%
[12/09/2020-14:12:29] [I] total compute time: 3.1161 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/temp/dla_1.onnx --workspace=4096 --explicitBatch --fp16 --useDLACore=0 --allowGPUFallback --verbose

However, if W is increased by 1, giving an input shape of C=512, H=256, W=118, trtexec reports an Internal DLA error and the layer runs on the GPU as a fallback.

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/temp/dla_2.onnx --workspace=4096 --explicitBatch --fp16 --useDLACore=0 --allowGPUFallback --verbose
[12/09/2020-14:14:00] [I] === Model Options ===
[12/09/2020-14:14:00] [I] Format: ONNX
[12/09/2020-14:14:00] [I] Model: /home/jetson/temp/dla_2.onnx
[12/09/2020-14:14:00] [I] Output:
[12/09/2020-14:14:00] [I] === Build Options ===
[12/09/2020-14:14:00] [I] Max batch: explicit
[12/09/2020-14:14:00] [I] Workspace: 4096 MB
[12/09/2020-14:14:00] [I] minTiming: 1
[12/09/2020-14:14:00] [I] avgTiming: 8
[12/09/2020-14:14:00] [I] Precision: FP32+FP16
[12/09/2020-14:14:00] [I] Calibration: 
[12/09/2020-14:14:00] [I] Safe mode: Disabled
[12/09/2020-14:14:00] [I] Save engine: 
[12/09/2020-14:14:00] [I] Load engine: 
[12/09/2020-14:14:00] [I] Builder Cache: Enabled
[12/09/2020-14:14:00] [I] NVTX verbosity: 0
[12/09/2020-14:14:00] [I] Inputs format: fp32:CHW
[12/09/2020-14:14:00] [I] Outputs format: fp32:CHW
[12/09/2020-14:14:00] [I] Input build shapes: model
[12/09/2020-14:14:00] [I] Input calibration shapes: model
[12/09/2020-14:14:00] [I] === System Options ===
[12/09/2020-14:14:00] [I] Device: 0
[12/09/2020-14:14:00] [I] DLACore: 0(With GPU fallback)
[12/09/2020-14:14:00] [I] Plugins:
[12/09/2020-14:14:00] [I] === Inference Options ===
[12/09/2020-14:14:00] [I] Batch: Explicit
[12/09/2020-14:14:00] [I] Input inference shapes: model
[12/09/2020-14:14:00] [I] Iterations: 10
[12/09/2020-14:14:00] [I] Duration: 3s (+ 200ms warm up)
[12/09/2020-14:14:00] [I] Sleep time: 0ms
[12/09/2020-14:14:00] [I] Streams: 1
[12/09/2020-14:14:00] [I] ExposeDMA: Disabled
[12/09/2020-14:14:00] [I] Spin-wait: Disabled
[12/09/2020-14:14:00] [I] Multithreading: Disabled
[12/09/2020-14:14:00] [I] CUDA Graph: Disabled
[12/09/2020-14:14:00] [I] Skip inference: Disabled
[12/09/2020-14:14:00] [I] Inputs:
[12/09/2020-14:14:00] [I] === Reporting Options ===
[12/09/2020-14:14:00] [I] Verbose: Enabled
[12/09/2020-14:14:00] [I] Averages: 10 inferences
[12/09/2020-14:14:00] [I] Percentile: 99
[12/09/2020-14:14:00] [I] Dump output: Disabled
[12/09/2020-14:14:00] [I] Profile: Disabled
[12/09/2020-14:14:00] [I] Export timing to JSON file: 
[12/09/2020-14:14:00] [I] Export output to JSON file: 
[12/09/2020-14:14:00] [I] Export profile to JSON file: 
[12/09/2020-14:14:00] [I] 
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::Proposal version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::Split version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[12/09/2020-14:14:00] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
----------------------------------------------------------------
Input filename:   /home/jetson/temp/dla_2.onnx
ONNX IR version:  0.0.7
Opset version:    12
Producer name:    tf2onnx
Producer version: 1.7.0
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::Clip_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::Proposal version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::Split version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1
[12/09/2020-14:14:02] [V] [TRT] ModelImporter.cpp:202: Adding network input: input:0 with dtype: float32, dimensions: (1, 512, 256, 118)
[12/09/2020-14:14:02] [V] [TRT] ImporterContext.hpp:116: Registering tensor: input:0 for ONNX tensor: input:0
[12/09/2020-14:14:02] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/test/test_conv1/Conv2D/ReadVariableOp:0
[12/09/2020-14:14:02] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/test/test_conv1/Conv2D [Conv]
[12/09/2020-14:14:02] [V] [TRT] ModelImporter.cpp:119: Searching for input: input:0
[12/09/2020-14:14:02] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/test/test_conv1/Conv2D/ReadVariableOp:0
[12/09/2020-14:14:02] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/test/test_conv1/Conv2D [Conv] inputs: [input:0 -> (1, 512, 256, 118)], [StatefulPartitionedCall/test/test_conv1/Conv2D/ReadVariableOp:0 -> (512, 512, 3, 3)], 
[12/09/2020-14:14:02] [V] [TRT] builtin_op_importers.cpp:450: Convolution input dimensions: (1, 512, 256, 118)
[12/09/2020-14:14:02] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/test/test_conv1/Conv2D for ONNX node: StatefulPartitionedCall/test/test_conv1/Conv2D
[12/09/2020-14:14:02] [V] [TRT] builtin_op_importers.cpp:533: Using kernel: (3, 3), strides: (1, 1), prepadding: (1, 1), postpadding: (1, 1), dilations: (1, 1), numOutputs: 512
[12/09/2020-14:14:02] [V] [TRT] builtin_op_importers.cpp:534: Convolution output dimensions: (1, 512, 256, 118)
[12/09/2020-14:14:02] [V] [TRT] ImporterContext.hpp:116: Registering tensor: Identity:0_1 for ONNX tensor: Identity:0
[12/09/2020-14:14:02] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/test/test_conv1/Conv2D [Conv] outputs: [Identity:0 -> (1, 512, 256, 118)], 
[12/09/2020-14:14:02] [V] [TRT] ModelImporter.cpp:507: Marking Identity:0_1 as output: Identity:0
 ----- Parsing of ONNX model /home/jetson/temp/dla_2.onnx is Done ---- 
[12/09/2020-14:14:02] [V] [TRT] Applying generic optimizations to the graph for inference.
[12/09/2020-14:14:02] [V] [TRT] Original: 1 layers
[12/09/2020-14:14:02] [V] [TRT] After dead-layer removal: 1 layers
[12/09/2020-14:14:02] [W] [TRT] Internal DLA error for layer StatefulPartitionedCall/test/test_conv1/Conv2D. Switching to GPU fallback.
[12/09/2020-14:14:02] [V] [TRT] After DLA optimization: 1 layers
[12/09/2020-14:14:02] [V] [TRT] After Myelin optimization: 1 layers
[12/09/2020-14:14:02] [V] [TRT] After scale fusion: 1 layers
[12/09/2020-14:14:02] [V] [TRT] After vertical fusions: 1 layers
[12/09/2020-14:14:02] [V] [TRT] After final dead-layer removal: 1 layers
[12/09/2020-14:14:02] [V] [TRT] After tensor merging: 1 layers
[12/09/2020-14:14:02] [V] [TRT] After concat removal: 1 layers
[12/09/2020-14:14:02] [V] [TRT] Graph construction and optimization completed in 0.0234387 seconds.
[12/09/2020-14:14:02] [I] [TRT] 
[12/09/2020-14:14:02] [I] [TRT] --------------- Layers running on DLA: 
[12/09/2020-14:14:02] [I] [TRT] 
[12/09/2020-14:14:02] [I] [TRT] --------------- Layers running on GPU: 
[12/09/2020-14:14:02] [I] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D, 
[12/09/2020-14:14:06] [V] [TRT] Constructing optimization profile number 0 [1/1].
[12/09/2020-14:14:06] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[12/09/2020-14:14:06] [V] [TRT] Tactic: 1002 time 16.6674
[12/09/2020-14:14:07] [V] [TRT] Tactic: 0 time 30.177
[12/09/2020-14:14:07] [V] [TRT] Fastest Tactic: 1002 Time: 16.6674
[12/09/2020-14:14:07] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[12/09/2020-14:14:08] [V] [TRT] Tactic: 1002 time 8.27307
[12/09/2020-14:14:08] [V] [TRT] Tactic: 0 time 2.0723
[12/09/2020-14:14:08] [V] [TRT] Fastest Tactic: 0 Time: 2.0723
[12/09/2020-14:14:08] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[12/09/2020-14:14:08] [V] [TRT] Tactic: 1002 time 4.0247
[12/09/2020-14:14:08] [V] [TRT] Tactic: 0 time 1.99531
[12/09/2020-14:14:08] [V] [TRT] Fastest Tactic: 0 Time: 1.99531
[12/09/2020-14:14:08] [V] [TRT] *************** Autotuning format combination: Float(1,118,30208,15466496) -> Float(1,118,30208,15466496) ***************
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_medium_nn_v1
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn_winograd) Set Tactic Name: volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_xregs_large_nn_v1
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_small_nn_v1
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_xregs_large_nn_v1
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_small_nn_v1
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_medium_nn_v1
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_medium_nn_v1
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_small_nn_v1
[12/09/2020-14:14:08] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (FusedConvActConvolution)
[12/09/2020-14:14:08] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[12/09/2020-14:14:08] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CaskConvolution)
[12/09/2020-14:14:08] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_medium_nn_v1
[12/09/2020-14:14:11] [V] [TRT] Tactic: 1825138533642645384 time 170.358
[12/09/2020-14:14:11] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn_winograd) Set Tactic Name: volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:14:13] [V] [TRT] Tactic: 2775507031594384867 time 122.819
[12/09/2020-14:14:13] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_xregs_large_nn_v1
[12/09/2020-14:14:16] [V] [TRT] Tactic: 2842488832350522458 time 171.109
[12/09/2020-14:14:16] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_small_nn_v1
[12/09/2020-14:14:18] [V] [TRT] Tactic: 3915320020053085238 time 170.018
[12/09/2020-14:14:18] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_xregs_large_nn_v1
[12/09/2020-14:14:21] [V] [TRT] Tactic: 6448355332020552203 time 173.27
[12/09/2020-14:14:21] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_small_nn_v1
[12/09/2020-14:14:24] [V] [TRT] Tactic: 6808617066150061604 time 174.264
[12/09/2020-14:14:24] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_medium_nn_v1
[12/09/2020-14:14:27] [V] [TRT] Tactic: -8060443123034038864 time 175.836
[12/09/2020-14:14:27] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_medium_nn_v1
[12/09/2020-14:14:30] [V] [TRT] Tactic: -4420849921117327522 time 194.346
[12/09/2020-14:14:30] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_small_nn_v1
[12/09/2020-14:14:33] [V] [TRT] Tactic: -3946921629105938337 time 187.262
[12/09/2020-14:14:33] [V] [TRT] Fastest Tactic: 2775507031594384867 Time: 122.819
[12/09/2020-14:14:33] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaConvolution)
[12/09/2020-14:14:37] [V] [TRT] Tactic: 0 time 216.383
[12/09/2020-14:14:41] [V] [TRT] Tactic: 2 time 203.681
[12/09/2020-14:14:55] [V] [TRT] Tactic: 5 time 885.983
[12/09/2020-14:14:57] [V] [TRT] Tactic: 6 time 130.463
[12/09/2020-14:15:00] [V] [TRT] Tactic: 57 time 171.723
[12/09/2020-14:15:00] [V] [TRT] Fastest Tactic: 6 Time: 130.463
[12/09/2020-14:15:00] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaDepthwiseConvolution)
[12/09/2020-14:15:00] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:00] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: 2775507031594384867
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn_winograd) Set Tactic Name: volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:15:00] [V] [TRT] 
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_medium_nn_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn_winograd) Set Tactic Name: volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_xregs_large_nn_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_small_nn_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_xregs_large_nn_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_small_nn_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_medium_nn_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_medium_nn_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_small_nn_v1
[12/09/2020-14:15:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (scudnn_winograd) Set Tactic Name: volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:15:00] [V] [TRT] *************** Autotuning format combination: Half(1,118,30208,15466496) -> Half(1,118,30208,15466496) ***************
[12/09/2020-14:15:00] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (FusedConvActConvolution)
[12/09/2020-14:15:00] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:00] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CaskConvolution)
[12/09/2020-14:15:00] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:00] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaConvolution)
[12/09/2020-14:15:03] [V] [TRT] Tactic: 0 time 218.342
[12/09/2020-14:15:06] [V] [TRT] Tactic: 1 time 172.931
[12/09/2020-14:15:09] [V] [TRT] Tactic: 2 time 200.922
[12/09/2020-14:15:23] [V] [TRT] Tactic: 5 time 884.671
[12/09/2020-14:15:26] [V] [TRT] Tactic: 6 time 127.784
[12/09/2020-14:15:26] [V] [TRT] Fastest Tactic: 6 Time: 127.784
[12/09/2020-14:15:26] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaDepthwiseConvolution)
[12/09/2020-14:15:26] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:26] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CudaConvolution Tactic: 6
[12/09/2020-14:15:26] [V] [TRT] 
[12/09/2020-14:15:26] [V] [TRT] *************** Autotuning format combination: Half(1,118,30208:2,7733248) -> Half(1,118,30208:2,7733248) ***************
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_large_nn_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_medium_nn_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_medium_nn_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_small_nn_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_small_nn_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_large_nn_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_medium_nn_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn_winograd) Set Tactic Name: volta_fp16x2_hcudnn_winograd_fp16x2_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_small_nn_v1
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_large_nn_v1
[12/09/2020-14:15:26] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (FusedConvActConvolution)
[12/09/2020-14:15:26] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:26] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CaskConvolution)
[12/09/2020-14:15:26] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_large_nn_v1
[12/09/2020-14:15:27] [V] [TRT] Tactic: 1145226902788474763 time 88.6559
[12/09/2020-14:15:27] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_medium_nn_v1
[12/09/2020-14:15:29] [V] [TRT] Tactic: 2418518597804310654 time 89.1632
[12/09/2020-14:15:29] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_medium_nn_v1
[12/09/2020-14:15:30] [V] [TRT] Tactic: 8292881859266835088 time 95.952
[12/09/2020-14:15:30] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_small_nn_v1
[12/09/2020-14:15:31] [V] [TRT] Tactic: 8401509141903434922 time 88.0109
[12/09/2020-14:15:31] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_small_nn_v1
[12/09/2020-14:15:33] [V] [TRT] Tactic: -8654297089785671176 time 87.5183
[12/09/2020-14:15:33] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_large_nn_v1
[12/09/2020-14:15:35] [V] [TRT] Tactic: -7448936905981214224 time 100.347
[12/09/2020-14:15:35] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_medium_nn_v1
[12/09/2020-14:15:36] [V] [TRT] Tactic: -3689982367035295496 time 87.3648
[12/09/2020-14:15:36] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn_winograd) Set Tactic Name: volta_fp16x2_hcudnn_winograd_fp16x2_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:15:37] [V] [TRT] Tactic: -3140347171730126532 time 61.2649
[12/09/2020-14:15:37] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_small_nn_v1
[12/09/2020-14:15:39] [V] [TRT] Tactic: -2027588946874785071 time 93.5188
[12/09/2020-14:15:39] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_large_nn_v1
[12/09/2020-14:15:40] [V] [TRT] Tactic: -245090590808296743 time 89.048
[12/09/2020-14:15:40] [V] [TRT] Fastest Tactic: -3140347171730126532 Time: 61.2649
[12/09/2020-14:15:40] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaConvolution)
[12/09/2020-14:15:40] [V] [TRT] CudaConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:40] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaDepthwiseConvolution)
[12/09/2020-14:15:40] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:40] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: -3140347171730126532
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn_winograd) Set Tactic Name: volta_fp16x2_hcudnn_winograd_fp16x2_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:15:40] [V] [TRT] 
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_large_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_medium_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_medium_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x64_relu_small_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_small_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_large_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_medium_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn_winograd) Set Tactic Name: volta_fp16x2_hcudnn_winograd_fp16x2_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x32_relu_small_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn) Set Tactic Name: volta_fp16x2_hcudnn_fp16x2_128x128_relu_large_nn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (hcudnn_winograd) Set Tactic Name: volta_fp16x2_hcudnn_winograd_fp16x2_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[12/09/2020-14:15:40] [V] [TRT] *************** Autotuning format combination: Half(64,7552,1:8,1933312) -> Float(1,118,30208,15466496) ***************
[12/09/2020-14:15:40] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (FusedConvActConvolution)
[12/09/2020-14:15:40] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:40] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CaskConvolution)
[12/09/2020-14:15:40] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:40] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaConvolution)
[12/09/2020-14:15:40] [V] [TRT] CudaConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:40] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaDepthwiseConvolution)
[12/09/2020-14:15:40] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:40] [V] [TRT] *************** Autotuning format combination: Half(64,7552,1:8,1933312) -> Half(64,7552,1:8,1933312) ***************
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_128x128_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_sliced1x2_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_128x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:40] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (FusedConvActConvolution)
[12/09/2020-14:15:40] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:40] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CaskConvolution)
[12/09/2020-14:15:40] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_128x128_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:41] [V] [TRT] Tactic: 3754069740140581927 time 28.3198
[12/09/2020-14:15:41] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:41] [V] [TRT] Tactic: 5925270497649423688 time 24.9993
[12/09/2020-14:15:41] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_sliced1x2_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:42] [V] [TRT] Tactic: 6680916730816870145 time 28.2249
[12/09/2020-14:15:42] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:42] [V] [TRT] Tactic: 7158029511300006471 time 27.5882
[12/09/2020-14:15:42] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:42] [V] [TRT] Tactic: 7859952145590271433 time 28.3247
[12/09/2020-14:15:42] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_128x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:43] [V] [TRT] Tactic: 8283847742354150423 time 27.6193
[12/09/2020-14:15:43] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:43] [V] [TRT] Tactic: -4534876761957424274 time 26.2655
[12/09/2020-14:15:43] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:44] [V] [TRT] Tactic: -3237051169894153788 time 27.2856
[12/09/2020-14:15:44] [V] [TRT] Fastest Tactic: 5925270497649423688 Time: 24.9993
[12/09/2020-14:15:44] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaConvolution)
[12/09/2020-14:15:49] [V] [TRT] Tactic: 0 time 301.828
[12/09/2020-14:15:51] [V] [TRT] Tactic: 1 time 175.92
[12/09/2020-14:15:56] [V] [TRT] Tactic: 2 time 295.189
[12/09/2020-14:15:58] [V] [TRT] Tactic: 6 time 126.846
[12/09/2020-14:15:58] [V] [TRT] Fastest Tactic: 6 Time: 126.846
[12/09/2020-14:15:58] [V] [TRT] --------------- Timing Runner: StatefulPartitionedCall/test/test_conv1/Conv2D (CudaDepthwiseConvolution)
[12/09/2020-14:15:58] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/09/2020-14:15:58] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: 5925270497649423688
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] 
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_128x128_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_sliced1x2_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_128x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x64_ldg8_relu_exp_small_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:15:58] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[12/09/2020-14:15:58] [V] [TRT] Tactic: 1002 time 3.89012
[12/09/2020-14:15:58] [V] [TRT] Tactic: 0 time 3.06572
[12/09/2020-14:15:58] [V] [TRT] Fastest Tactic: 0 Time: 3.06572
[12/09/2020-14:15:59] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[12/09/2020-14:15:59] [V] [TRT] Tactic: 1002 time 7.36298
[12/09/2020-14:15:59] [V] [TRT] Tactic: 0 time 1.96536
[12/09/2020-14:15:59] [V] [TRT] Fastest Tactic: 0 Time: 1.96536
[12/09/2020-14:15:59] [V] [TRT] --------------- Timing Runner: <reformat> (Reformat)
[12/09/2020-14:15:59] [V] [TRT] Tactic: 1002 time 4.16556
[12/09/2020-14:15:59] [V] [TRT] Tactic: 0 time 2.03621
[12/09/2020-14:15:59] [V] [TRT] Fastest Tactic: 0 Time: 2.03621
[12/09/2020-14:15:59] [V] [TRT] Reformatting format: [in] Float(1,118,30208,15466496), [out] Half(64,7552,1:8,1933312)
[12/09/2020-14:15:59] [V] [TRT] Reformatting format: [in] Half(64,7552,1:8,1933312), [out] Float(1,118,30208,15466496)
[12/09/2020-14:15:59] [W] [TRT] No implementation obeys reformatting-free rules, at least 2 reformatting nodes are needed, now picking the fastest path instead.
[12/09/2020-14:16:00] [V] [TRT] Adding reformat layer: StatefulPartitionedCall/test/test_conv1/Conv2D reformatted input 0 (input:0) from Float(1,118,30208,15466496) to Half(64,7552,1:8,1933312)
[12/09/2020-14:16:00] [V] [TRT] Adding reformat layer: StatefulPartitionedCall/test/test_conv1/Conv2D output to be reformatted 0 (Identity:0) from Float(1,118,30208,15466496) to Half(64,7552,1:8,1933312)
[12/09/2020-14:16:00] [V] [TRT] Formats and tactics selection completed in 114.058 seconds.
[12/09/2020-14:16:00] [V] [TRT] After reformat layers: 3 layers
[12/09/2020-14:16:00] [V] [TRT] Block size 4294967296
[12/09/2020-14:16:00] [V] [TRT] Block size 30932992
[12/09/2020-14:16:00] [V] [TRT] Block size 30932992
[12/09/2020-14:16:00] [V] [TRT] Total Activation Memory: 4356833280
[12/09/2020-14:16:00] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/09/2020-14:16:00] [V] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D (h884cudnn) Set Tactic Name: volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1
[12/09/2020-14:16:00] [V] [TRT] Layer: StatefulPartitionedCall/test/test_conv1/Conv2D input reformatter 0 Weights: 0 HostPersistent: 0 DevicePersistent: 0
[12/09/2020-14:16:00] [V] [TRT] Layer: StatefulPartitionedCall/test/test_conv1/Conv2D Weights: 0 HostPersistent: 2176 DevicePersistent: 4900352
[12/09/2020-14:16:00] [V] [TRT] Layer: StatefulPartitionedCall/test/test_conv1/Conv2D output reformatter 0 Weights: 0 HostPersistent: 0 DevicePersistent: 0
[12/09/2020-14:16:00] [V] [TRT] Total Host Persistent Memory: 2176
[12/09/2020-14:16:00] [V] [TRT] Total Device Persistent Memory: 4900352
[12/09/2020-14:16:00] [V] [TRT] Total Weight Memory: 0
[12/09/2020-14:16:00] [V] [TRT] Builder timing cache: created 10 entries, 6 hit(s)
[12/09/2020-14:16:00] [V] [TRT] Engine generation completed in 117.632 seconds.
[12/09/2020-14:16:00] [V] [TRT] Engine Layer Information:
[12/09/2020-14:16:00] [V] [TRT] Layer(Reformat): StatefulPartitionedCall/test/test_conv1/Conv2D input reformatter 0, Tactic: 0, input:0[Float(512,256,118)] -> StatefulPartitionedCall/test/test_conv1/Conv2D reformatted input 0[Half(512,256,118)]
[12/09/2020-14:16:00] [V] [TRT] Layer(h884cudnn): StatefulPartitionedCall/test/test_conv1/Conv2D, Tactic: 5925270497649423688, StatefulPartitionedCall/test/test_conv1/Conv2D reformatted input 0[Half(512,256,118)] -> StatefulPartitionedCall/test/test_conv1/Conv2D output to be reformatted 0[Half(512,256,118)]
[12/09/2020-14:16:00] [V] [TRT] Layer(Reformat): StatefulPartitionedCall/test/test_conv1/Conv2D output reformatter 0, Tactic: 0, StatefulPartitionedCall/test/test_conv1/Conv2D output to be reformatted 0[Half(512,256,118)] -> Identity:0[Float(512,256,118)]
[12/09/2020-14:16:00] [I] Starting inference threads
[12/09/2020-14:16:03] [I] Warmup completed 0 queries over 200 ms
[12/09/2020-14:16:03] [I] Timing trace has 0 queries over 3.10865 s
[12/09/2020-14:16:03] [I] Trace averages of 10 runs:
[12/09/2020-14:16:03] [I] Average on 10 runs - GPU latency: 36.2495 ms - Host latency: 41.7726 ms (end to end 41.9971 ms, enqueue 0.139229 ms)
[12/09/2020-14:16:03] [I] Average on 10 runs - GPU latency: 36.3167 ms - Host latency: 41.8462 ms (end to end 42.0746 ms, enqueue 0.136829 ms)
[12/09/2020-14:16:03] [I] Average on 10 runs - GPU latency: 36.2492 ms - Host latency: 41.7744 ms (end to end 42.0047 ms, enqueue 0.138843 ms)
[12/09/2020-14:16:03] [I] Average on 10 runs - GPU latency: 36.2422 ms - Host latency: 41.7648 ms (end to end 41.9867 ms, enqueue 0.150269 ms)
[12/09/2020-14:16:03] [I] Average on 10 runs - GPU latency: 36.2346 ms - Host latency: 41.7605 ms (end to end 41.9888 ms, enqueue 0.129846 ms)
[12/09/2020-14:16:03] [I] Average on 10 runs - GPU latency: 36.2362 ms - Host latency: 41.7603 ms (end to end 41.9888 ms, enqueue 0.135718 ms)
[12/09/2020-14:16:03] [I] Average on 10 runs - GPU latency: 36.2705 ms - Host latency: 41.7985 ms (end to end 42.0222 ms, enqueue 0.126172 ms)
[12/09/2020-14:16:03] [I] Host Latency
[12/09/2020-14:16:03] [I] min: 41.5083 ms (end to end 41.7278 ms)
[12/09/2020-14:16:03] [I] max: 42.1628 ms (end to end 42.3883 ms)
[12/09/2020-14:16:03] [I] mean: 41.7814 ms (end to end 42.0079 ms)
[12/09/2020-14:16:03] [I] median: 41.8791 ms (end to end 42.1104 ms)
[12/09/2020-14:16:03] [I] percentile: 42.1628 ms at 99% (end to end 42.3883 ms at 99%)
[12/09/2020-14:16:03] [I] throughput: 0 qps
[12/09/2020-14:16:03] [I] walltime: 3.10865 s
[12/09/2020-14:16:03] [I] Enqueue Time
[12/09/2020-14:16:03] [I] min: 0.115234 ms
[12/09/2020-14:16:03] [I] max: 0.240723 ms
[12/09/2020-14:16:03] [I] median: 0.127777 ms
[12/09/2020-14:16:03] [I] GPU Compute
[12/09/2020-14:16:03] [I] min: 35.9958 ms
[12/09/2020-14:16:03] [I] max: 36.608 ms
[12/09/2020-14:16:03] [I] mean: 36.2592 ms
[12/09/2020-14:16:03] [I] median: 36.3625 ms
[12/09/2020-14:16:03] [I] percentile: 36.608 ms at 99%
[12/09/2020-14:16:03] [I] total compute time: 2.68318 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/temp/dla_2.onnx --workspace=4096 --explicitBatch --fp16 --useDLACore=0 --allowGPUFallback --verbose
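As a side note, the exact threshold of W can be located by sweeping widths and checking the trtexec output for the warning. A rough sketch (not the exact procedure I used; the paths and W range are placeholders, and export() is the helper from the earlier sketch):

import subprocess

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"

def runs_on_dla(onnx_path):
    # Same flags as the runs above; the DLA rejection shows up as a [W] log line.
    cmd = [TRTEXEC, f"--onnx={onnx_path}", "--workspace=4096", "--explicitBatch",
           "--fp16", "--useDLACore=0", "--allowGPUFallback", "--verbose"]
    log = subprocess.run(cmd, capture_output=True, text=True).stdout
    return "Internal DLA error" not in log

for w in range(110, 130):
    path = f"/home/jetson/temp/dla_w{w}.onnx"
    export(w, path)   # export() from the earlier sketch
    print(w, "DLA" if runs_on_dla(path) else "GPU fallback")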

Looking at the Developer Guide :: NVIDIA Deep Learning TensorRT Documentation page, it is not clear to me under what conditions this happens.

What is the cause of this Internal DLA error?

Thank you in advance.

Regards,

Hi,

Could you share the ONNX models with us (both W117 and W118) so we can check them further?

Thanks.

Hi,
I really appreciate your reply.

Understood.
I will share the ONNX models for testing.

Since the files are large, I will share them via cloud storage.

Best regards,

Hi,

Thanks for sharing the models with us.
We can also reproduce this issue in our environment and are checking the cause with our internal team.
We will share more information with you once we get feedback.

Thanks.

Hi,

Thank you for your continued support.
Please let me know if there is anything else you need from me.

Thanks for your reply.

This question has been passed to our internal team, but due to the holiday season it may take longer to get a response.
We will share more information with you once we have an answer.

Thank you and Happy Holidays. ʕ•́ᴥ•̀ʔ

Hi,
Thank you for your kind support, as always.

Wishing you joy, peace and good health this Holiday Season.

Hi,

Thanks for your patience.
We are still checking this issue with our internal team.
We will give you more information later.

Thanks.

Hi,

Thanks for your patience and sorry for the late update.

The W=118 model is rejected because it exceeds the DLA CBUF limitation.
CBUF is the convolution buffer of the DLA.

Since the DLA is a hardware-based inference engine, this buffer is limited in size.
Please check the following page for the DLA hardware design:
http://nvdla.org/hw/v1/hwarch.html
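For a rough sense of scale only (this is an illustration, not the exact allocation rule the hardware applies; the real limit comes from the CBUF bank allocation described at the link above): with 512 FP16 channels, each extra pixel of width adds about 1 KiB per input line that has to be staged in the CBUF alongside part of the 3x3x512x512 weights, which is why W=117 and W=118 can land on opposite sides of the limit.

FP16_BYTES = 2

def conv_cbuf_footprint(c_in, c_out, k, w):
    # Size of the full kernel, and of the k input lines a kxk convolution reads at once.
    weights = c_out * c_in * k * k * FP16_BYTES
    lines = k * w * c_in * FP16_BYTES
    return weights, lines

for w in (117, 118):
    weights, lines = conv_cbuf_footprint(512, 512, 3, w)
    print(f"W={w}: weights {weights / 1024:.0f} KiB, 3 input lines {lines / 1024:.0f} KiB")

# W=117: weights 4608 KiB, 3 input lines 351 KiB
# W=118: weights 4608 KiB, 3 input lines 354 KiB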

We will also update the parser to give a more detailed error message in our next TensorRT release.

Thanks.

Hi,
Understood.

Thank you for always providing such detailed information.

Best regards,