I have tested this on my side on JetPack 6.0 GA and can build the engine from the ONNX file. Since the ONNX model comes from a forum user and YOLOv8 support will be released soon, I suggest waiting for the formal YOLOv8 release. We also have a YOLOv5 DLA sample here: GitHub - NVIDIA-AI-IOT/cuDLA-samples: YOLOv5 on Orin DLA
$ /usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=/mnt/share/yolov5s.engine --fp16 --useDLACore=0 --allowGPUFallback
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=/mnt/share/yolov5s.engine --fp16 --useDLACore=0 --allowGPUFallback
[06/07/2024-09:59:39] [I] === Model Options ===
[06/07/2024-09:59:39] [I] Format: ONNX
[06/07/2024-09:59:39] [I] Model: yolov5s.onnx
[06/07/2024-09:59:39] [I] Output:
[06/07/2024-09:59:39] [I] === Build Options ===
[06/07/2024-09:59:39] [I] Max batch: explicit batch
[06/07/2024-09:59:39] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[06/07/2024-09:59:39] [I] minTiming: 1
[06/07/2024-09:59:39] [I] avgTiming: 8
[06/07/2024-09:59:39] [I] Precision: FP32+FP16
[06/07/2024-09:59:39] [I] LayerPrecisions:
[06/07/2024-09:59:39] [I] Layer Device Types:
[06/07/2024-09:59:39] [I] Calibration:
[06/07/2024-09:59:39] [I] Refit: Disabled
[06/07/2024-09:59:39] [I] Version Compatible: Disabled
[06/07/2024-09:59:39] [I] ONNX Native InstanceNorm: Disabled
[06/07/2024-09:59:39] [I] TensorRT runtime: full
[06/07/2024-09:59:39] [I] Lean DLL Path:
[06/07/2024-09:59:39] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[06/07/2024-09:59:39] [I] Exclude Lean Runtime: Disabled
[06/07/2024-09:59:39] [I] Sparsity: Disabled
[06/07/2024-09:59:39] [I] Safe mode: Disabled
[06/07/2024-09:59:39] [I] Build DLA standalone loadable: Disabled
[06/07/2024-09:59:39] [I] Allow GPU fallback for DLA: Enabled
[06/07/2024-09:59:39] [I] DirectIO mode: Disabled
[06/07/2024-09:59:39] [I] Restricted mode: Disabled
[06/07/2024-09:59:39] [I] Skip inference: Disabled
[06/07/2024-09:59:39] [I] Save engine: /mnt/share/yolov5s.engine
[06/07/2024-09:59:39] [I] Load engine:
[06/07/2024-09:59:39] [I] Profiling verbosity: 0
[06/07/2024-09:59:39] [I] Tactic sources: Using default tactic sources
[06/07/2024-09:59:39] [I] timingCacheMode: local
[06/07/2024-09:59:39] [I] timingCacheFile:
[06/07/2024-09:59:39] [I] Heuristic: Disabled
[06/07/2024-09:59:39] [I] Preview Features: Use default preview flags.
[06/07/2024-09:59:39] [I] MaxAuxStreams: -1
[06/07/2024-09:59:39] [I] BuilderOptimizationLevel: -1
[06/07/2024-09:59:39] [I] Input(s)s format: fp32:CHW
[06/07/2024-09:59:39] [I] Output(s)s format: fp32:CHW
[06/07/2024-09:59:39] [I] Input build shapes: model
[06/07/2024-09:59:39] [I] Input calibration shapes: model
[06/07/2024-09:59:39] [I] === System Options ===
[06/07/2024-09:59:39] [I] Device: 0
[06/07/2024-09:59:39] [I] DLACore: 0
[06/07/2024-09:59:39] [I] Plugins:
[06/07/2024-09:59:39] [I] setPluginsToSerialize:
[06/07/2024-09:59:39] [I] dynamicPlugins:
[06/07/2024-09:59:39] [I] ignoreParsedPluginLibs: 0
[06/07/2024-09:59:39] [I]
[06/07/2024-09:59:39] [I] === Inference Options ===
[06/07/2024-09:59:39] [I] Batch: Explicit
[06/07/2024-09:59:39] [I] Input inference shapes: model
[06/07/2024-09:59:39] [I] Iterations: 10
[06/07/2024-09:59:39] [I] Duration: 3s (+ 200ms warm up)
[06/07/2024-09:59:39] [I] Sleep time: 0ms
[06/07/2024-09:59:39] [I] Idle time: 0ms
[06/07/2024-09:59:39] [I] Inference Streams: 1
[06/07/2024-09:59:39] [I] ExposeDMA: Disabled
[06/07/2024-09:59:39] [I] Data transfers: Enabled
[06/07/2024-09:59:39] [I] Spin-wait: Disabled
[06/07/2024-09:59:39] [I] Multithreading: Disabled
[06/07/2024-09:59:39] [I] CUDA Graph: Disabled
[06/07/2024-09:59:39] [I] Separate profiling: Disabled
[06/07/2024-09:59:39] [I] Time Deserialize: Disabled
[06/07/2024-09:59:39] [I] Time Refit: Disabled
[06/07/2024-09:59:39] [I] NVTX verbosity: 0
[06/07/2024-09:59:39] [I] Persistent Cache Ratio: 0
[06/07/2024-09:59:39] [I] Inputs:
[06/07/2024-09:59:39] [I] === Reporting Options ===
[06/07/2024-09:59:39] [I] Verbose: Disabled
[06/07/2024-09:59:39] [I] Averages: 10 inferences
[06/07/2024-09:59:39] [I] Percentiles: 90,95,99
[06/07/2024-09:59:39] [I] Dump refittable layers:Disabled
[06/07/2024-09:59:39] [I] Dump output: Disabled
[06/07/2024-09:59:39] [I] Profile: Disabled
[06/07/2024-09:59:39] [I] Export timing to JSON file:
[06/07/2024-09:59:39] [I] Export output to JSON file:
[06/07/2024-09:59:39] [I] Export profile to JSON file:
[06/07/2024-09:59:39] [I]
[06/07/2024-09:59:39] [I] === Device Information ===
[06/07/2024-09:59:39] [I] Selected Device: Orin
[06/07/2024-09:59:39] [I] Compute Capability: 8.7
[06/07/2024-09:59:39] [I] SMs: 8
[06/07/2024-09:59:39] [I] Device Global Memory: 30697 MiB
[06/07/2024-09:59:39] [I] Shared Memory per SM: 164 KiB
[06/07/2024-09:59:39] [I] Memory Bus Width: 256 bits (ECC disabled)
[06/07/2024-09:59:39] [I] Application Compute Clock Rate: 1.3 GHz
[06/07/2024-09:59:39] [I] Application Memory Clock Rate: 0.612 GHz
[06/07/2024-09:59:39] [I]
[06/07/2024-09:59:39] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[06/07/2024-09:59:39] [I]
[06/07/2024-09:59:39] [I] TensorRT version: 8.6.2
[06/07/2024-09:59:39] [I] Loading standard plugins
[06/07/2024-09:59:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 7482 (MiB)
[06/07/2024-09:59:45] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1429, now: CPU 1223, GPU 8949 (MiB)
[06/07/2024-09:59:45] [I] Start parsing network model.
[06/07/2024-09:59:45] [I] [TRT] ----------------------------------------------------------------
[06/07/2024-09:59:45] [I] [TRT] Input filename: yolov5s.onnx
[06/07/2024-09:59:45] [I] [TRT] ONNX IR version: 0.0.8
[06/07/2024-09:59:45] [I] [TRT] Opset version: 17
[06/07/2024-09:59:45] [I] [TRT] Producer name: pytorch
[06/07/2024-09:59:45] [I] [TRT] Producer version: 2.2.1
[06/07/2024-09:59:45] [I] [TRT] Domain:
[06/07/2024-09:59:45] [I] [TRT] Model version: 0
[06/07/2024-09:59:45] [I] [TRT] Doc string:
[06/07/2024-09:59:45] [I] [TRT] ----------------------------------------------------------------
[06/07/2024-09:59:45] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/07/2024-09:59:45] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[06/07/2024-09:59:45] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[06/07/2024-09:59:45] [I] Finished parsing network model. Parse time: 0.0705417
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.11/Concat_1_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.15/Concat_1_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Reshape' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Transpose' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_14: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split_14' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_15: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split_15' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Constant_13_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 206) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Sub_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Constant_14_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 211) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 213) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 215) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Pow: DLA cores do not support POW ElementWise operation.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Pow' (ELEMENTWISE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Expand_3_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Concat_1: DLA only supports concatenation on the C dimension.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Concat_1' (CONCATENATION): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Reshape_1' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Reshape_2' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Transpose_1' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_1: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split_1' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_1_16: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split_1_16' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_1_17: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split_1_17' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 228) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Sub_1_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Constant_32_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 233) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 235) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 237) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Pow_1: DLA cores do not support POW ElementWise operation.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Pow_1' (ELEMENTWISE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Expand_7_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Concat_3: DLA only supports concatenation on the C dimension.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Concat_3' (CONCATENATION): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Reshape_3' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Reshape_4' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Transpose_2' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_2: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split_2' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_2_18: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split_2_18' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_2_19: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Split_2_19' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 250) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Sub_2_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Constant_50_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 255) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 257) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '(Unnamed Layer* 259) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Pow_2: DLA cores do not support POW ElementWise operation.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Pow_2' (ELEMENTWISE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Expand_11_output_0' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Concat_5: DLA only supports concatenation on the C dimension.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Concat_5' (CONCATENATION): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Reshape_5' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Concat_6: DLA only supports concatenation on the C dimension.
[06/07/2024-09:59:45] [W] [TRT] Layer '/0/model.24/Concat_6' (CONCATENATION): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /1/Slice: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/1/Slice' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /1/Slice_1: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/1/Slice_1' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /1/Slice_2: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer '/1/Slice_2' (SLICE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/1/ReduceMax' (REDUCE): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/1/ArgMax' (TOPK): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer '/1/Cast' (CAST): Unsupported on DLA. Switching this layer's device type to GPU.
[06/07/2024-09:59:46] [W] [TRT] /0/model.11/Resize: DLA only supports Resize with pre-set scale factors, hence computing explicit scale factors from output dimensions.
[06/07/2024-09:59:46] [W] [TRT] /0/model.15/Resize: DLA only supports Resize with pre-set scale factors, hence computing explicit scale factors from output dimensions.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,80,80,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_4. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_10. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_16. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,80,80,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_2. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_8. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] Splitting DLA subgraph at: /0/model.24/Mul_8 because DLA validation failed for this layer.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_8. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_14. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] Splitting DLA subgraph at: /0/model.24/Mul_14 because DLA validation failed for this layer.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_14. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] Input tensor has less than 4 dimensions for /1/Mul. At least one shuffle layer will be inserted which cannot run on DLA.
[06/07/2024-09:59:46] [W] [TRT] Dimension: 2 (25200) exceeds maximum allowed size for DLA: 8192
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /1/Mul. Switching to GPU fallback.
[06/07/2024-09:59:51] [I] [TRT] Graph optimization time: 6.0985 seconds.
[06/07/2024-09:59:51] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/07/2024-10:02:07] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[06/07/2024-10:02:08] [I] [TRT] Total Host Persistent Memory: 1424
[06/07/2024-10:02:08] [I] [TRT] Total Device Persistent Memory: 0
[06/07/2024-10:02:08] [I] [TRT] Total Scratch Memory: 3264000
[06/07/2024-10:02:08] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 14 MiB, GPU 460 MiB
[06/07/2024-10:02:08] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 65 steps to complete.
[06/07/2024-10:02:08] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 7.91657ms to assign 13 blocks to 65 nodes requiring 24823808 bytes.
[06/07/2024-10:02:08] [I] [TRT] Total Activation Memory: 24823808
[06/07/2024-10:02:08] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +14, GPU +4, now: CPU 14, GPU 4 (MiB)
[06/07/2024-10:02:08] [I] Engine built in 149.373 sec.
[06/07/2024-10:02:09] [I] [TRT] Loaded engine size: 15 MiB
[06/07/2024-10:02:09] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +14, GPU +0, now: CPU 14, GPU 0 (MiB)
[06/07/2024-10:02:09] [I] Engine deserialized in 0.0149534 sec.
[06/07/2024-10:02:09] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +23, now: CPU 14, GPU 23 (MiB)
[06/07/2024-10:02:09] [I] Setting persistentCacheLimit to 0 bytes.
[06/07/2024-10:02:09] [I] Using random values for input input
[06/07/2024-10:02:09] [I] Input binding for input with dimensions 1x3x640x640 is created.
[06/07/2024-10:02:09] [I] Output binding for boxes with dimensions 1x25200x4 is created.
[06/07/2024-10:02:09] [I] Output binding for scores with dimensions 1x25200x1 is created.
[06/07/2024-10:02:09] [I] Output binding for classes with dimensions 1x25200x1 is created.
[06/07/2024-10:02:09] [I] Starting inference
[06/07/2024-10:02:12] [I] Warmup completed 3 queries over 200 ms
[06/07/2024-10:02:12] [I] Timing trace has 47 queries over 3.2599 s
[06/07/2024-10:02:12] [I]
[06/07/2024-10:02:12] [I] === Trace details ===
[06/07/2024-10:02:12] [I] Trace averages of 10 runs:
[06/07/2024-10:02:12] [I] Average on 10 runs - GPU latency: 67.6262 ms - Host latency: 68.3455 ms (enqueue 3.2754 ms)
[06/07/2024-10:02:12] [I] Average on 10 runs - GPU latency: 67.9526 ms - Host latency: 68.6695 ms (enqueue 3.00969 ms)
[06/07/2024-10:02:12] [I] Average on 10 runs - GPU latency: 68.3488 ms - Host latency: 69.0744 ms (enqueue 3.32358 ms)
[06/07/2024-10:02:12] [I] Average on 10 runs - GPU latency: 68.134 ms - Host latency: 68.8519 ms (enqueue 3.3491 ms)
[06/07/2024-10:02:12] [I]
[06/07/2024-10:02:12] [I] === Performance summary ===
[06/07/2024-10:02:12] [I] Throughput: 14.4176 qps
[06/07/2024-10:02:12] [I] Latency: min = 68.0918 ms, max = 72.8109 ms, mean = 68.6438 ms, median = 68.1351 ms, percentile(90%) = 70.339 ms, percentile(95%) = 71.1956 ms, percentile(99%) = 72.8109 ms
[06/07/2024-10:02:12] [I] Enqueue Time: min = 2.44714 ms, max = 3.88281 ms, mean = 3.2296 ms, median = 3.18118 ms, percentile(90%) = 3.72668 ms, percentile(95%) = 3.74414 ms, percentile(99%) = 3.88281 ms
[06/07/2024-10:02:12] [I] H2D Latency: min = 0.598999 ms, max = 0.658569 ms, mean = 0.61613 ms, median = 0.613525 ms, percentile(90%) = 0.625244 ms, percentile(95%) = 0.632874 ms, percentile(99%) = 0.658569 ms
[06/07/2024-10:02:12] [I] GPU Compute Time: min = 67.3809 ms, max = 72.1001 ms, mean = 67.9247 ms, median = 67.4209 ms, percentile(90%) = 69.6253 ms, percentile(95%) = 70.4827 ms, percentile(99%) = 72.1001 ms
[06/07/2024-10:02:12] [I] D2H Latency: min = 0.097168 ms, max = 0.105347 ms, mean = 0.102994 ms, median = 0.102905 ms, percentile(90%) = 0.104614 ms, percentile(95%) = 0.104858 ms, percentile(99%) = 0.105347 ms
[06/07/2024-10:02:12] [I] Total Host Walltime: 3.2599 s
[06/07/2024-10:02:12] [I] Total GPU Compute Time: 3.19246 s
[06/07/2024-10:02:12] [W] * GPU compute time is unstable, with coefficient of variance = 1.63481%.
[06/07/2024-10:02:12] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[06/07/2024-10:02:12] [I] Explanations of the performance metrics are printed in the verbose logs.
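For reference, the reported throughput is simply the query count divided by the total host walltime. Recomputing from the summary numbers confirms the figure:

```python
# Cross-checking the trtexec performance summary above.
queries = 47          # "Timing trace has 47 queries"
walltime_s = 3.2599   # "Total Host Walltime"
throughput = queries / walltime_s
print(round(throughput, 4))  # 14.4176 qps, matching the reported throughput
```

Note this is single-stream latency-bound throughput; with `--streams` > 1 or batching, qps and per-query latency would no longer be simple reciprocals of each other.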
[06/07/2024-10:02:12] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=/mnt/share/yolov5s.engine --fp16 --useDLACore=0 --allowGPUFallback