I removed the ‘shapes’ option from the command, and trtexec generated an engine. When shapes is specified, does that tell trtexec to build an engine that expects a 640*640 image for inference?
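As a rough sanity check on which input size the engine was actually built for, the sizes reported in the DLA warnings in the log below (11264, 2816, 704 and 14784) can be compared with the grid-cell counts a detection head would produce. This is only a sketch under assumptions: it assumes a YOLOv8-style head with strides 8, 16 and 32, and the 512x1408 input is a guess based on the "1408" in the ONNX file name, not something the log states directly. `anchor_counts` is a hypothetical helper, not part of TensorRT.

```python
def anchor_counts(height, width, strides=(8, 16, 32)):
    """Return the number of grid cells per stride for a stride-based detection head.

    Assumption: the head predicts one cell per (height // s) x (width // s) location.
    """
    return [(height // s) * (width // s) for s in strides]


# 640x640: the size mentioned for the 'shapes' option.
square = anchor_counts(640, 640)

# 512x1408: hypothetical input size, guessed from the ONNX file name.
wide = anchor_counts(512, 1408)

print("640x640 :", square, "total", sum(square))
print("512x1408:", wide, "total", sum(wide))
```

If the second line matches the numbers in the warnings, that would suggest the engine built without shapes uses the static input size stored in the ONNX model (the log reports "Input build shapes: model") rather than 640*640.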
Here is the log for reference if needed.
/usr/src/tensorrt/bin/trtexec --onnx=m1_1408.onnx --int8 --fp16 --best --useDLACore=1 --allowGPUFallback --saveEngine=./yolov8s_dla_b1_int8.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=m1_1408.onnx --int8 --fp16 --best --useDLACore=1 --allowGPUFallback --saveEngine=./yolov8s_dla_b1_int8.engine
[08/12/2024-13:33:42] [I] === Model Options ===
[08/12/2024-13:33:42] [I] Format: ONNX
[08/12/2024-13:33:42] [I] Model: m1_1408.onnx
[08/12/2024-13:33:42] [I] Output:
[08/12/2024-13:33:42] [I] === Build Options ===
[08/12/2024-13:33:42] [I] Max batch: explicit batch
[08/12/2024-13:33:42] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/12/2024-13:33:42] [I] minTiming: 1
[08/12/2024-13:33:42] [I] avgTiming: 8
[08/12/2024-13:33:42] [I] Precision: FP32+FP16+INT8
[08/12/2024-13:33:42] [I] LayerPrecisions:
[08/12/2024-13:33:42] [I] Calibration: Dynamic
[08/12/2024-13:33:42] [I] Refit: Disabled
[08/12/2024-13:33:42] [I] Sparsity: Disabled
[08/12/2024-13:33:42] [I] Safe mode: Disabled
[08/12/2024-13:33:42] [I] DirectIO mode: Disabled
[08/12/2024-13:33:42] [I] Restricted mode: Disabled
[08/12/2024-13:33:42] [I] Build only: Disabled
[08/12/2024-13:33:42] [I] Save engine: ./yolov8s_dla_b1_int8.engine
[08/12/2024-13:33:42] [I] Load engine:
[08/12/2024-13:33:42] [I] Profiling verbosity: 0
[08/12/2024-13:33:42] [I] Tactic sources: Using default tactic sources
[08/12/2024-13:33:42] [I] timingCacheMode: local
[08/12/2024-13:33:42] [I] timingCacheFile:
[08/12/2024-13:33:42] [I] Heuristic: Disabled
[08/12/2024-13:33:42] [I] Preview Features: Use default preview flags.
[08/12/2024-13:33:42] [I] Input(s)s format: fp32:CHW
[08/12/2024-13:33:42] [I] Output(s)s format: fp32:CHW
[08/12/2024-13:33:42] [I] Input build shapes: model
[08/12/2024-13:33:42] [I] Input calibration shapes: model
[08/12/2024-13:33:42] [I] === System Options ===
[08/12/2024-13:33:42] [I] Device: 0
[08/12/2024-13:33:42] [I] DLACore: 1(With GPU fallback)
[08/12/2024-13:33:42] [I] Plugins:
[08/12/2024-13:33:42] [I] === Inference Options ===
[08/12/2024-13:33:42] [I] Batch: Explicit
[08/12/2024-13:33:42] [I] Input inference shapes: model
[08/12/2024-13:33:42] [I] Iterations: 10
[08/12/2024-13:33:42] [I] Duration: 3s (+ 200ms warm up)
[08/12/2024-13:33:42] [I] Sleep time: 0ms
[08/12/2024-13:33:42] [I] Idle time: 0ms
[08/12/2024-13:33:42] [I] Streams: 1
[08/12/2024-13:33:42] [I] ExposeDMA: Disabled
[08/12/2024-13:33:42] [I] Data transfers: Enabled
[08/12/2024-13:33:42] [I] Spin-wait: Disabled
[08/12/2024-13:33:42] [I] Multithreading: Disabled
[08/12/2024-13:33:42] [I] CUDA Graph: Disabled
[08/12/2024-13:33:42] [I] Separate profiling: Disabled
[08/12/2024-13:33:42] [I] Time Deserialize: Disabled
[08/12/2024-13:33:42] [I] Time Refit: Disabled
[08/12/2024-13:33:42] [I] NVTX verbosity: 0
[08/12/2024-13:33:42] [I] Persistent Cache Ratio: 0
[08/12/2024-13:33:42] [I] Inputs:
[08/12/2024-13:33:42] [I] === Reporting Options ===
[08/12/2024-13:33:42] [I] Verbose: Disabled
[08/12/2024-13:33:42] [I] Averages: 10 inferences
[08/12/2024-13:33:42] [I] Percentiles: 90,95,99
[08/12/2024-13:33:42] [I] Dump refittable layers:Disabled
[08/12/2024-13:33:42] [I] Dump output: Disabled
[08/12/2024-13:33:42] [I] Profile: Disabled
[08/12/2024-13:33:42] [I] Export timing to JSON file:
[08/12/2024-13:33:42] [I] Export output to JSON file:
[08/12/2024-13:33:42] [I] Export profile to JSON file:
[08/12/2024-13:33:42] [I]
[08/12/2024-13:33:42] [I] === Device Information ===
[08/12/2024-13:33:42] [I] Selected Device: Orin
[08/12/2024-13:33:42] [I] Compute Capability: 8.7
[08/12/2024-13:33:42] [I] SMs: 16
[08/12/2024-13:33:42] [I] Compute Clock Rate: 1.3 GHz
[08/12/2024-13:33:42] [I] Device Global Memory: 30592 MiB
[08/12/2024-13:33:42] [I] Shared Memory per SM: 164 KiB
[08/12/2024-13:33:42] [I] Memory Bus Width: 256 bits (ECC disabled)
[08/12/2024-13:33:42] [I] Memory Clock Rate: 1.3 GHz
[08/12/2024-13:33:42] [I]
[08/12/2024-13:33:42] [I] TensorRT version: 8.5.2
[08/12/2024-13:33:43] [I] [TRT] [MemUsageChange] Init CUDA: CPU +220, GPU +0, now: CPU 249, GPU 5475 (MiB)
[08/12/2024-13:33:44] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +292, now: CPU 574, GPU 5788 (MiB)
[08/12/2024-13:33:44] [I] Start parsing network model
[08/12/2024-13:33:44] [I] [TRT] ----------------------------------------------------------------
[08/12/2024-13:33:44] [I] [TRT] Input filename: m1_1408.onnx
[08/12/2024-13:33:44] [I] [TRT] ONNX IR version: 0.0.8
[08/12/2024-13:33:44] [I] [TRT] Opset version: 16
[08/12/2024-13:33:44] [I] [TRT] Producer name: pytorch
[08/12/2024-13:33:44] [I] [TRT] Producer version: 2.2.0
[08/12/2024-13:33:44] [I] [TRT] Domain:
[08/12/2024-13:33:44] [I] [TRT] Model version: 0
[08/12/2024-13:33:44] [I] [TRT] Doc string:
[08/12/2024-13:33:44] [I] [TRT] ----------------------------------------------------------------
[08/12/2024-13:33:44] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/12/2024-13:33:44] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[08/12/2024-13:33:44] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[08/12/2024-13:33:44] [I] Finish parsing network model
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Reshape’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Reshape_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Reshape_2’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Concat_3: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Concat_3’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_3_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Split: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Split’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Split_15: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Split_15’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Split_16: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Split_16’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Squeeze’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Squeeze_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Squeeze_2’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_9_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Expand: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Expand’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_10_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Expand_1: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Expand_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Unsqueeze’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Unsqueeze_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Concat_4: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Concat_4’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Reshape_3’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_14_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 240) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 241) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/ConstantOfShape: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/ConstantOfShape’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 243) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 244) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_16_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Expand_2: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Expand_2’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_17_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Expand_3: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Expand_3’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Unsqueeze_2’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Unsqueeze_3’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Concat_5: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Concat_5’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Reshape_4’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_21_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 255) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 256) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/ConstantOfShape_1: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/ConstantOfShape_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 258) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 259) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_23_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Expand_4: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Expand_4’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_24_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Expand_5: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Expand_5’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Unsqueeze_4’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Unsqueeze_5’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Concat_6: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Concat_6’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Reshape_5’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_28_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 270) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 271) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/ConstantOfShape_2: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/ConstantOfShape_2’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 273) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 274) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Concat_7: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Concat_7’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Concat_8: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Concat_8’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Transpose’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Transpose_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Split_1: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Split_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Split_1_42: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Split_1_42’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/dfl/Reshape’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/dfl/Reshape_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Unsqueeze_6’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Shape’ (SHAPE): DLA only supports FP16 and Int8 precision type. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_30_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Gather’ (GATHER): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_32_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_33_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Div: DLA cores do not support DIV ElementWise operation.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Div’ (ELEMENTWISE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_34_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 298) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] (Unnamed Layer* 299) [Concatenation]: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 299) [Concatenation]’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 300) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 301) [Gather]’ (GATHER): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 302) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 304) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 308) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 311) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Slice: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Slice’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_35_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 316) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] (Unnamed Layer* 317) [Concatenation]: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 317) [Concatenation]’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 318) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 319) [Gather]’ (GATHER): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 320) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] (Unnamed Layer* 321) [Concatenation]: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 321) [Concatenation]’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 322) [Gather]’ (GATHER): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 323) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 325) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 329) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 332) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 334) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 338) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 342) [Constant]’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Slice_1: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Slice_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Constant_36_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 349) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Div_1: DLA cores do not support DIV ElementWise operation.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Div_1’ (ELEMENTWISE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Concat_9: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Concat_9’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘(Unnamed Layer* 353) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /0/model.22/Concat_10: DLA only supports concatenation on the C dimension.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/0/model.22/Concat_10’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/1/Transpose’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /1/Slice: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/1/Slice’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] /1/Slice_1: DLA only supports slicing 4 dimensional tensors.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/1/Slice_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/1/ReduceMax’ (REDUCE): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/1/ArgMax’ (TOPK): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Layer ‘/1/Cast’ (CAST): Unsupported on DLA. Switching this layer’s device type to GPU.
[08/12/2024-13:33:44] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32 or Bool.
[08/12/2024-13:33:45] [W] [TRT] Dimension: 3 (14784) exceeds maximum allowed size for DLA: 8192
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/dfl/Softmax. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Dimension: 3 (14784) exceeds maximum allowed size for DLA: 8192
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/dfl/conv/Conv. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Add. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] Batch size (11264) exceeds maximum allowed size for DLA: 4096
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Add. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Add_1. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [2816,1] and [1,1]
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Add_1. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Add_2. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [704,1] and [1,1]
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Add_2. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Sub. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] Dimension: 3 (14784) exceeds maximum allowed size for DLA: 8192
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Sub. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Add_4. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] Dimension: 3 (14784) exceeds maximum allowed size for DLA: 8192
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Add_4. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Add_5. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] Dimension: 3 (14784) exceeds maximum allowed size for DLA: 8192
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Add_5. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Sub_1. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] Dimension: 3 (14784) exceeds maximum allowed size for DLA: 8192
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Sub_1. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Mul_2. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] Dimension: 3 (14784) exceeds maximum allowed size for DLA: 8192
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Mul_2. Switching to GPU fallback.
[08/12/2024-13:33:45] [W] [TRT] Input tensor has less than 4 dimensions for /0/model.22/Sigmoid. At least one shuffle layer will be inserted which cannot run on DLA.
[08/12/2024-13:33:45] [W] [TRT] Dimension: 3 (14784) exceeds maximum allowed size for DLA: 8192
[08/12/2024-13:33:45] [W] [TRT] Validation failed for DLA layer: /0/model.22/Sigmoid. Switching to GPU fallback.
[08/12/2024-13:33:53] [I] [TRT] ---------- Layers Running on DLA ----------
[08/12/2024-13:33:53] [I] [TRT] [DlaLayer] {ForeignNode[/0/model.0/conv/Conv…/0/model.22/Concat_2]}
[08/12/2024-13:33:53] [I] [TRT] ---------- Layers Running on GPU ----------
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Reshape
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Reshape_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Reshape_1
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Reshape_1_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Reshape_2
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Reshape_2_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/dfl/Reshape + /0/model.22/dfl/Transpose
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SOFTMAX: /0/model.22/dfl/Softmax
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONVOLUTION: /0/model.22/dfl/conv/Conv
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_3_output_0
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_3_output_0_clone_1
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_3_output_0_clone_2
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_9_output_0
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_10_output_0
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: (Unnamed Layer* 240) [Constant] + (Unnamed Layer* 241) [Shuffle]
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_16_output_0
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_17_output_0
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: (Unnamed Layer* 255) [Constant] + (Unnamed Layer* 256) [Shuffle]
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_23_output_0
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: /0/model.22/Constant_24_output_0
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CONSTANT: (Unnamed Layer* 270) [Constant] + (Unnamed Layer* 271) [Shuffle]
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/Expand
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/Expand_1
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/Expand_2
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/Expand_3
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/Expand_4
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/Expand_5
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Unsqueeze_1
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Unsqueeze_1_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Unsqueeze
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/ConstantOfShape
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Unsqueeze_3
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Unsqueeze_3_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Unsqueeze_2
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/ConstantOfShape_1
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Unsqueeze_5
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Unsqueeze_5_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Unsqueeze_4
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SLICE: /0/model.22/ConstantOfShape_2
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Squeeze + (Unnamed Layer* 244) [Shuffle]_copy_input
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Squeeze + (Unnamed Layer* 244) [Shuffle]
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Squeeze_1 + (Unnamed Layer* 259) [Shuffle]_copy_input
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Squeeze_1 + (Unnamed Layer* 259) [Shuffle]
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Squeeze_2 + (Unnamed Layer* 274) [Shuffle]_copy_input
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Squeeze_2 + (Unnamed Layer* 274) [Shuffle]
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Unsqueeze_output_0 copy
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Unsqueeze_2_output_0 copy
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Unsqueeze_4_output_0 copy
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Reshape_3
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Reshape_3_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Reshape_4
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Reshape_4_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Reshape_5
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Reshape_5_copy_output
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Transpose + /0/model.22/Unsqueeze_6
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/dfl/Reshape_1
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] POINTWISE: PWN(/0/model.22/Add)
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] POINTWISE: PWN(/0/model.22/Add_1)
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] POINTWISE: PWN(/0/model.22/Add_2)
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] ELEMENTWISE: /0/model.22/Sub
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] POINTWISE: PWN(/0/model.22/Add_4)
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] ELEMENTWISE: /0/model.22/Sub_1
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] POINTWISE: PWN(/0/model.22/Constant_36_output_0 + (Unnamed Layer* 349) [Shuffle], PWN(/0/model.22/Add_5, /0/model.22/Div_1))
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /0/model.22/Div_1_output_0 copy
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /0/model.22/Transpose_1 + (Unnamed Layer* 353) [Shuffle]
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] ELEMENTWISE: /0/model.22/Mul_2
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] POINTWISE: PWN(/0/model.22/Sigmoid)
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] SHUFFLE: /1/Transpose
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /1/Slice
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] COPY: /1/Slice_1
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] REDUCE: /1/ReduceMax
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] TOPK: /1/ArgMax
[08/12/2024-13:33:53] [I] [TRT] [GpuLayer] CAST: /1/Cast
[08/12/2024-13:33:54] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +756, now: CPU 1153, GPU 6667 (MiB)
[08/12/2024-13:33:54] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +123, now: CPU 1236, GPU 6790 (MiB)
[08/12/2024-13:33:54] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/12/2024-13:37:20] [W] [TRT] No implementation of layer /0/model.22/Mul_2 obeys the requested constraints. I.e. no conforming implementation was found for requested layer computation precision and output precision. Using fastest implementation instead.
[08/12/2024-13:37:48] [W] [TRT] No implementation of layer PWN(/0/model.22/Sigmoid) obeys the requested constraints. I.e. no conforming implementation was found for requested layer computation precision and output precision. Using fastest implementation instead.
[08/12/2024-13:37:48] [I] [TRT] Total Activation Memory: 32096583680
[08/12/2024-13:37:48] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[08/12/2024-13:37:50] [I] [TRT] Total Host Persistent Memory: 4944
[08/12/2024-13:37:50] [I] [TRT] Total Device Persistent Memory: 0
[08/12/2024-13:37:50] [I] [TRT] Total Scratch Memory: 473088
[08/12/2024-13:37:50] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 12 MiB, GPU 238 MiB
[08/12/2024-13:37:50] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 64 steps to complete.
[08/12/2024-13:37:50] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 8.22859ms to assign 20 blocks to 64 nodes requiring 7042048 bytes.
[08/12/2024-13:37:50] [I] [TRT] Total Activation Memory: 7042048
[08/12/2024-13:37:50] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +12, GPU +4, now: CPU 12, GPU 4 (MiB)
[08/12/2024-13:37:50] [I] Engine built in 247.673 sec.
[08/12/2024-13:37:50] [I] [TRT] Loaded engine size: 12 MiB
[08/12/2024-13:37:50] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +12, GPU +0, now: CPU 12, GPU 0 (MiB)
[08/12/2024-13:37:50] [I] Engine deserialized in 0.0105551 sec.
[08/12/2024-13:37:50] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +6, now: CPU 12, GPU 6 (MiB)
[08/12/2024-13:37:50] [I] Setting persistentCacheLimit to 0 bytes.
[08/12/2024-13:37:50] [I] Using random values for input input
[08/12/2024-13:37:50] [I] Created input binding for input with dimensions 1x3x512x1408
[08/12/2024-13:37:50] [I] Using random values for output boxes
[08/12/2024-13:37:50] [I] Created output binding for boxes with dimensions 1x14784x4
[08/12/2024-13:37:50] [I] Using random values for output scores
[08/12/2024-13:37:50] [I] Created output binding for scores with dimensions 1x14784x1
[08/12/2024-13:37:50] [I] Using random values for output classes
[08/12/2024-13:37:50] [I] Created output binding for classes with dimensions 1x14784x1
[08/12/2024-13:37:50] [I] Starting inference
[08/12/2024-13:37:54] [I] Warmup completed 10 queries over 200 ms
[08/12/2024-13:37:54] [I] Timing trace has 146 queries over 3.07009 s
[08/12/2024-13:37:54] [I]
[08/12/2024-13:37:54] [I] === Trace details ===
[08/12/2024-13:37:54] [I] Trace averages of 10 runs:
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8657 ms - Host latency: 21.1555 ms (enqueue 0.438264 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.869 ms - Host latency: 21.1609 ms (enqueue 0.423022 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.9894 ms - Host latency: 21.2852 ms (enqueue 0.434546 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8723 ms - Host latency: 21.1626 ms (enqueue 0.413574 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8717 ms - Host latency: 21.1629 ms (enqueue 0.418066 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.9067 ms - Host latency: 21.1983 ms (enqueue 0.433289 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8713 ms - Host latency: 21.1621 ms (enqueue 0.418494 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8685 ms - Host latency: 21.159 ms (enqueue 0.422376 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8983 ms - Host latency: 21.1898 ms (enqueue 0.430933 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8642 ms - Host latency: 21.1535 ms (enqueue 0.412622 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8666 ms - Host latency: 21.1576 ms (enqueue 0.416284 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.889 ms - Host latency: 21.1797 ms (enqueue 0.436084 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8679 ms - Host latency: 21.1585 ms (enqueue 0.4177 ms)
[08/12/2024-13:37:54] [I] Average on 10 runs - GPU latency: 20.8716 ms - Host latency: 21.1619 ms (enqueue 0.417725 ms)
[08/12/2024-13:37:54] [I]
[08/12/2024-13:37:54] [I] === Performance summary ===
[08/12/2024-13:37:54] [I] Throughput: 47.5555 qps
[08/12/2024-13:37:54] [I] Latency: min = 21.1366 ms, max = 22.2006 ms, mean = 21.1756 ms, median = 21.1614 ms, percentile(90%) = 21.1743 ms, percentile(95%) = 21.3097 ms, percentile(99%) = 21.3342 ms
[08/12/2024-13:37:54] [I] Enqueue Time: min = 0.404785 ms, max = 0.547607 ms, mean = 0.425289 ms, median = 0.417664 ms, percentile(90%) = 0.451172 ms, percentile(95%) = 0.476929 ms, percentile(99%) = 0.539062 ms
[08/12/2024-13:37:54] [I] H2D Latency: min = 0.261719 ms, max = 0.319092 ms, mean = 0.265462 ms, median = 0.264687 ms, percentile(90%) = 0.26709 ms, percentile(95%) = 0.268311 ms, percentile(99%) = 0.287109 ms
[08/12/2024-13:37:54] [I] GPU Compute Time: min = 20.8446 ms, max = 21.9099 ms, mean = 20.8844 ms, median = 20.8706 ms, percentile(90%) = 20.8811 ms, percentile(95%) = 21.0193 ms, percentile(99%) = 21.0426 ms
[08/12/2024-13:37:54] [I] D2H Latency: min = 0.0126953 ms, max = 0.0283203 ms, mean = 0.0256893 ms, median = 0.0256348 ms, percentile(90%) = 0.0271301 ms, percentile(95%) = 0.0275269 ms, percentile(99%) = 0.0282593 ms
[08/12/2024-13:37:54] [I] Total Host Walltime: 3.07009 s
[08/12/2024-13:37:54] [I] Total GPU Compute Time: 3.04912 s
[08/12/2024-13:37:54] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/12/2024-13:37:54] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=m1_1408.onnx --int8 --fp16 --best --useDLACore=1 --allowGPUFallback --saveEngine=./yolov8s_dla_b1_int8.engine
Then I ran the following command to load the engine and dump the per-layer profile:
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s_dla_b1_int8.engine --useDLACore=1 --dumpProfile
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s_dla_b1_int8.engine --useDLACore=1 --dumpProfile
[08/12/2024-14:04:42] [I] === Model Options ===
[08/12/2024-14:04:42] [I] Format: *
[08/12/2024-14:04:42] [I] Model:
[08/12/2024-14:04:42] [I] Output:
[08/12/2024-14:04:42] [I] === Build Options ===
[08/12/2024-14:04:42] [I] Max batch: 1
[08/12/2024-14:04:42] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/12/2024-14:04:42] [I] minTiming: 1
[08/12/2024-14:04:42] [I] avgTiming: 8
[08/12/2024-14:04:42] [I] Precision: FP32
[08/12/2024-14:04:42] [I] LayerPrecisions:
[08/12/2024-14:04:42] [I] Calibration:
[08/12/2024-14:04:42] [I] Refit: Disabled
[08/12/2024-14:04:42] [I] Sparsity: Disabled
[08/12/2024-14:04:42] [I] Safe mode: Disabled
[08/12/2024-14:04:42] [I] DirectIO mode: Disabled
[08/12/2024-14:04:42] [I] Restricted mode: Disabled
[08/12/2024-14:04:42] [I] Build only: Disabled
[08/12/2024-14:04:42] [I] Save engine:
[08/12/2024-14:04:42] [I] Load engine: yolov8s_dla_b1_int8.engine
[08/12/2024-14:04:42] [I] Profiling verbosity: 0
[08/12/2024-14:04:42] [I] Tactic sources: Using default tactic sources
[08/12/2024-14:04:42] [I] timingCacheMode: local
[08/12/2024-14:04:42] [I] timingCacheFile:
[08/12/2024-14:04:42] [I] Heuristic: Disabled
[08/12/2024-14:04:42] [I] Preview Features: Use default preview flags.
[08/12/2024-14:04:42] [I] Input(s)s format: fp32:CHW
[08/12/2024-14:04:42] [I] Output(s)s format: fp32:CHW
[08/12/2024-14:04:42] [I] Input build shapes: model
[08/12/2024-14:04:42] [I] Input calibration shapes: model
[08/12/2024-14:04:42] [I] === System Options ===
[08/12/2024-14:04:42] [I] Device: 0
[08/12/2024-14:04:42] [I] DLACore: 1
[08/12/2024-14:04:42] [I] Plugins:
[08/12/2024-14:04:42] [I] === Inference Options ===
[08/12/2024-14:04:42] [I] Batch: 1
[08/12/2024-14:04:42] [I] Input inference shapes: model
[08/12/2024-14:04:42] [I] Iterations: 10
[08/12/2024-14:04:42] [I] Duration: 3s (+ 200ms warm up)
[08/12/2024-14:04:42] [I] Sleep time: 0ms
[08/12/2024-14:04:42] [I] Idle time: 0ms
[08/12/2024-14:04:42] [I] Streams: 1
[08/12/2024-14:04:42] [I] ExposeDMA: Disabled
[08/12/2024-14:04:42] [I] Data transfers: Enabled
[08/12/2024-14:04:42] [I] Spin-wait: Disabled
[08/12/2024-14:04:42] [I] Multithreading: Disabled
[08/12/2024-14:04:42] [I] CUDA Graph: Disabled
[08/12/2024-14:04:42] [I] Separate profiling: Disabled
[08/12/2024-14:04:42] [I] Time Deserialize: Disabled
[08/12/2024-14:04:42] [I] Time Refit: Disabled
[08/12/2024-14:04:42] [I] NVTX verbosity: 0
[08/12/2024-14:04:42] [I] Persistent Cache Ratio: 0
[08/12/2024-14:04:42] [I] Inputs:
[08/12/2024-14:04:42] [I] === Reporting Options ===
[08/12/2024-14:04:42] [I] Verbose: Disabled
[08/12/2024-14:04:42] [I] Averages: 10 inferences
[08/12/2024-14:04:42] [I] Percentiles: 90,95,99
[08/12/2024-14:04:42] [I] Dump refittable layers:Disabled
[08/12/2024-14:04:42] [I] Dump output: Disabled
[08/12/2024-14:04:42] [I] Profile: Enabled
[08/12/2024-14:04:42] [I] Export timing to JSON file:
[08/12/2024-14:04:42] [I] Export output to JSON file:
[08/12/2024-14:04:42] [I] Export profile to JSON file:
[08/12/2024-14:04:42] [I]
[08/12/2024-14:04:42] [I] === Device Information ===
[08/12/2024-14:04:42] [I] Selected Device: Orin
[08/12/2024-14:04:42] [I] Compute Capability: 8.7
[08/12/2024-14:04:42] [I] SMs: 16
[08/12/2024-14:04:42] [I] Compute Clock Rate: 1.3 GHz
[08/12/2024-14:04:42] [I] Device Global Memory: 30592 MiB
[08/12/2024-14:04:42] [I] Shared Memory per SM: 164 KiB
[08/12/2024-14:04:42] [I] Memory Bus Width: 256 bits (ECC disabled)
[08/12/2024-14:04:42] [I] Memory Clock Rate: 1.3 GHz
[08/12/2024-14:04:42] [I]
[08/12/2024-14:04:42] [I] TensorRT version: 8.5.2
[08/12/2024-14:04:42] [I] Engine loaded in 0.0073156 sec.
[08/12/2024-14:04:43] [I] [TRT] Loaded engine size: 12 MiB
[08/12/2024-14:04:43] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +12, GPU +0, now: CPU 12, GPU 0 (MiB)
[08/12/2024-14:04:43] [I] Engine deserialized in 0.426043 sec.
[08/12/2024-14:04:43] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +6, now: CPU 12, GPU 6 (MiB)
[08/12/2024-14:04:43] [I] Setting persistentCacheLimit to 0 bytes.
[08/12/2024-14:04:43] [I] Using random values for input input
[08/12/2024-14:04:43] [I] Created input binding for input with dimensions 1x3x512x1408
[08/12/2024-14:04:43] [I] Using random values for output boxes
[08/12/2024-14:04:43] [I] Created output binding for boxes with dimensions 1x14784x4
[08/12/2024-14:04:43] [I] Using random values for output scores
[08/12/2024-14:04:43] [I] Created output binding for scores with dimensions 1x14784x1
[08/12/2024-14:04:43] [I] Using random values for output classes
[08/12/2024-14:04:43] [I] Created output binding for classes with dimensions 1x14784x1
[08/12/2024-14:04:43] [I] Starting inference
[08/12/2024-14:04:46] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[08/12/2024-14:04:46] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[08/12/2024-14:04:46] [I]
[08/12/2024-14:04:46] [I] === Profile (153 iterations ) ===
[08/12/2024-14:04:46] [I] Layer Time (ms) Avg. Time (ms) Median Time (ms) Time %
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to {ForeignNode[/0/model.0/conv/Conv…/0/model.22/Concat_2]} 16.65 0.1089 0.1068 0.5
[08/12/2024-14:04:46] [I] {ForeignNode[/0/model.0/conv/Conv…/0/model.22/Concat_2]} 3094.70 20.2268 20.2066 96.8
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to /0/model.22/Reshape 8.92 0.0583 0.0582 0.3
[08/12/2024-14:04:46] [I] /0/model.22/Reshape_copy_output 1.91 0.0125 0.0124 0.1
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to /0/model.22/Reshape_1 2.99 0.0196 0.0196 0.1
[08/12/2024-14:04:46] [I] /0/model.22/Reshape_1_copy_output 1.01 0.0066 0.0067 0.0
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to /0/model.22/Reshape_2 1.41 0.0092 0.0092 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Reshape_2_copy_output 0.81 0.0053 0.0053 0.0
[08/12/2024-14:04:46] [I] /0/model.22/dfl/Reshape + /0/model.22/dfl/Transpose 7.90 0.0516 0.0516 0.2
[08/12/2024-14:04:46] [I] /0/model.22/dfl/Softmax 4.09 0.0267 0.0267 0.1
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to /0/model.22/dfl/conv/Conv 8.62 0.0564 0.0564 0.3
[08/12/2024-14:04:46] [I] /0/model.22/dfl/conv/Conv 4.67 0.0305 0.0306 0.1
[08/12/2024-14:04:46] [I] /0/model.22/Expand 0.98 0.0064 0.0064 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Expand_1 0.89 0.0058 0.0059 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Expand_2 0.84 0.0055 0.0055 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Expand_3 0.90 0.0059 0.0060 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Expand_4 0.84 0.0055 0.0055 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Expand_5 0.82 0.0053 0.0054 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Unsqueeze_1_copy_output 0.94 0.0062 0.0062 0.0
[08/12/2024-14:04:46] [I] /0/model.22/ConstantOfShape 0.87 0.0057 0.0057 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Unsqueeze_3_copy_output 0.81 0.0053 0.0053 0.0
[08/12/2024-14:04:46] [I] /0/model.22/ConstantOfShape_1 0.80 0.0053 0.0052 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Unsqueeze_5_copy_output 0.79 0.0051 0.0051 0.0
[08/12/2024-14:04:46] [I] /0/model.22/ConstantOfShape_2 0.73 0.0047 0.0048 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Squeeze + (Unnamed Layer* 244) [Shuffle]_copy_input 0.84 0.0055 0.0055 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Squeeze_1 + (Unnamed Layer* 259) [Shuffle]_copy_input 0.71 0.0047 0.0046 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Squeeze_2 + (Unnamed Layer* 274) [Shuffle]_copy_input 0.70 0.0046 0.0046 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Unsqueeze_output_0 copy 0.93 0.0061 0.0060 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Unsqueeze_2_output_0 copy 0.88 0.0058 0.0057 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Unsqueeze_4_output_0 copy 0.77 0.0050 0.0050 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Reshape_3_copy_output 0.82 0.0053 0.0053 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Reshape_4_copy_output 0.71 0.0047 0.0046 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Reshape_5_copy_output 0.71 0.0047 0.0046 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Transpose + /0/model.22/Unsqueeze_6 0.97 0.0064 0.0064 0.0
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Output Tensor 0 to /0/model.22/Transpose + /0/model.22/Unsqueeze_6 1.02 0.0067 0.0067 0.0
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to /0/model.22/dfl/Reshape_1 1.31 0.0086 0.0086 0.0
[08/12/2024-14:04:46] [I] /0/model.22/dfl/Reshape_1 1.10 0.0072 0.0072 0.0
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to PWN(/0/model.22/Add) 0.94 0.0061 0.0061 0.0
[08/12/2024-14:04:46] [I] PWN(/0/model.22/Add) 0.92 0.0060 0.0060 0.0
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to PWN(/0/model.22/Add_1) 0.78 0.0051 0.0051 0.0
[08/12/2024-14:04:46] [I] PWN(/0/model.22/Add_1) 0.85 0.0055 0.0055 0.0
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 0 to PWN(/0/model.22/Add_2) 0.77 0.0050 0.0050 0.0
[08/12/2024-14:04:46] [I] PWN(/0/model.22/Add_2) 0.83 0.0054 0.0055 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Sub 1.28 0.0084 0.0083 0.0
[08/12/2024-14:04:46] [I] PWN(/0/model.22/Add_4) 1.20 0.0078 0.0077 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Sub_1 1.11 0.0072 0.0073 0.0
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Output Tensor 0 to /0/model.22/Sub_1 1.08 0.0070 0.0070 0.0
[08/12/2024-14:04:46] [I] PWN(/0/model.22/Constant_36_output_0 + (Unnamed Layer* 349) [Shuffle], PWN(/0/model.22/Add_5, /0/model.22/Div_1)) 1.76 0.0115 0.0115 0.1
[08/12/2024-14:04:46] [I] /0/model.22/Div_1_output_0 copy 1.07 0.0070 0.0070 0.0
[08/12/2024-14:04:46] [I] Reformatting CopyNode for Input Tensor 1 to /0/model.22/Mul_2 0.83 0.0055 0.0054 0.0
[08/12/2024-14:04:46] [I] /0/model.22/Mul_2 1.08 0.0071 0.0070 0.0
[08/12/2024-14:04:46] [I] PWN(/0/model.22/Sigmoid) 0.95 0.0062 0.0061 0.0
[08/12/2024-14:04:46] [I] /1/Transpose 1.23 0.0080 0.0080 0.0
[08/12/2024-14:04:46] [I] /1/Slice 0.94 0.0061 0.0061 0.0
[08/12/2024-14:04:46] [I] /1/Slice_1 0.81 0.0053 0.0053 0.0
[08/12/2024-14:04:46] [I] /1/ReduceMax 1.06 0.0070 0.0069 0.0
[08/12/2024-14:04:46] [I] /1/ArgMax 1.11 0.0072 0.0072 0.0
[08/12/2024-14:04:46] [I] /1/Cast 0.75 0.0049 0.0049 0.0
[08/12/2024-14:04:46] [I] Total 3196.70 20.8935 20.8717 100.0
[08/12/2024-14:04:46] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s_dla_b1_int8.engine --useDLACore=1 --dumpProfile
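In case it helps anyone reading the profile table above, here is a rough Python sketch for pulling the numbers out of the log and checking how much of the total time sits in the `{ForeignNode[...]}` row. I am assuming that row is the subgraph that was handed to DLA. The helper names `parse_profile_row` and `foreign_node_share` are made up for this post and are not part of trtexec. It also assumes each data row ends with the four numeric columns shown above (Time, Avg. Time, Median Time, Time %).

```python
import re

# Matches the "[date-time] [I] " prefix that trtexec puts on each log line.
PREFIX = re.compile(r"^\[[^\]]+\]\s+\[I\]\s+")


def parse_profile_row(line):
    """Return (layer, total_ms, avg_ms, median_ms, pct), or None for non-data lines."""
    body = PREFIX.sub("", line.strip())
    # Layer names contain spaces, so split the four numeric columns off the right.
    parts = body.rsplit(None, 4)
    if len(parts) != 5:
        return None
    try:
        numbers = [float(p) for p in parts[1:]]
    except ValueError:
        # Header and separator lines end in non-numeric tokens.
        return None
    return (parts[0], *numbers)


def foreign_node_share(lines):
    """Percentage of the 'Total' time spent in rows named '{ForeignNode[...]}'.

    Assumption: those rows are the DLA subgraph. The 'Reformatting CopyNode ...'
    rows only mention ForeignNode later in the name, so they are not counted.
    """
    rows = [r for r in map(parse_profile_row, lines) if r is not None]
    totals = [r[1] for r in rows if r[0] == "Total"]
    if not totals:
        raise ValueError("no 'Total' row found in the profile")
    foreign = sum(r[1] for r in rows if r[0].startswith("{ForeignNode["))
    return foreign / totals[0] * 100.0
```

Feeding it the lines of the second log (for example `foreign_node_share(open("profile.log", encoding="utf-8"))`, where `profile.log` is just a placeholder file name) should give a figure that agrees with the Time % column for the ForeignNode row.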