Adding YOLOv5 inference to the DeepStream example fails with an error when converting the YOLOv5 model to a TensorRT engine

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson AGX Orin
• DeepStream Version: 6.4
• JetPack Version (valid for Jetson only): 6.0
• TensorRT Version: 8.6.1
• NVIDIA GPU Driver Version (valid for GPU only):
• Issue Type (questions, new requirements, bugs): questions

Hi, when I used YOLOv5 in the DeepStream example, the YOLOv5 ONNX file could not be converted into an engine file. It had converted successfully before, and nothing in the machine environment changed since then except that Kafka was compiled and installed. The attached picture shows the error message.

Please set network-mode to 2, then try again.
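For reference, a minimal sketch of the relevant keys in the nvinfer YAML config for a DLA + FP16 setup; the key names are the standard nvinfer property names, and the values are illustrative, so adjust them to your dsserver_pgie_config.yml:

```yaml
property:
  # 0 = FP32, 1 = INT8, 2 = FP16; DLA engines require FP16 or INT8
  network-mode: 2
  # DLA keys shown only on the assumption that DLA is intended;
  # drop them to build a GPU-only engine instead
  enable-dla: 1
  use-dla-core: 0
```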

I changed network-mode to 2 and got this error.

It is related to a TensorRT issue. Please refer to this topic.

I think this is a problem with DeepStream. I hit it when using DeepStream’s nvinfer plugin: the YOLOv5 options are set in the configuration file, and the nvinfer plugin automatically converts the ONNX model to an engine according to that configuration. Why does it need to convert every time when the corresponding engine already exists in the folder?

On the first run, the app will create a new engine. After the first run, the app will load the engine directly if model-engine-file is set. Could you share the whole log of this issue?
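As a rough sketch (paths taken from the log in this thread), the engine is reused only when the file that model-engine-file points to matches the rest of the config (batch size, precision, DLA settings); otherwise nvinfer rebuilds it from onnx-file:

```yaml
property:
  onnx-file: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/yolov5s.onnx
  # The engine is loaded directly only if its maxBatchSize and precision
  # match batch-size and network-mode below; otherwise nvinfer rebuilds it
  model-engine-file: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/model_b1_gpu0_fp32.engine
  batch-size: 1      # the existing engine was built with batch size 1
  network-mode: 0    # 0 = FP32, matching the _fp32 engine above
```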

Calling gst_element_factory_make for nvmultiurisrcbin
!! [WARNING] “encoder” group not enabled.
PERF_MODE Enabled
[WARNING] Unknown param found in gie: gie-unique-id
[WARNING] Unknown param found in gie: nvbuf-memory-type
[ERROR] Passed element is not nv3dsink
Using file: dsserver_config.yml
Civetweb version: v1.16
Server running at port: 9000
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:27.560977421 78902 0xffff84018ca0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2092> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/model_b1_gpu0_fp32.engine
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 4
0 INPUT kFLOAT input 3x640x640
1 OUTPUT kFLOAT boxes 25200x4
2 OUTPUT kFLOAT scores 25200x1
3 OUTPUT kFLOAT classes 25200x1

0:00:27.951351601 78902 0xffff84018ca0 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:2024> [UID = 1]: Backend has maxBatchSize 1 whereas 8 has been requested
0:00:27.951426226 78902 0xffff84018ca0 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2201> [UID = 1]: deserialized backend context :/opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/model_b1_gpu0_fp32.engine failed to match config params, trying rebuild
0:00:27.979184044 78902 0xffff84018ca0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2106> [UID = 1]: Trying to create engine from model files
WARNING: FP32 mode requested with DLA. DLA may execute in FP16 mode instead.
WARNING: [TRT]: onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

ERROR: [TRT]: DLA execution was requested for the network using setDefaultDeviceType, but neither FP16 or Int8 mode is enabled
ERROR: [TRT]: 4: [network.cpp::validate::2901] Error Code 4: Internal Error (DLA validation failed)
Building engine failed

Failed to build CUDA engine
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:34.696852915 78902 0xffff84018ca0 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2126> [UID = 1]: build engine file failed
0:00:35.075258000 78902 0xffff84018ca0 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2212> [UID = 1]: build backend context failed
0:00:35.075345137 78902 0xffff84018ca0 ERROR nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1351> [UID = 1]: generate backend failed, check config file settings
0:00:35.075427411 78902 0xffff84018ca0 WARN nvinfer gstnvinfer.cpp:898:gst_nvinfer_start: error: Failed to create NvDsInferContext instance
0:00:35.075441939 78902 0xffff84018ca0 WARN nvinfer gstnvinfer.cpp:898:gst_nvinfer_start: error: Config file path: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/dsserver_pgie_config.yml, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Running…

ERROR from element primary-nvinference-engine: Failed to create NvDsInferContext instance
Error details: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(898): gst_nvinfer_start (): /GstPipeline:dsserver-pipeline/GstNvInfer:primary-nvinference-engine:
Config file path: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/dsserver_pgie_config.yml, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Returned, stopping playback
Stopping the server…!!
Stopped the server…!!
Deleting pipeline

From the error, the engine’s maxBatchSize is incompatible with the batch-size in the config. How did you create the engine? Could you share the current nvinfer configuration file?

config_infer_primary_RN34_PN26_960x544_dla0_orin_unprune_agx.txt (1.3 KB)
This configuration file is the YOLOv5 configuration you provided for AI NVR; I only changed the paths to match my actual directory:
onnx-file: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/yolov5s.onnx
model-engine-file: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/model_b1_gpu0_fp32.engine
labelfile-path: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/labels.txt

Here is yolov5 in AI NVR: Replacing Deepstream peoplenet model with Yolo model
Please refer to the above topic for YOLOv5 in AI NVR.

If max_batch_size, batch_size, and the model’s batch size all match, the engine rebuild is not triggered again; I have tested this case as well. Must these three values be the same in the configuration? If they are not the same, the engine build still fails at runtime.

You can try removing the engine file. If the engine file is removed, it will be regenerated from the ONNX file.
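As a sketch of the two usual ways out, using the file names from the log above: either keep the existing model_b1 engine and make batch-size match it, or delete the engine so nvinfer rebuilds it from the ONNX file at the requested batch size (nvinfer names the generated file after the batch size and precision, e.g. model_b8_gpu0_fp16.engine):

```yaml
property:
  # Option 1: reuse the existing engine by matching its batch size
  model-engine-file: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/model_b1_gpu0_fp32.engine
  batch-size: 1

  # Option 2: keep batch-size: 8 instead, delete the old engine file,
  # and let nvinfer rebuild it from onnx-file on the next run
```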

I started this topic because nvinfer reports an error when converting the ONNX model to an engine. At the moment I can only adjust batch_size to bypass the engine conversion, but the underlying conversion problem is still not solved.

Can you share where you got the YOLOv5 ONNX model? Can the ONNX model be converted to an engine with trtexec?

This configuration file is the YOLOv5 configuration you provided for AI NVR; I only changed the paths to match my actual directory:
onnx-file: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/yolov5s.onnx
model-engine-file: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/model_b1_gpu0_fp32.engine
labelfile-path: /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-server-wpy/labels.txt

This is the failure log of my conversion via trtexec:

&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # ./trtexec --onnx=yolov5s.onnx --saveEngine==./yolov5s.engine --fp16 --useDLACore=0 --allowGPUFallback
[06/06/2024-17:40:38] [I] === Model Options ===
[06/06/2024-17:40:38] [I] Format: ONNX
[06/06/2024-17:40:38] [I] Model: yolov5s.onnx
[06/06/2024-17:40:38] [I] Output:
[06/06/2024-17:40:38] [I] === Build Options ===
[06/06/2024-17:40:38] [I] Max batch: explicit batch
[06/06/2024-17:40:38] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[06/06/2024-17:40:38] [I] minTiming: 1
[06/06/2024-17:40:38] [I] avgTiming: 8
[06/06/2024-17:40:38] [I] Precision: FP32+FP16
[06/06/2024-17:40:38] [I] LayerPrecisions:
[06/06/2024-17:40:38] [I] Layer Device Types:
[06/06/2024-17:40:38] [I] Calibration:
[06/06/2024-17:40:38] [I] Refit: Disabled
[06/06/2024-17:40:38] [I] Version Compatible: Disabled
[06/06/2024-17:40:38] [I] ONNX Native InstanceNorm: Disabled
[06/06/2024-17:40:38] [I] TensorRT runtime: full
[06/06/2024-17:40:38] [I] Lean DLL Path:
[06/06/2024-17:40:38] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[06/06/2024-17:40:38] [I] Exclude Lean Runtime: Disabled
[06/06/2024-17:40:38] [I] Sparsity: Disabled
[06/06/2024-17:40:38] [I] Safe mode: Disabled
[06/06/2024-17:40:38] [I] Build DLA standalone loadable: Disabled
[06/06/2024-17:40:38] [I] Allow GPU fallback for DLA: Enabled
[06/06/2024-17:40:38] [I] DirectIO mode: Disabled
[06/06/2024-17:40:38] [I] Restricted mode: Disabled
[06/06/2024-17:40:38] [I] Skip inference: Disabled
[06/06/2024-17:40:38] [I] Save engine: =./yolov5s.engine
[06/06/2024-17:40:38] [I] Load engine:
[06/06/2024-17:40:38] [I] Profiling verbosity: 0
[06/06/2024-17:40:38] [I] Tactic sources: Using default tactic sources
[06/06/2024-17:40:38] [I] timingCacheMode: local
[06/06/2024-17:40:38] [I] timingCacheFile:
[06/06/2024-17:40:38] [I] Heuristic: Disabled
[06/06/2024-17:40:38] [I] Preview Features: Use default preview flags.
[06/06/2024-17:40:38] [I] MaxAuxStreams: -1
[06/06/2024-17:40:38] [I] BuilderOptimizationLevel: -1
[06/06/2024-17:40:38] [I] Input(s)s format: fp32:CHW
[06/06/2024-17:40:38] [I] Output(s)s format: fp32:CHW
[06/06/2024-17:40:38] [I] Input build shapes: model
[06/06/2024-17:40:38] [I] Input calibration shapes: model
[06/06/2024-17:40:38] [I] === System Options ===
[06/06/2024-17:40:38] [I] Device: 0
[06/06/2024-17:40:38] [I] DLACore: 0
[06/06/2024-17:40:38] [I] Plugins:
[06/06/2024-17:40:38] [I] setPluginsToSerialize:
[06/06/2024-17:40:38] [I] dynamicPlugins:
[06/06/2024-17:40:38] [I] ignoreParsedPluginLibs: 0
[06/06/2024-17:40:38] [I]
[06/06/2024-17:40:38] [I] === Inference Options ===
[06/06/2024-17:40:38] [I] Batch: Explicit
[06/06/2024-17:40:38] [I] Input inference shapes: model
[06/06/2024-17:40:38] [I] Iterations: 10
[06/06/2024-17:40:38] [I] Duration: 3s (+ 200ms warm up)
[06/06/2024-17:40:38] [I] Sleep time: 0ms
[06/06/2024-17:40:38] [I] Idle time: 0ms
[06/06/2024-17:40:38] [I] Inference Streams: 1
[06/06/2024-17:40:38] [I] ExposeDMA: Disabled
[06/06/2024-17:40:38] [I] Data transfers: Enabled
[06/06/2024-17:40:38] [I] Spin-wait: Disabled
[06/06/2024-17:40:38] [I] Multithreading: Disabled
[06/06/2024-17:40:38] [I] CUDA Graph: Disabled
[06/06/2024-17:40:38] [I] Separate profiling: Disabled
[06/06/2024-17:40:38] [I] Time Deserialize: Disabled
[06/06/2024-17:40:38] [I] Time Refit: Disabled
[06/06/2024-17:40:38] [I] NVTX verbosity: 0
[06/06/2024-17:40:38] [I] Persistent Cache Ratio: 0
[06/06/2024-17:40:38] [I] Inputs:
[06/06/2024-17:40:38] [I] === Reporting Options ===
[06/06/2024-17:40:38] [I] Verbose: Disabled
[06/06/2024-17:40:38] [I] Averages: 10 inferences
[06/06/2024-17:40:38] [I] Percentiles: 90,95,99
[06/06/2024-17:40:38] [I] Dump refittable layers:Disabled
[06/06/2024-17:40:38] [I] Dump output: Disabled
[06/06/2024-17:40:38] [I] Profile: Disabled
[06/06/2024-17:40:38] [I] Export timing to JSON file:
[06/06/2024-17:40:38] [I] Export output to JSON file:
[06/06/2024-17:40:38] [I] Export profile to JSON file:
[06/06/2024-17:40:38] [I]
[06/06/2024-17:40:38] [I] === Device Information ===
[06/06/2024-17:40:38] [I] Selected Device: Orin
[06/06/2024-17:40:38] [I] Compute Capability: 8.7
[06/06/2024-17:40:38] [I] SMs: 16
[06/06/2024-17:40:38] [I] Device Global Memory: 62841 MiB
[06/06/2024-17:40:38] [I] Shared Memory per SM: 164 KiB
[06/06/2024-17:40:38] [I] Memory Bus Width: 256 bits (ECC disabled)
[06/06/2024-17:40:38] [I] Application Compute Clock Rate: 1.3 GHz
[06/06/2024-17:40:38] [I] Application Memory Clock Rate: 1.3 GHz
[06/06/2024-17:40:38] [I]
[06/06/2024-17:40:38] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[06/06/2024-17:40:38] [I]
[06/06/2024-17:40:38] [I] TensorRT version: 8.6.2
[06/06/2024-17:40:38] [I] Loading standard plugins
[06/06/2024-17:40:38] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 12215 (MiB)
[06/06/2024-17:40:45] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1086, now: CPU 1223, GPU 13340 (MiB)
[06/06/2024-17:40:45] [I] Start parsing network model.
[06/06/2024-17:40:45] [I] [TRT] ----------------------------------------------------------------
[06/06/2024-17:40:45] [I] [TRT] Input filename: yolov5s.onnx
[06/06/2024-17:40:45] [I] [TRT] ONNX IR version: 0.0.8
[06/06/2024-17:40:45] [I] [TRT] Opset version: 17
[06/06/2024-17:40:45] [I] [TRT] Producer name: pytorch
[06/06/2024-17:40:45] [I] [TRT] Producer version: 2.2.1
[06/06/2024-17:40:45] [I] [TRT] Domain:
[06/06/2024-17:40:45] [I] [TRT] Model version: 0
[06/06/2024-17:40:45] [I] [TRT] Doc string:
[06/06/2024-17:40:45] [I] [TRT] ----------------------------------------------------------------
[06/06/2024-17:40:45] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/06/2024-17:40:45] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[06/06/2024-17:40:45] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[06/06/2024-17:40:45] [I] Finished parsing network model. Parse time: 0.0791616
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.11/Concat_1_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.15/Concat_1_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Reshape’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Transpose’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split_14: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split_14’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split_15: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split_15’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Constant_13_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 206) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Sub_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Constant_14_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 211) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 213) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 215) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Pow: DLA cores do not support POW ElementWise operation.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Pow’ (ELEMENTWISE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Expand_3_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Concat_1: DLA only supports concatenation on the C dimension.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Concat_1’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Reshape_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Reshape_2’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Transpose_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split_1: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split_1_16: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split_1_16’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split_1_17: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split_1_17’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 228) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Sub_1_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Constant_32_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 233) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 235) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 237) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Pow_1: DLA cores do not support POW ElementWise operation.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Pow_1’ (ELEMENTWISE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Expand_7_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Concat_3: DLA only supports concatenation on the C dimension.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Concat_3’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Reshape_3’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Reshape_4’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Transpose_2’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split_2: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split_2’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split_2_18: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split_2_18’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Split_2_19: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Split_2_19’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 250) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Sub_2_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Constant_50_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 255) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 257) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘(Unnamed Layer* 259) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Pow_2: DLA cores do not support POW ElementWise operation.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Pow_2’ (ELEMENTWISE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Expand_11_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Concat_5: DLA only supports concatenation on the C dimension.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Concat_5’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Reshape_5’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /0/model.24/Concat_6: DLA only supports concatenation on the C dimension.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/0/model.24/Concat_6’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /1/Slice: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/1/Slice’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /1/Slice_1: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/1/Slice_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] /1/Slice_2: DLA only supports slicing 4 dimensional tensors.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/1/Slice_2’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/1/ReduceMax’ (REDUCE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/1/ArgMax’ (TOPK): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:45] [W] [TRT] Layer ‘/1/Cast’ (CAST): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/06/2024-17:40:46] [W] [TRT] /0/model.11/Resize: DLA only supports Resize with pre-set scale factors, hence computing explicit scale factors from output dimensions.
[06/06/2024-17:40:46] [W] [TRT] /0/model.15/Resize: DLA only supports Resize with pre-set scale factors, hence computing explicit scale factors from output dimensions.
[06/06/2024-17:40:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,80,80,2] and [1,1,1,1,1]
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_4. Switching to GPU fallback.
[06/06/2024-17:40:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_10. Switching to GPU fallback.
[06/06/2024-17:40:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_16. Switching to GPU fallback.
[06/06/2024-17:40:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,80,80,2] and [1,1,1,1,1]
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_2. Switching to GPU fallback.
[06/06/2024-17:40:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_8. Switching to GPU fallback.
[06/06/2024-17:40:46] [W] [TRT] Splitting DLA subgraph at: /0/model.24/Mul_8 because DLA validation failed for this layer.
[06/06/2024-17:40:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_8. Switching to GPU fallback.
[06/06/2024-17:40:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_14. Switching to GPU fallback.
[06/06/2024-17:40:46] [W] [TRT] Splitting DLA subgraph at: /0/model.24/Mul_14 because DLA validation failed for this layer.
[06/06/2024-17:40:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_14. Switching to GPU fallback.
[06/06/2024-17:40:46] [W] [TRT] Input tensor has less than 4 dimensions for /1/Mul. At least one shuffle layer will be inserted which cannot run on DLA.
[06/06/2024-17:40:46] [W] [TRT] Dimension: 2 (25200) exceeds maximum allowed size for DLA: 8192
[06/06/2024-17:40:46] [W] [TRT] Validation failed for DLA layer: /1/Mul. Switching to GPU fallback.
[06/06/2024-17:40:50] [I] [TRT] Graph optimization time: 4.95829 seconds.
[06/06/2024-17:40:50] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/06/2024-17:41:58] [W] [TRT] No valid obedient candidate choices for node /0/model.24/Constant_13_output_0_clone_6 + (Unnamed Layer* 250) [Shuffle] + /0/model.24/Mul_14 that meet the preferred precision. The remaining candidate choices will be profiled.
[06/06/2024-17:41:58] [E] Error[10]: Could not find any implementation for node /0/model.24/Constant_13_output_0_clone_6 + (Unnamed Layer* 250) [Shuffle] + /0/model.24/Mul_14.
[06/06/2024-17:41:58] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node /0/model.24/Constant_13_output_0_clone_6 + (Unnamed Layer* 250) [Shuffle] + /0/model.24/Mul_14.)
[06/06/2024-17:41:58] [E] Engine could not be created from network
[06/06/2024-17:41:58] [E] Building engine failed
[06/06/2024-17:41:58] [E] Failed to create engine from model or file.
[06/06/2024-17:41:58] [E] Engine set up failed

I have tested it on my side with JetPack 6.0 GA and I can build the engine from the ONNX file. As the ONNX model is from a forum user and we will release YOLOv8 support in the next release soon, we suggest using the formal YOLOv8 release. We also have a YOLOv5 DLA sample here: GitHub - NVIDIA-AI-IOT/cuDLA-samples: YOLOv5 on Orin DLA

$ /usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=/mnt/share/yolov5s.engine --fp16 --useDLACore=0 --allowGPUFallback
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=/mnt/share/yolov5s.engine --fp16 --useDLACore=0 --allowGPUFallback
[06/07/2024-09:59:39] [I] === Model Options ===
[06/07/2024-09:59:39] [I] Format: ONNX
[06/07/2024-09:59:39] [I] Model: yolov5s.onnx
[06/07/2024-09:59:39] [I] Output:
[06/07/2024-09:59:39] [I] === Build Options ===
[06/07/2024-09:59:39] [I] Max batch: explicit batch
[06/07/2024-09:59:39] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[06/07/2024-09:59:39] [I] minTiming: 1
[06/07/2024-09:59:39] [I] avgTiming: 8
[06/07/2024-09:59:39] [I] Precision: FP32+FP16
[06/07/2024-09:59:39] [I] LayerPrecisions:
[06/07/2024-09:59:39] [I] Layer Device Types:
[06/07/2024-09:59:39] [I] Calibration:
[06/07/2024-09:59:39] [I] Refit: Disabled
[06/07/2024-09:59:39] [I] Version Compatible: Disabled
[06/07/2024-09:59:39] [I] ONNX Native InstanceNorm: Disabled
[06/07/2024-09:59:39] [I] TensorRT runtime: full
[06/07/2024-09:59:39] [I] Lean DLL Path:
[06/07/2024-09:59:39] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[06/07/2024-09:59:39] [I] Exclude Lean Runtime: Disabled
[06/07/2024-09:59:39] [I] Sparsity: Disabled
[06/07/2024-09:59:39] [I] Safe mode: Disabled
[06/07/2024-09:59:39] [I] Build DLA standalone loadable: Disabled
[06/07/2024-09:59:39] [I] Allow GPU fallback for DLA: Enabled
[06/07/2024-09:59:39] [I] DirectIO mode: Disabled
[06/07/2024-09:59:39] [I] Restricted mode: Disabled
[06/07/2024-09:59:39] [I] Skip inference: Disabled
[06/07/2024-09:59:39] [I] Save engine: /mnt/share/yolov5s.engine
[06/07/2024-09:59:39] [I] Load engine:
[06/07/2024-09:59:39] [I] Profiling verbosity: 0
[06/07/2024-09:59:39] [I] Tactic sources: Using default tactic sources
[06/07/2024-09:59:39] [I] timingCacheMode: local
[06/07/2024-09:59:39] [I] timingCacheFile:
[06/07/2024-09:59:39] [I] Heuristic: Disabled
[06/07/2024-09:59:39] [I] Preview Features: Use default preview flags.
[06/07/2024-09:59:39] [I] MaxAuxStreams: -1
[06/07/2024-09:59:39] [I] BuilderOptimizationLevel: -1
[06/07/2024-09:59:39] [I] Input(s)s format: fp32:CHW
[06/07/2024-09:59:39] [I] Output(s)s format: fp32:CHW
[06/07/2024-09:59:39] [I] Input build shapes: model
[06/07/2024-09:59:39] [I] Input calibration shapes: model
[06/07/2024-09:59:39] [I] === System Options ===
[06/07/2024-09:59:39] [I] Device: 0
[06/07/2024-09:59:39] [I] DLACore: 0
[06/07/2024-09:59:39] [I] Plugins:
[06/07/2024-09:59:39] [I] setPluginsToSerialize:
[06/07/2024-09:59:39] [I] dynamicPlugins:
[06/07/2024-09:59:39] [I] ignoreParsedPluginLibs: 0
[06/07/2024-09:59:39] [I]
[06/07/2024-09:59:39] [I] === Inference Options ===
[06/07/2024-09:59:39] [I] Batch: Explicit
[06/07/2024-09:59:39] [I] Input inference shapes: model
[06/07/2024-09:59:39] [I] Iterations: 10
[06/07/2024-09:59:39] [I] Duration: 3s (+ 200ms warm up)
[06/07/2024-09:59:39] [I] Sleep time: 0ms
[06/07/2024-09:59:39] [I] Idle time: 0ms
[06/07/2024-09:59:39] [I] Inference Streams: 1
[06/07/2024-09:59:39] [I] ExposeDMA: Disabled
[06/07/2024-09:59:39] [I] Data transfers: Enabled
[06/07/2024-09:59:39] [I] Spin-wait: Disabled
[06/07/2024-09:59:39] [I] Multithreading: Disabled
[06/07/2024-09:59:39] [I] CUDA Graph: Disabled
[06/07/2024-09:59:39] [I] Separate profiling: Disabled
[06/07/2024-09:59:39] [I] Time Deserialize: Disabled
[06/07/2024-09:59:39] [I] Time Refit: Disabled
[06/07/2024-09:59:39] [I] NVTX verbosity: 0
[06/07/2024-09:59:39] [I] Persistent Cache Ratio: 0
[06/07/2024-09:59:39] [I] Inputs:
[06/07/2024-09:59:39] [I] === Reporting Options ===
[06/07/2024-09:59:39] [I] Verbose: Disabled
[06/07/2024-09:59:39] [I] Averages: 10 inferences
[06/07/2024-09:59:39] [I] Percentiles: 90,95,99
[06/07/2024-09:59:39] [I] Dump refittable layers:Disabled
[06/07/2024-09:59:39] [I] Dump output: Disabled
[06/07/2024-09:59:39] [I] Profile: Disabled
[06/07/2024-09:59:39] [I] Export timing to JSON file:
[06/07/2024-09:59:39] [I] Export output to JSON file:
[06/07/2024-09:59:39] [I] Export profile to JSON file:
[06/07/2024-09:59:39] [I]
[06/07/2024-09:59:39] [I] === Device Information ===
[06/07/2024-09:59:39] [I] Selected Device: Orin
[06/07/2024-09:59:39] [I] Compute Capability: 8.7
[06/07/2024-09:59:39] [I] SMs: 8
[06/07/2024-09:59:39] [I] Device Global Memory: 30697 MiB
[06/07/2024-09:59:39] [I] Shared Memory per SM: 164 KiB
[06/07/2024-09:59:39] [I] Memory Bus Width: 256 bits (ECC disabled)
[06/07/2024-09:59:39] [I] Application Compute Clock Rate: 1.3 GHz
[06/07/2024-09:59:39] [I] Application Memory Clock Rate: 0.612 GHz
[06/07/2024-09:59:39] [I]
[06/07/2024-09:59:39] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[06/07/2024-09:59:39] [I]
[06/07/2024-09:59:39] [I] TensorRT version: 8.6.2
[06/07/2024-09:59:39] [I] Loading standard plugins
[06/07/2024-09:59:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 7482 (MiB)
[06/07/2024-09:59:45] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1429, now: CPU 1223, GPU 8949 (MiB)
[06/07/2024-09:59:45] [I] Start parsing network model.
[06/07/2024-09:59:45] [I] [TRT] ----------------------------------------------------------------
[06/07/2024-09:59:45] [I] [TRT] Input filename: yolov5s.onnx
[06/07/2024-09:59:45] [I] [TRT] ONNX IR version: 0.0.8
[06/07/2024-09:59:45] [I] [TRT] Opset version: 17
[06/07/2024-09:59:45] [I] [TRT] Producer name: pytorch
[06/07/2024-09:59:45] [I] [TRT] Producer version: 2.2.1
[06/07/2024-09:59:45] [I] [TRT] Domain:
[06/07/2024-09:59:45] [I] [TRT] Model version: 0
[06/07/2024-09:59:45] [I] [TRT] Doc string:
[06/07/2024-09:59:45] [I] [TRT] ----------------------------------------------------------------
[06/07/2024-09:59:45] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/07/2024-09:59:45] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[06/07/2024-09:59:45] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[06/07/2024-09:59:45] [I] Finished parsing network model. Parse time: 0.0705417
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.11/Concat_1_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.15/Concat_1_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Reshape’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Transpose’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_14: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split_14’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_15: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split_15’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Constant_13_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 206) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Sub_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Constant_14_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 211) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 213) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 215) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Pow: DLA cores do not support POW ElementWise operation.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Pow’ (ELEMENTWISE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Expand_3_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Concat_1: DLA only supports concatenation on the C dimension.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Concat_1’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Reshape_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Reshape_2’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Transpose_1’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_1: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_1_16: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split_1_16’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_1_17: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split_1_17’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 228) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Sub_1_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Constant_32_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 233) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 235) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 237) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Pow_1: DLA cores do not support POW ElementWise operation.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Pow_1’ (ELEMENTWISE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Expand_7_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Concat_3: DLA only supports concatenation on the C dimension.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Concat_3’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Reshape_3’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Reshape_4’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Transpose_2’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_2: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split_2’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_2_18: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split_2_18’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Split_2_19: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Split_2_19’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 250) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Sub_2_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Constant_50_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 255) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 257) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘(Unnamed Layer* 259) [Shuffle]’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Pow_2: DLA cores do not support POW ElementWise operation.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Pow_2’ (ELEMENTWISE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Expand_11_output_0’ (CONSTANT): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Concat_5: DLA only supports concatenation on the C dimension.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Concat_5’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Reshape_5’ (SHUFFLE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /0/model.24/Concat_6: DLA only supports concatenation on the C dimension.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/0/model.24/Concat_6’ (CONCATENATION): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /1/Slice: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/1/Slice’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /1/Slice_1: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/1/Slice_1’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] /1/Slice_2: DLA only supports slicing 4 dimensional tensors.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/1/Slice_2’ (SLICE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/1/ReduceMax’ (REDUCE): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/1/ArgMax’ (TOPK): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:45] [W] [TRT] Layer ‘/1/Cast’ (CAST): Unsupported on DLA. Switching this layer’s device type to GPU.
[06/07/2024-09:59:46] [W] [TRT] /0/model.11/Resize: DLA only supports Resize with pre-set scale factors, hence computing explicit scale factors from output dimensions.
[06/07/2024-09:59:46] [W] [TRT] /0/model.15/Resize: DLA only supports Resize with pre-set scale factors, hence computing explicit scale factors from output dimensions.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,80,80,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_4. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_10. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_16. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,80,80,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_2. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_8. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] Splitting DLA subgraph at: /0/model.24/Mul_8 because DLA validation failed for this layer.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,40,40,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_8. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_14. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] Splitting DLA subgraph at: /0/model.24/Mul_14 because DLA validation failed for this layer.
[06/07/2024-09:59:46] [W] [TRT] DLA only allows inputs of the same dimensions to Elementwise, but input shapes were: [1,3,20,20,2] and [1,1,1,1,1]
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /0/model.24/Mul_14. Switching to GPU fallback.
[06/07/2024-09:59:46] [W] [TRT] Input tensor has less than 4 dimensions for /1/Mul. At least one shuffle layer will be inserted which cannot run on DLA.
[06/07/2024-09:59:46] [W] [TRT] Dimension: 2 (25200) exceeds maximum allowed size for DLA: 8192
[06/07/2024-09:59:46] [W] [TRT] Validation failed for DLA layer: /1/Mul. Switching to GPU fallback.

[06/07/2024-09:59:51] [I] [TRT] Graph optimization time: 6.0985 seconds.
[06/07/2024-09:59:51] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/07/2024-10:02:07] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[06/07/2024-10:02:08] [I] [TRT] Total Host Persistent Memory: 1424
[06/07/2024-10:02:08] [I] [TRT] Total Device Persistent Memory: 0
[06/07/2024-10:02:08] [I] [TRT] Total Scratch Memory: 3264000
[06/07/2024-10:02:08] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 14 MiB, GPU 460 MiB
[06/07/2024-10:02:08] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 65 steps to complete.
[06/07/2024-10:02:08] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 7.91657ms to assign 13 blocks to 65 nodes requiring 24823808 bytes.
[06/07/2024-10:02:08] [I] [TRT] Total Activation Memory: 24823808
[06/07/2024-10:02:08] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +14, GPU +4, now: CPU 14, GPU 4 (MiB)
[06/07/2024-10:02:08] [I] Engine built in 149.373 sec.
[06/07/2024-10:02:09] [I] [TRT] Loaded engine size: 15 MiB
[06/07/2024-10:02:09] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +14, GPU +0, now: CPU 14, GPU 0 (MiB)
[06/07/2024-10:02:09] [I] Engine deserialized in 0.0149534 sec.
[06/07/2024-10:02:09] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +23, now: CPU 14, GPU 23 (MiB)
[06/07/2024-10:02:09] [I] Setting persistentCacheLimit to 0 bytes.
[06/07/2024-10:02:09] [I] Using random values for input input
[06/07/2024-10:02:09] [I] Input binding for input with dimensions 1x3x640x640 is created.
[06/07/2024-10:02:09] [I] Output binding for boxes with dimensions 1x25200x4 is created.
[06/07/2024-10:02:09] [I] Output binding for scores with dimensions 1x25200x1 is created.
[06/07/2024-10:02:09] [I] Output binding for classes with dimensions 1x25200x1 is created.
[06/07/2024-10:02:09] [I] Starting inference
[06/07/2024-10:02:12] [I] Warmup completed 3 queries over 200 ms
[06/07/2024-10:02:12] [I] Timing trace has 47 queries over 3.2599 s
[06/07/2024-10:02:12] [I]
[06/07/2024-10:02:12] [I] === Trace details ===
[06/07/2024-10:02:12] [I] Trace averages of 10 runs:
[06/07/2024-10:02:12] [I] Average on 10 runs - GPU latency: 67.6262 ms - Host latency: 68.3455 ms (enqueue 3.2754 ms)
[06/07/2024-10:02:12] [I] Average on 10 runs - GPU latency: 67.9526 ms - Host latency: 68.6695 ms (enqueue 3.00969 ms)
[06/07/2024-10:02:12] [I] Average on 10 runs - GPU latency: 68.3488 ms - Host latency: 69.0744 ms (enqueue 3.32358 ms)
[06/07/2024-10:02:12] [I] Average on 10 runs - GPU latency: 68.134 ms - Host latency: 68.8519 ms (enqueue 3.3491 ms)
[06/07/2024-10:02:12] [I]
[06/07/2024-10:02:12] [I] === Performance summary ===
[06/07/2024-10:02:12] [I] Throughput: 14.4176 qps
[06/07/2024-10:02:12] [I] Latency: min = 68.0918 ms, max = 72.8109 ms, mean = 68.6438 ms, median = 68.1351 ms, percentile(90%) = 70.339 ms, percentile(95%) = 71.1956 ms, percentile(99%) = 72.8109 ms
[06/07/2024-10:02:12] [I] Enqueue Time: min = 2.44714 ms, max = 3.88281 ms, mean = 3.2296 ms, median = 3.18118 ms, percentile(90%) = 3.72668 ms, percentile(95%) = 3.74414 ms, percentile(99%) = 3.88281 ms
[06/07/2024-10:02:12] [I] H2D Latency: min = 0.598999 ms, max = 0.658569 ms, mean = 0.61613 ms, median = 0.613525 ms, percentile(90%) = 0.625244 ms, percentile(95%) = 0.632874 ms, percentile(99%) = 0.658569 ms
[06/07/2024-10:02:12] [I] GPU Compute Time: min = 67.3809 ms, max = 72.1001 ms, mean = 67.9247 ms, median = 67.4209 ms, percentile(90%) = 69.6253 ms, percentile(95%) = 70.4827 ms, percentile(99%) = 72.1001 ms
[06/07/2024-10:02:12] [I] D2H Latency: min = 0.097168 ms, max = 0.105347 ms, mean = 0.102994 ms, median = 0.102905 ms, percentile(90%) = 0.104614 ms, percentile(95%) = 0.104858 ms, percentile(99%) = 0.105347 ms
[06/07/2024-10:02:12] [I] Total Host Walltime: 3.2599 s
[06/07/2024-10:02:12] [I] Total GPU Compute Time: 3.19246 s
[06/07/2024-10:02:12] [W] * GPU compute time is unstable, with coefficient of variance = 1.63481%.
[06/07/2024-10:02:12] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[06/07/2024-10:02:12] [I] Explanations of the performance metrics are printed in the verbose logs.
[06/07/2024-10:02:12] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=/mnt/share/yolov5s.engine --fp16 --useDLACore=0 --allowGPUFallback

Can you share the ONNX model you converted?

Here is the model. pn26.tar.gz (49.2 MB)
It is the same as the attachment in the topic: Replacing Deepstream peoplenet model with Yolo model - #7 by kesong

I can convert this file successfully with DeepStream 6.3, but the conversion fails with DeepStream 6.4. Do you have the 6.4 version on hand? Can you help me look into it?

I am using a Jetson AGX Orin with JetPack 6.0 DP. Is the failure caused by this difference? Do you see any problem with my version?