How to deploy attention on DeepStream 6.0?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson)
• DeepStream Version: 6.0
• JetPack Version (valid for Jetson only): 4.6
• TensorRT Version: 8.0.1.6
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

When I use trtexec (TensorRT 8.0.1.6) to generate the engine file on Jetson, there is an error.

$ /usr/local/TensorRT-8.5.3.1/bin/trtexec --onnx=best.onnx --saveEngine=best_4.engine --explicitBatch --fp16 --workspace=1024 --buildOnly --threads=8
&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # /usr/local/TensorRT-8.5.3.1/bin/trtexec --onnx=best.onnx --saveEngine=best_4.engine --explicitBatch --fp16 --workspace=1024 --buildOnly --threads=8
[12/04/2023-08:50:03] [W] --explicitBatch flag has been deprecated and has no effect!
[12/04/2023-08:50:03] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[12/04/2023-08:50:03] [W] --workspace flag has been deprecated by --memPoolSize flag.
[12/04/2023-08:50:03] [I] === Model Options ===
[12/04/2023-08:50:03] [I] Format: ONNX
[12/04/2023-08:50:03] [I] Model: best.onnx
[12/04/2023-08:50:03] [I] Output:
[12/04/2023-08:50:03] [I] === Build Options ===
[12/04/2023-08:50:03] [I] Max batch: explicit batch
[12/04/2023-08:50:03] [I] Memory Pools: workspace: 1024 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[12/04/2023-08:50:03] [I] minTiming: 1
[12/04/2023-08:50:03] [I] avgTiming: 8
[12/04/2023-08:50:03] [I] Precision: FP32+FP16
[12/04/2023-08:50:03] [I] LayerPrecisions:
[12/04/2023-08:50:03] [I] Calibration:
[12/04/2023-08:50:03] [I] Refit: Disabled
[12/04/2023-08:50:03] [I] Sparsity: Disabled
[12/04/2023-08:50:03] [I] Safe mode: Disabled
[12/04/2023-08:50:03] [I] DirectIO mode: Disabled
[12/04/2023-08:50:03] [I] Restricted mode: Disabled
[12/04/2023-08:50:03] [I] Build only: Enabled
[12/04/2023-08:50:03] [I] Save engine: best_4.engine
[12/04/2023-08:50:03] [I] Load engine:
[12/04/2023-08:50:03] [I] Profiling verbosity: 0
[12/04/2023-08:50:03] [I] Tactic sources: Using default tactic sources
[12/04/2023-08:50:03] [I] timingCacheMode: local
[12/04/2023-08:50:03] [I] timingCacheFile:
[12/04/2023-08:50:03] [I] Heuristic: Disabled
[12/04/2023-08:50:03] [I] Preview Features: Use default preview flags.
[12/04/2023-08:50:03] [I] Input(s)s format: fp32:CHW
[12/04/2023-08:50:03] [I] Output(s)s format: fp32:CHW
[12/04/2023-08:50:03] [I] Input build shapes: model
[12/04/2023-08:50:03] [I] Input calibration shapes: model
[12/04/2023-08:50:03] [I] === System Options ===
[12/04/2023-08:50:03] [I] Device: 0
[12/04/2023-08:50:03] [I] DLACore:
[12/04/2023-08:50:03] [I] Plugins:
[12/04/2023-08:50:03] [I] === Inference Options ===
[12/04/2023-08:50:03] [I] Batch: Explicit
[12/04/2023-08:50:03] [I] Input inference shapes: model
[12/04/2023-08:50:03] [I] Iterations: 10
[12/04/2023-08:50:03] [I] Duration: 3s (+ 200ms warm up)
[12/04/2023-08:50:03] [I] Sleep time: 0ms
[12/04/2023-08:50:03] [I] Idle time: 0ms
[12/04/2023-08:50:03] [I] Streams: 1
[12/04/2023-08:50:03] [I] ExposeDMA: Disabled
[12/04/2023-08:50:03] [I] Data transfers: Enabled
[12/04/2023-08:50:03] [I] Spin-wait: Disabled
[12/04/2023-08:50:03] [I] Multithreading: Enabled
[12/04/2023-08:50:03] [I] CUDA Graph: Disabled
[12/04/2023-08:50:03] [I] Separate profiling: Disabled
[12/04/2023-08:50:03] [I] Time Deserialize: Disabled
[12/04/2023-08:50:03] [I] Time Refit: Disabled
[12/04/2023-08:50:03] [I] NVTX verbosity: 0
[12/04/2023-08:50:03] [I] Persistent Cache Ratio: 0
[12/04/2023-08:50:03] [I] Inputs:
[12/04/2023-08:50:03] [I] === Reporting Options ===
[12/04/2023-08:50:03] [I] Verbose: Disabled
[12/04/2023-08:50:03] [I] Averages: 10 inferences
[12/04/2023-08:50:03] [I] Percentiles: 90,95,99
[12/04/2023-08:50:03] [I] Dump refittable layers:Disabled
[12/04/2023-08:50:03] [I] Dump output: Disabled
[12/04/2023-08:50:03] [I] Profile: Disabled
[12/04/2023-08:50:03] [I] Export timing to JSON file:
[12/04/2023-08:50:03] [I] Export output to JSON file:
[12/04/2023-08:50:03] [I] Export profile to JSON file:
[12/04/2023-08:50:03] [I]
[12/04/2023-08:50:04] [I] === Device Information ===
[12/04/2023-08:50:04] [I] Selected Device: NVIDIA GeForce RTX 3080 Ti
[12/04/2023-08:50:04] [I] Compute Capability: 8.6
[12/04/2023-08:50:04] [I] SMs: 80
[12/04/2023-08:50:04] [I] Compute Clock Rate: 1.665 GHz
[12/04/2023-08:50:04] [I] Device Global Memory: 12042 MiB
[12/04/2023-08:50:04] [I] Shared Memory per SM: 100 KiB
[12/04/2023-08:50:04] [I] Memory Bus Width: 384 bits (ECC disabled)
[12/04/2023-08:50:04] [I] Memory Clock Rate: 9.501 GHz
[12/04/2023-08:50:04] [I]
[12/04/2023-08:50:04] [I] TensorRT version: 8.5.3
[12/04/2023-08:50:04] [I] [TRT] [MemUsageChange] Init CUDA: CPU +446, GPU +0, now: CPU 459, GPU 486 (MiB)
[12/04/2023-08:50:05] [I] Start parsing network model
[12/04/2023-08:50:05] [I] [TRT] ----------------------------------------------------------------
[12/04/2023-08:50:05] [I] [TRT] Input filename:   best.onnx
[12/04/2023-08:50:05] [I] [TRT] ONNX IR version:  0.0.8
[12/04/2023-08:50:05] [I] [TRT] Opset version:    17
[12/04/2023-08:50:05] [I] [TRT] Producer name:    pytorch
[12/04/2023-08:50:05] [I] [TRT] Producer version: 2.1.0
[12/04/2023-08:50:05] [I] [TRT] Domain:
[12/04/2023-08:50:05] [I] [TRT] Model version:    0
[12/04/2023-08:50:05] [I] [TRT] Doc string:
[12/04/2023-08:50:05] [I] [TRT] ----------------------------------------------------------------
[12/04/2023-08:50:05] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/04/2023-08:50:05] [I] [TRT] No importer registered for op: Mod. Attempting to import as plugin.
[12/04/2023-08:50:05] [I] [TRT] Searching for plugin: Mod, plugin_version: 1, plugin_namespace:
[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:769: While parsing node number 160 [Mod -> "/model.12/Mod_output_0"]:
[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:770: --- Begin node ---
[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:771: input: "/model.12/Constant_output_0"
input: "/model.12/Constant_1_output_0"
output: "/model.12/Mod_output_0"
name: "/model.12/Mod"
op_type: "Mod"
attribute {
  name: "fmod"
  i: 0
  type: INT
}

[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:772: --- End node ---
[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:775: ERROR: builtin_op_importers.cpp:4870 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[12/04/2023-08:50:05] [E] Failed to parse onnx file
[12/04/2023-08:50:05] [I] Finish parsing network model
[12/04/2023-08:50:05] [E] Parsing model failed
[12/04/2023-08:50:05] [E] Failed to create engine from model or file.
[12/04/2023-08:50:05] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8503] # /usr/local/TensorRT-8.5.3.1/bin/trtexec --onnx=best.onnx --saveEngine=best_4.engine --explicitBatch --fp16 --workspace=1024 --buildOnly --threads=8
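For reference, here is a quick way to list every op type the exported model contains, so unsupported ones can be spotted before running trtexec (a minimal sketch assuming the standard onnx Python package; not something verified against this exact model):

from collections import Counter
import onnx

# Count each op type in best.onnx; "Mod" showing up here is what makes
# the TensorRT ONNX parser fall back to the (missing) plugin importer.
model = onnx.load("best.onnx")
for op, n in sorted(Counter(node.op_type for node in model.graph.node).items()):
    print(op, n)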

Could you give me a hint on how to solve this issue without changing the TensorRT version?
I ask because I find there is no error when I use TensorRT-8.5.3.1.

That is because TensorRT 8.0 did not support attention layers. I am not aware of any solution except switching to TensorRT 8.5.
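That said, the Mod import failure in your log is separate from attention support. One thing that might clear it without touching TensorRT (a sketch only, not verified on this model): the failing Mod node takes two Constant outputs as its inputs, so constant folding should be able to compute it offline and remove it before TensorRT parses the model. With onnx-graphsurgeon (the output name best_folded.onnx is just an example):

import onnx
import onnx_graphsurgeon as gs

# Fold constant-only subgraphs; /model.12/Mod has two Constant inputs,
# so it should be evaluated at fold time and dropped from the graph.
graph = gs.import_onnx(onnx.load("best.onnx"))
graph.fold_constants().cleanup()
onnx.save(gs.export_onnx(graph), "best_folded.onnx")

This only addresses the Mod node; if the attention layers themselves need TensorRT 8.5, upgrading remains the reliable path. When rebuilding on newer trtexec, also note the deprecation warnings in your log: --explicitBatch can simply be dropped, and --memPoolSize=workspace:1024 replaces --workspace=1024.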

Thank you for replying.
