How to deploy attention on DeepStream 6.0?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson)
• DeepStream Version: 6.0
• JetPack Version (valid for Jetson only): 4.6
• TensorRT Version: 8.0.1.6
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

When I use trtexec (TensorRT 8.0.1.6) to generate the engine file on Jetson, there is an error.

$ /usr/local/TensorRT-8.5.3.1/bin/trtexec --onnx=best.onnx --saveEngine=best_4.engine --explicitBatch --fp16 --workspace=1024 --buildOnly --threads=8
&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # /usr/local/TensorRT-8.5.3.1/bin/trtexec --onnx=best.onnx --saveEngine=best_4.engine --explicitBatch --fp16 --workspace=1024 --buildOnly --threads=8
[12/04/2023-08:50:03] [W] --explicitBatch flag has been deprecated and has no effect!
[12/04/2023-08:50:03] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[12/04/2023-08:50:03] [W] --workspace flag has been deprecated by --memPoolSize flag.
[12/04/2023-08:50:03] [I] === Model Options ===
[12/04/2023-08:50:03] [I] Format: ONNX
[12/04/2023-08:50:03] [I] Model: best.onnx
[12/04/2023-08:50:03] [I] Output:
[12/04/2023-08:50:03] [I] === Build Options ===
[12/04/2023-08:50:03] [I] Max batch: explicit batch
[12/04/2023-08:50:03] [I] Memory Pools: workspace: 1024 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[12/04/2023-08:50:03] [I] minTiming: 1
[12/04/2023-08:50:03] [I] avgTiming: 8
[12/04/2023-08:50:03] [I] Precision: FP32+FP16
[12/04/2023-08:50:03] [I] LayerPrecisions:
[12/04/2023-08:50:03] [I] Calibration:
[12/04/2023-08:50:03] [I] Refit: Disabled
[12/04/2023-08:50:03] [I] Sparsity: Disabled
[12/04/2023-08:50:03] [I] Safe mode: Disabled
[12/04/2023-08:50:03] [I] DirectIO mode: Disabled
[12/04/2023-08:50:03] [I] Restricted mode: Disabled
[12/04/2023-08:50:03] [I] Build only: Enabled
[12/04/2023-08:50:03] [I] Save engine: best_4.engine
[12/04/2023-08:50:03] [I] Load engine:
[12/04/2023-08:50:03] [I] Profiling verbosity: 0
[12/04/2023-08:50:03] [I] Tactic sources: Using default tactic sources
[12/04/2023-08:50:03] [I] timingCacheMode: local
[12/04/2023-08:50:03] [I] timingCacheFile:
[12/04/2023-08:50:03] [I] Heuristic: Disabled
[12/04/2023-08:50:03] [I] Preview Features: Use default preview flags.
[12/04/2023-08:50:03] [I] Input(s)s format: fp32:CHW
[12/04/2023-08:50:03] [I] Output(s)s format: fp32:CHW
[12/04/2023-08:50:03] [I] Input build shapes: model
[12/04/2023-08:50:03] [I] Input calibration shapes: model
[12/04/2023-08:50:03] [I] === System Options ===
[12/04/2023-08:50:03] [I] Device: 0
[12/04/2023-08:50:03] [I] DLACore:
[12/04/2023-08:50:03] [I] Plugins:
[12/04/2023-08:50:03] [I] === Inference Options ===
[12/04/2023-08:50:03] [I] Batch: Explicit
[12/04/2023-08:50:03] [I] Input inference shapes: model
[12/04/2023-08:50:03] [I] Iterations: 10
[12/04/2023-08:50:03] [I] Duration: 3s (+ 200ms warm up)
[12/04/2023-08:50:03] [I] Sleep time: 0ms
[12/04/2023-08:50:03] [I] Idle time: 0ms
[12/04/2023-08:50:03] [I] Streams: 1
[12/04/2023-08:50:03] [I] ExposeDMA: Disabled
[12/04/2023-08:50:03] [I] Data transfers: Enabled
[12/04/2023-08:50:03] [I] Spin-wait: Disabled
[12/04/2023-08:50:03] [I] Multithreading: Enabled
[12/04/2023-08:50:03] [I] CUDA Graph: Disabled
[12/04/2023-08:50:03] [I] Separate profiling: Disabled
[12/04/2023-08:50:03] [I] Time Deserialize: Disabled
[12/04/2023-08:50:03] [I] Time Refit: Disabled
[12/04/2023-08:50:03] [I] NVTX verbosity: 0
[12/04/2023-08:50:03] [I] Persistent Cache Ratio: 0
[12/04/2023-08:50:03] [I] Inputs:
[12/04/2023-08:50:03] [I] === Reporting Options ===
[12/04/2023-08:50:03] [I] Verbose: Disabled
[12/04/2023-08:50:03] [I] Averages: 10 inferences
[12/04/2023-08:50:03] [I] Percentiles: 90,95,99
[12/04/2023-08:50:03] [I] Dump refittable layers:Disabled
[12/04/2023-08:50:03] [I] Dump output: Disabled
[12/04/2023-08:50:03] [I] Profile: Disabled
[12/04/2023-08:50:03] [I] Export timing to JSON file:
[12/04/2023-08:50:03] [I] Export output to JSON file:
[12/04/2023-08:50:03] [I] Export profile to JSON file:
[12/04/2023-08:50:03] [I]
[12/04/2023-08:50:04] [I] === Device Information ===
[12/04/2023-08:50:04] [I] Selected Device: NVIDIA GeForce RTX 3080 Ti
[12/04/2023-08:50:04] [I] Compute Capability: 8.6
[12/04/2023-08:50:04] [I] SMs: 80
[12/04/2023-08:50:04] [I] Compute Clock Rate: 1.665 GHz
[12/04/2023-08:50:04] [I] Device Global Memory: 12042 MiB
[12/04/2023-08:50:04] [I] Shared Memory per SM: 100 KiB
[12/04/2023-08:50:04] [I] Memory Bus Width: 384 bits (ECC disabled)
[12/04/2023-08:50:04] [I] Memory Clock Rate: 9.501 GHz
[12/04/2023-08:50:04] [I]
[12/04/2023-08:50:04] [I] TensorRT version: 8.5.3
[12/04/2023-08:50:04] [I] [TRT] [MemUsageChange] Init CUDA: CPU +446, GPU +0, now: CPU 459, GPU 486 (MiB)
[12/04/2023-08:50:05] [I] Start parsing network model
[12/04/2023-08:50:05] [I] [TRT] ----------------------------------------------------------------
[12/04/2023-08:50:05] [I] [TRT] Input filename:   best.onnx
[12/04/2023-08:50:05] [I] [TRT] ONNX IR version:  0.0.8
[12/04/2023-08:50:05] [I] [TRT] Opset version:    17
[12/04/2023-08:50:05] [I] [TRT] Producer name:    pytorch
[12/04/2023-08:50:05] [I] [TRT] Producer version: 2.1.0
[12/04/2023-08:50:05] [I] [TRT] Domain:
[12/04/2023-08:50:05] [I] [TRT] Model version:    0
[12/04/2023-08:50:05] [I] [TRT] Doc string:
[12/04/2023-08:50:05] [I] [TRT] ----------------------------------------------------------------
[12/04/2023-08:50:05] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/04/2023-08:50:05] [I] [TRT] No importer registered for op: Mod. Attempting to import as plugin.
[12/04/2023-08:50:05] [I] [TRT] Searching for plugin: Mod, plugin_version: 1, plugin_namespace:
[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:769: While parsing node number 160 [Mod -> "/model.12/Mod_output_0"]:
[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:770: --- Begin node ---
[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:771: input: "/model.12/Constant_output_0"
input: "/model.12/Constant_1_output_0"
output: "/model.12/Mod_output_0"
name: "/model.12/Mod"
op_type: "Mod"
attribute {
  name: "fmod"
  i: 0
  type: INT
}

[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:772: --- End node ---
[12/04/2023-08:50:05] [E] [TRT] ModelImporter.cpp:775: ERROR: builtin_op_importers.cpp:4870 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[12/04/2023-08:50:05] [E] Failed to parse onnx file
[12/04/2023-08:50:05] [I] Finish parsing network model
[12/04/2023-08:50:05] [E] Parsing model failed
[12/04/2023-08:50:05] [E] Failed to create engine from model or file.
[12/04/2023-08:50:05] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8503] # /usr/local/TensorRT-8.5.3.1/bin/trtexec --onnx=best.onnx --saveEngine=best_4.engine --explicitBatch --fp16 --workspace=1024 --buildOnly --threads=8
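For reference, here is a quick way to list every op type the exported model contains, so unsupported ones can be spotted before running trtexec (a minimal sketch assuming the standard onnx Python package; not something verified against this exact model):

from collections import Counter
import onnx

# Count each op type in best.onnx; "Mod" showing up here is what makes
# the TensorRT ONNX parser fall back to the (missing) plugin importer.
model = onnx.load("best.onnx")
for op, n in sorted(Counter(node.op_type for node in model.graph.node).items()):
    print(op, n)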

Could you give me a hint on how to solve this issue without changing the TensorRT version?
I ask because I find there is no error when I use TensorRT-8.5.3.1.

That is because TensorRT 8.0 did not support attention layers. I am not aware of any solution except switching to TensorRT 8.5.
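That said, the Mod import failure in your log is separate from attention support. One thing that might clear it without touching TensorRT (a sketch only, not verified on this model): the failing Mod node takes two Constant outputs as its inputs, so constant folding should be able to compute it offline and remove it before TensorRT parses the model. With onnx-graphsurgeon (the output name best_folded.onnx is just an example):

import onnx
import onnx_graphsurgeon as gs

# Fold constant-only subgraphs; /model.12/Mod has two Constant inputs,
# so it should be evaluated at fold time and dropped from the graph.
graph = gs.import_onnx(onnx.load("best.onnx"))
graph.fold_constants().cleanup()
onnx.save(gs.export_onnx(graph), "best_folded.onnx")

This only addresses the Mod node; if the attention layers themselves need TensorRT 8.5, upgrading remains the reliable path. When rebuilding on newer trtexec, also note the deprecation warnings in your log: --explicitBatch can simply be dropped, and --memPoolSize=workspace:1024 replaces --workspace=1024.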

Thank you for replying.
