TensorRT Error for OCDNet

mainak1 · November 8, 2024, 11:06am

Description

Converting the OCDNet deployable ONNX model to a TensorRT engine with trtexec produces repeated "insufficient memory" warnings.

Environment

TensorRT Version : 10.6.0.26-1+cuda12.6
GPU Type : RTX 4070 Ti
Nvidia Driver Version : 550.120
CUDA Version : 12.6
Operating System + Version : Ubuntu 22.04
Python Version (if applicable) : 3.10

I'm using the OCDNet deployable ONNX file ocdnet_fan_tiny_2x_icdar_pruned.onnx from the OCDNet file browser.
However, when I convert it to an engine with the command below:
/usr/src/tensorrt/bin/trtexec --onnx=./ocdnet_fan_tiny_2x_icdar_pruned.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:4x3x736x1280 --fp16 --saveEngine=./ocdnet.fp16.engine
I get the following error:

[11/08/2024-16:33:48] [I] === Model Options ===
[11/08/2024-16:33:48] [I] Format: ONNX
[11/08/2024-16:33:48] [I] Model: ./ocdnet_fan_tiny_2x_icdar_pruned.onnx
[11/08/2024-16:33:48] [I] Output:
[11/08/2024-16:33:48] [I] === Build Options ===
[11/08/2024-16:33:48] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[11/08/2024-16:33:48] [I] avgTiming: 8
[11/08/2024-16:33:48] [I] Precision: FP32+FP16
[11/08/2024-16:33:48] [I] LayerPrecisions: 
[11/08/2024-16:33:48] [I] Layer Device Types: 
[11/08/2024-16:33:48] [I] Calibration: 
[11/08/2024-16:33:48] [I] Refit: Disabled
[11/08/2024-16:33:48] [I] Strip weights: Disabled
[11/08/2024-16:33:48] [I] Version Compatible: Disabled
[11/08/2024-16:33:48] [I] ONNX Plugin InstanceNorm: Disabled
[11/08/2024-16:33:48] [I] TensorRT runtime: full
[11/08/2024-16:33:48] [I] Lean DLL Path: 
[11/08/2024-16:33:48] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[11/08/2024-16:33:48] [I] Exclude Lean Runtime: Disabled
[11/08/2024-16:33:48] [I] Sparsity: Disabled
[11/08/2024-16:33:48] [I] Safe mode: Disabled
[11/08/2024-16:33:48] [I] Build DLA standalone loadable: Disabled
[11/08/2024-16:33:48] [I] Allow GPU fallback for DLA: Disabled
[11/08/2024-16:33:48] [I] DirectIO mode: Disabled
[11/08/2024-16:33:48] [I] Restricted mode: Disabled
[11/08/2024-16:33:48] [I] Skip inference: Disabled
[11/08/2024-16:33:48] [I] Save engine: ./ocdnet.fp16.engine
[11/08/2024-16:33:48] [I] Load engine: 
[11/08/2024-16:33:48] [I] Profiling verbosity: 0
[11/08/2024-16:33:48] [I] Tactic sources: Using default tactic sources
[11/08/2024-16:33:48] [I] timingCacheMode: local
[11/08/2024-16:33:48] [I] timingCacheFile: 
[11/08/2024-16:33:48] [I] Enable Compilation Cache: Enabled
[11/08/2024-16:33:48] [I] Enable Monitor Memory: Disabled
[11/08/2024-16:33:48] [I] errorOnTimingCacheMiss: Disabled
[11/08/2024-16:33:48] [I] Preview Features: Use default preview flags.
[11/08/2024-16:33:48] [I] MaxAuxStreams: -1
[11/08/2024-16:33:48] [I] BuilderOptimizationLevel: -1
[11/08/2024-16:33:48] [I] MaxTactics: -1
[11/08/2024-16:33:48] [I] Calibration Profile Index: 0
[11/08/2024-16:33:48] [I] Weight Streaming: Disabled
[11/08/2024-16:33:48] [I] Runtime Platform: Same As Build
[11/08/2024-16:33:48] [I] Debug Tensors: 
[11/08/2024-16:33:48] [I] Input(s)s format: fp32:CHW
[11/08/2024-16:33:48] [I] Output(s)s format: fp32:CHW
[11/08/2024-16:33:48] [I] Input build shape (profile 0): input=1x3x736x1280+1x3x736x1280+2x3x736x1280
[11/08/2024-16:33:48] [I] Input calibration shapes: model
[11/08/2024-16:33:48] [I] === System Options ===
[11/08/2024-16:33:48] [I] Device: 0
[11/08/2024-16:33:48] [I] DLACore: 
[11/08/2024-16:33:48] [I] Plugins:
[11/08/2024-16:33:48] [I] setPluginsToSerialize:
[11/08/2024-16:33:48] [I] dynamicPlugins:
[11/08/2024-16:33:48] [I] ignoreParsedPluginLibs: 0
[11/08/2024-16:33:48] [I] 
[11/08/2024-16:33:48] [I] === Inference Options ===
[11/08/2024-16:33:48] [I] Batch: Explicit
[11/08/2024-16:33:48] [I] Input inference shape : input=1x3x736x1280
[11/08/2024-16:33:48] [I] Iterations: 10
[11/08/2024-16:33:48] [I] Duration: 3s (+ 200ms warm up)
[11/08/2024-16:33:48] [I] Sleep time: 0ms
[11/08/2024-16:33:48] [I] Idle time: 0ms
[11/08/2024-16:33:48] [I] Inference Streams: 1
[11/08/2024-16:33:48] [I] ExposeDMA: Disabled
[11/08/2024-16:33:48] [I] Data transfers: Enabled
[11/08/2024-16:33:48] [I] Spin-wait: Disabled
[11/08/2024-16:33:48] [I] Multithreading: Disabled
[11/08/2024-16:33:48] [I] CUDA Graph: Disabled
[11/08/2024-16:33:48] [I] Separate profiling: Disabled
[11/08/2024-16:33:48] [I] Time Deserialize: Disabled
[11/08/2024-16:33:48] [I] Time Refit: Disabled
[11/08/2024-16:33:48] [I] NVTX verbosity: 0
[11/08/2024-16:33:48] [I] Persistent Cache Ratio: 0
[11/08/2024-16:33:48] [I] Optimization Profile Index: 0
[11/08/2024-16:33:48] [I] Weight Streaming Budget: 100.000000%
[11/08/2024-16:33:48] [I] Inputs:
[11/08/2024-16:33:48] [I] Debug Tensor Save Destinations:
[11/08/2024-16:33:48] [I] === Reporting Options ===
[11/08/2024-16:33:48] [I] Verbose: Disabled
[11/08/2024-16:33:48] [I] Averages: 10 inferences
[11/08/2024-16:33:48] [I] Percentiles: 90,95,99
[11/08/2024-16:33:48] [I] Dump refittable layers:Disabled
[11/08/2024-16:33:48] [I] Dump output: Disabled
[11/08/2024-16:33:48] [I] Profile: Disabled
[11/08/2024-16:33:48] [I] Export timing to JSON file: 
[11/08/2024-16:33:48] [I] Export output to JSON file: 
[11/08/2024-16:33:48] [I] Export profile to JSON file: 
[11/08/2024-16:33:48] [I] 
[11/08/2024-16:33:48] [I] === Device Information ===
[11/08/2024-16:33:48] [I] Available Devices: 
[11/08/2024-16:33:48] [I]   Device 0: "NVIDIA GeForce RTX 4070 Ti" UUID: GPU-f8485527-9b53-b0a8-239f-c915b1080531
[11/08/2024-16:33:48] [I] Selected Device: NVIDIA GeForce RTX 4070 Ti
[11/08/2024-16:33:48] [I] Selected Device ID: 0
[11/08/2024-16:33:48] [I] Selected Device UUID: GPU-f8485527-9b53-b0a8-239f-c915b1080531
[11/08/2024-16:33:48] [I] Compute Capability: 8.9
[11/08/2024-16:33:48] [I] SMs: 60
[11/08/2024-16:33:48] [I] Device Global Memory: 11996 MiB
[11/08/2024-16:33:48] [I] Shared Memory per SM: 100 KiB
[11/08/2024-16:33:48] [I] Memory Bus Width: 192 bits (ECC disabled)
[11/08/2024-16:33:48] [I] Application Compute Clock Rate: 2.625 GHz
[11/08/2024-16:33:48] [I] Application Memory Clock Rate: 10.501 GHz
[11/08/2024-16:33:48] [I] 
[11/08/2024-16:33:48] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[11/08/2024-16:33:48] [I] 
[11/08/2024-16:33:48] [I] TensorRT version: 10.6.0
[11/08/2024-16:33:48] [I] Loading standard plugins
[11/08/2024-16:33:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 20, GPU 545 (MiB)
[11/08/2024-16:33:49] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2275, GPU +436, now: CPU 2451, GPU 981 (MiB)
[11/08/2024-16:33:49] [I] Start parsing network model.
[11/08/2024-16:33:49] [I] [TRT] ----------------------------------------------------------------
[11/08/2024-16:33:49] [I] [TRT] Input filename:   ./ocdnet_fan_tiny_2x_icdar_pruned.onnx
[11/08/2024-16:33:49] [I] [TRT] ONNX IR version:  0.0.8
[11/08/2024-16:33:49] [I] [TRT] Opset version:    17
[11/08/2024-16:33:49] [I] [TRT] Producer name:    pytorch
[11/08/2024-16:33:49] [I] [TRT] Producer version: 1.14.0
[11/08/2024-16:33:49] [I] [TRT] Domain:           
[11/08/2024-16:33:49] [I] [TRT] Model version:    0
[11/08/2024-16:33:49] [I] [TRT] Doc string:       
[11/08/2024-16:33:49] [I] [TRT] ----------------------------------------------------------------
[11/08/2024-16:33:49] [I] Finished parsing network model. Parse time: 0.0516525
[11/08/2024-16:33:49] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x736x1280 OPT=1x3x736x1280 MAX=2x3x736x1280
[11/08/2024-16:33:50] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/08/2024-16:33:51] [I] [TRT] Compiler backend is used during engine build.
[11/08/2024-16:35:27] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27825274880 detected for tactic 0x0000000000000000.
[11/08/2024-16:35:45] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27847884800 detected for tactic 0x0000000000000000.
[11/08/2024-16:35:48] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27847884800 detected for tactic 0x0000000000000000.
[11/08/2024-16:35:48] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27847884800 detected for tactic 0x0000000000000000.
[11/08/2024-16:35:49] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27847884800 detected for tactic 0x0000000000000000.
[11/08/2024-16:35:49] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27847884800 detected for tactic 0x0000000000000000.
[11/08/2024-16:35:50] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27847884800 detected for tactic 0x0000000000000000.
[11/08/2024-16:35:51] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27847884800 detected for tactic 0x0000000000000000.
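Each UNSUPPORTED_STATE warning above carries the size, in bytes, that TensorRT tried to allocate for a tactic. A minimal Python sketch, assuming the log text is available as a string (the helper names `requested_sizes` and `to_gib` are hypothetical), pulls those sizes out and compares them with the 11996 MiB of device memory trtexec reports for the RTX 4070 Ti:

```python
import re

# Device memory as reported by trtexec ("Device Global Memory: 11996 MiB").
DEVICE_MEMORY_MIB = 11996

# Two of the UNSUPPORTED_STATE warnings copied from the log above.
LOG = """\
[11/08/2024-16:35:27] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27825274880 detected for tactic 0x0000000000000000.
[11/08/2024-16:35:45] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 27847884800 detected for tactic 0x0000000000000000.
"""

# Captures the byte count that follows "requested size of".
PATTERN = re.compile(r"insufficient memory on requested size of (\d+)")


def requested_sizes(log_text: str) -> list[int]:
    """Return every requested allocation size (in bytes) found in the log."""
    return [int(m.group(1)) for m in PATTERN.finditer(log_text)]


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    device_bytes = DEVICE_MEMORY_MIB * 2**20
    for size in requested_sizes(LOG):
        print(f"requested {to_gib(size):.2f} GiB, "
              f"device {to_gib(device_bytes):.2f} GiB, "
              f"ratio {size / device_bytes:.2f}x")
```

For the two distinct sizes in this log, the requests come to about 25.91 GiB and 25.94 GiB, a little over 2.2 times the roughly 11.71 GiB the card reports, which is consistent with TensorRT skipping those tactics.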

Any suggestions on this, @Morganh?

Morganh · November 8, 2024, 2:41pm

Moving this to the TAO forum since it is related to a TAO model.

Morganh · November 8, 2024, 2:45pm

It is due to insufficient GPU memory. For ocdnet_fan_tiny_2x_icdar_pruned.onnx, a GPU with more than 12 GB of memory is recommended. You can try a different dGPU, or try a non-ViT model instead, for example the deformable_resnet50 ONNX model.
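For scale, the input tensor itself is small. The log reports the input format as fp32:CHW, so a sketch computing the FP32 input size for each shape in the trtexec command gives a rough lower bound (the `PROFILE` dictionary and `tensor_bytes` helper are illustrative, not part of TensorRT):

```python
from math import prod

# Input shapes (N, C, H, W) taken from the trtexec command's
# --minShapes / --optShapes / --maxShapes flags.
PROFILE = {
    "min": (1, 3, 736, 1280),
    "opt": (1, 3, 736, 1280),
    "max": (4, 3, 736, 1280),
}

# The log reports "Input(s)s format: fp32:CHW", so 4 bytes per element.
BYTES_PER_FP32 = 4


def tensor_bytes(shape: tuple[int, ...], bytes_per_element: int = BYTES_PER_FP32) -> int:
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element


if __name__ == "__main__":
    for name, shape in PROFILE.items():
        mib = tensor_bytes(shape) / 2**20
        print(f"{name}: {'x'.join(map(str, shape))} -> {mib:.2f} MiB")
```

One 1x3x736x1280 FP32 image is about 10.78 MiB, and even batch 4 is only about 43 MiB, so the roughly 26 GiB requests in the log come from what the network and its candidate tactics need internally rather than from the input.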
